+
+
+
\ No newline at end of file
diff --git a/ppt/Lecture1.pptx b/ppt/Lecture1.pptx
new file mode 100644
index 00000000..94acf701
Binary files /dev/null and b/ppt/Lecture1.pptx differ
diff --git a/ppt/Lecture10.pptx b/ppt/Lecture10.pptx
new file mode 100644
index 00000000..b7ba6a06
Binary files /dev/null and b/ppt/Lecture10.pptx differ
diff --git a/ppt/Lecture11.pptx b/ppt/Lecture11.pptx
new file mode 100644
index 00000000..1acc241f
Binary files /dev/null and b/ppt/Lecture11.pptx differ
diff --git a/ppt/Lecture12.pptx b/ppt/Lecture12.pptx
new file mode 100644
index 00000000..5785481f
Binary files /dev/null and b/ppt/Lecture12.pptx differ
diff --git a/ppt/Lecture13.pptx b/ppt/Lecture13.pptx
new file mode 100644
index 00000000..93db5f81
Binary files /dev/null and b/ppt/Lecture13.pptx differ
diff --git a/ppt/Lecture14.pptx b/ppt/Lecture14.pptx
new file mode 100644
index 00000000..dc2bb6ee
Binary files /dev/null and b/ppt/Lecture14.pptx differ
diff --git a/ppt/Lecture15.pptx b/ppt/Lecture15.pptx
new file mode 100644
index 00000000..3ab106dd
Binary files /dev/null and b/ppt/Lecture15.pptx differ
diff --git a/ppt/Lecture16.pptx b/ppt/Lecture16.pptx
new file mode 100644
index 00000000..0a634080
Binary files /dev/null and b/ppt/Lecture16.pptx differ
diff --git a/ppt/Lecture17.pptx b/ppt/Lecture17.pptx
new file mode 100644
index 00000000..faa5dbe5
Binary files /dev/null and b/ppt/Lecture17.pptx differ
diff --git a/ppt/Lecture18.pptx b/ppt/Lecture18.pptx
new file mode 100644
index 00000000..4434e8f1
Binary files /dev/null and b/ppt/Lecture18.pptx differ
diff --git a/ppt/Lecture2.pptx b/ppt/Lecture2.pptx
new file mode 100644
index 00000000..0c4fc8c4
Binary files /dev/null and b/ppt/Lecture2.pptx differ
diff --git a/ppt/Lecture3.pptx b/ppt/Lecture3.pptx
new file mode 100644
index 00000000..35a8cbda
Binary files /dev/null and b/ppt/Lecture3.pptx differ
diff --git a/ppt/Lecture4.pptx b/ppt/Lecture4.pptx
new file mode 100644
index 00000000..41a2f721
Binary files /dev/null and b/ppt/Lecture4.pptx differ
diff --git a/ppt/Lecture5.pptx b/ppt/Lecture5.pptx
new file mode 100644
index 00000000..4a5646cc
Binary files /dev/null and b/ppt/Lecture5.pptx differ
diff --git a/ppt/Lecture6.pptx b/ppt/Lecture6.pptx
new file mode 100644
index 00000000..fffc5929
Binary files /dev/null and b/ppt/Lecture6.pptx differ
diff --git a/ppt/Lecture7.pptx b/ppt/Lecture7.pptx
new file mode 100644
index 00000000..f7a10df4
Binary files /dev/null and b/ppt/Lecture7.pptx differ
diff --git a/ppt/Lecture8.pptx b/ppt/Lecture8.pptx
new file mode 100644
index 00000000..604d6139
Binary files /dev/null and b/ppt/Lecture8.pptx differ
diff --git a/ppt/Lecture9.pptx b/ppt/Lecture9.pptx
new file mode 100644
index 00000000..f7434992
Binary files /dev/null and b/ppt/Lecture9.pptx differ
diff --git a/srt/1 - 1 - Welcome (7 min).srt b/srt/1 - 1 - Welcome (7 min).srt
new file mode 100644
index 00000000..eb636536
--- /dev/null
+++ b/srt/1 - 1 - Welcome (7 min).srt
@@ -0,0 +1,515 @@
+1
+00:00:00,000 --> 00:00:04,262
+Welcome to this free online class on
+machine learning. Machine learning is one
+欢迎来到机器学习免费在线课程。机器学习是
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,262 --> 00:00:08,579
+of the most exciting recent technologies.
+And in this class, you learn about the
+目前最激动人心的技术之一。本课程中,你将学习
+
+3
+00:00:08,579 --> 00:00:13,115
+state of the art and also gain practice
+implementing and deploying these algorithms
+最前沿的技术,并亲自实践这些算法的实现和部署。
+
+4
+00:00:13,115 --> 00:00:17,487
+yourself. You've probably used a learning
+algorithm dozens of times a day without
+你每天可能会使用学习算法几十次,却并没有意识到。
+
+5
+00:00:17,487 --> 00:00:21,422
+knowing it. Every time you use a web
+search engine like Google or Bing to
+每次当你使用Google或Bing等搜索引擎时,
+
+6
+00:00:21,422 --> 00:00:25,794
+search the internet, one of the reasons
+that works so well is because a learning
+它能给出如此满意的结果,原因之一就是使用了学习算法。
+
+7
+00:00:25,794 --> 00:00:30,002
+algorithm, one implemented by Google or
+ Microsoft, has learned how to rank web
+由Google或微软实现的算法学会如何给网页排序。
+
+8
+00:00:30,002 --> 00:00:35,144
+pages. Every time you use Facebook or
+Apple's photo tagging application and it
+每次你使用Facebook或苹果的照片标记应用,
+
+9
+00:00:35,144 --> 00:00:40,595
+recognizes your friends' photos, that's
+also machine learning. Every time you read
+它能识别出你朋友的相片,这也是机器学习。每次当你阅读邮件时,
+
+10
+00:00:40,595 --> 00:00:46,054
+your email and your spam filter saves you
+ from having to wade through tons of spam
+你的垃圾邮件过滤器帮助你过滤大量的垃圾邮件,
+
+11
+00:00:46,054 --> 00:00:50,980
+email, that's also a learning algorithm.
+For me one of the reasons I'm excited is
+这也是学习算法。对我而言,我兴奋的原因之一是
+
+12
+00:00:50,980 --> 00:00:55,643
+the AI dream of someday building machines
+as intelligent as you or me. We're a long
+AI的梦想就是有一天能建造像你我一样智能的机器。
+
+13
+00:00:55,643 --> 00:01:00,076
+way away from that goal, but many AI
+researchers believe that the best way
+我们离这个目标还很远,但是许多AI研究者相信实现这个目标的最好方法
+
+14
+00:01:00,076 --> 00:01:04,567
+towards that goal is through learning
+ algorithms that try to mimic how the human
+就是采用学习算法试图模拟人类大脑是如何学习的。
+
+15
+00:01:04,567 --> 00:01:08,994
+brain learns. I'll tell you a little bit
+ about that too in this class. In this
+在本课程中,我将会向你们介绍部分这方面的内容。
+
+16
+00:01:08,994 --> 00:01:13,542
+class you learn about state-of-the-art
+machine learning algorithms. But it turns
+本课程中,你将学习最前沿的机器学习算法。
+
+17
+00:01:13,542 --> 00:01:17,919
+out just knowing the algorithms and
+knowing the math isn't that much good if
+但是,仅知道算法以及算法的数学含义,
+
+18
+00:01:17,919 --> 00:01:22,466
+you don't also know how to actually get
+this stuff to work on problems that you
+却不知道如何在你关心的问题上运用,是远远不够的。
+
+19
+00:01:22,466 --> 00:01:26,844
+care about. So, we've also spent a lot
+of time developing exercises for you to
+因此,我们也会花很多时间让你练习如何
+
+20
+00:01:26,844 --> 00:01:32,088
+implement each of these algorithms and
+ see how they work for yourself. So why is
+实现每一个算法,并亲自看看它们是如何工作的。
+
+21
+00:01:32,088 --> 00:01:37,075
+machine learning so prevalent today?
+It turns out that machine learning is a
+那么,为什么机器学习在今天如此流行?机器学习
+
+22
+00:01:37,075 --> 00:01:41,713
+field that had grown out of the field of
+AI, or artificial intelligence. We wanted
+是从AI发展出来的一个领域。我们想
+
+23
+00:01:41,713 --> 00:01:46,642
+to build intelligent machines and it turns
+out that there are a few basic things that
+建造智能机器,事实证明,有一些基本的事情我们可以通过编程让机器完成,
+
+24
+00:01:46,642 --> 00:01:51,454
+we could program a machine to do such as
+how to find the shortest path from A to B.
+比如找到从A到B的最短路径。
+
+25
+00:01:51,454 --> 00:01:56,267
+But for the most part we just did not know
+how to write AI programs to do the more
+但大多数情况下,我们不知道如何编写AI程序使机器做更多
+
+26
+00:01:56,267 --> 00:02:00,905
+interesting things such as web search or
+photo tagging or email anti-spam. There
+有趣的事情,如网页搜索、相片标记、反垃圾邮件。
+
+26
+00:02:00,905 --> 00:02:05,718
+was a realization that the only way to do
+ these things was to have a machine learn
+人们认识到,做到这些事情的唯一方法就是让机器自己学习如何去做。
+
+27
+00:02:05,718 --> 00:02:11,237
+to do it by itself. So, machine learning
+was developed as a new capability for
+因此,机器学习是计算机需要开发的一项新能力,
+
+28
+00:02:11,237 --> 00:02:16,950
+computers and today it touches many
+segments of industry and basic science.
+如今,它涉及工业和基础科学的许多领域。
+
+29
+00:02:16,950 --> 00:02:21,496
+For me, I work on machine learning and
+in a typical week I might end up talking to
+对我而言,我研究机器学习,并且在有代表性的一周中,我可能会与
+
+30
+00:02:21,496 --> 00:02:25,698
+helicopter pilots, biologists, a bunch
+of computer systems people (so my
+直升飞机飞行员,生物学家,很多计算机系统的人员交流
+
+31
+00:02:25,698 --> 00:02:30,590
+colleagues here at Stanford) and averaging
+ two or three times a week I get email from
+并且每周2~3次与
+
+32
+00:02:30,590 --> 00:02:35,021
+people in industry from Silicon Valley
+contacting me who have an interest in
+硅谷工业界的人员互通email,他们对在他们的问题上
+
+33
+00:02:35,021 --> 00:02:39,741
+applying learning algorithms to their own
+problems. This is a sign of the range of
+应用学习算法感兴趣。以下是一些
+
+34
+00:02:39,741 --> 00:02:44,000
+problems that machine learning touches.
+There is autonomous robotics, computational
+机器学习涉及的领域,自主机器人,计算生物学,
+
+35
+00:02:44,000 --> 00:02:48,777
+biology, tons of things in Silicon Valley
+ that machine learning is having an impact
+以及其它一些被机器学习影响的领域。
+
+36
+00:02:48,777 --> 00:02:55,320
+on. Here are some other examples of
+ machine learning. There's database mining.
+还有一些其它的例子,如数据挖掘。
+
+37
+00:02:55,320 --> 00:03:00,063
+One of the reasons machine learning has so
+pervaded is the growth of the web and the
+机器学习如此普遍的原因之一就是网络的快速发展和
+
+38
+00:03:00,063 --> 00:03:04,751
+growth of automation. All this means that
+we have much larger data sets than ever
+自动化技术的快速发展。这意味着我们拥有了前所未有的大量的数据集。
+
+39
+00:03:04,751 --> 00:03:09,272
+before. So, for example tons of Silicon
+Valley companies are today collecting web
+因此,现在大量硅谷公司收集网络点击数据,
+
+40
+00:03:09,272 --> 00:03:13,737
+click data, also called clickstream data,
+and are trying to use machine learning
+被称为点击流数据,并试图采用机器学习算法
+
+41
+00:03:13,737 --> 00:03:18,481
+algorithms to mine this data to understand
+the users better and to serve the users
+来挖掘数据,更好地理解用户,并更好地为用户服务。
+
+42
+00:03:18,481 --> 00:03:22,327
+better, that's a huge segment of
+Silicon Valley right now. Medical
+这占目前硅谷工作的很大部分。
+
+43
+00:03:22,327 --> 00:03:27,483
+records. With the advent of automation, we
+now have electronic medical records, so if
+医疗记录。随着自动化的出现,现在我们使用电子医疗记录,
+
+44
+00:03:27,483 --> 00:03:32,640
+we can turn medical records into medical
+knowledge, then we can start to understand
+因此,假如我们能将医疗记录转化为医疗知识,那我们就能更好地理解疾病。
+
+45
+00:03:32,640 --> 00:03:37,238
+disease better. Computational biology.
+With automation again, biologists are
+计算生物学。还是因为自动化,生物学家
+
+46
+00:03:37,238 --> 00:03:41,774
+collecting lots of data about gene
+sequences, DNA sequences, and so on, and
+收集了大量的数据,关于基因序列,DNA序列等。
+
+47
+00:03:41,774 --> 00:03:46,931
+machines learning algorithms are giving us
+a much better understanding of the human
+机器学习算法让我们更好地理解基因组,
+
+48
+00:03:46,931 --> 00:03:51,376
+genome, and what it means to be human.
+And in engineering as well, in all fields of
+以及它对人类的意义。在工程学中,在工程学所有的领域,
+
+49
+00:03:51,376 --> 00:03:55,034
+engineering, we have larger and larger,
+and larger and larger data sets, that
+我们有越来越大,越来越大的数据集,
+
+50
+00:03:55,034 --> 00:03:59,249
+we're trying to understand using learning
+algorithms. A second range of machine learning
+我们正设法采用学习算法来理解。
+
+51
+00:03:59,249 --> 00:04:03,440
+applications is ones that we cannot
+program by hand. So for example, I've
+机器学习应用的第二大类是那些我们无法手工编写程序的问题。例如,
+
+52
+00:04:03,440 --> 00:04:08,328
+worked on autonomous helicopters for many
+ years. We just did not know how to write a
+我们已经在自动直升飞机领域研究了很多年,仍不知道如何编写
+
+53
+00:04:08,328 --> 00:04:18,023
+computer program to make this helicopter
+fly by itself. The only thing that worked
+计算机程序使得直升飞机自己飞行。唯一有用的
+
+54
+00:04:18,023 --> 00:04:35,580
+was having a computer learn by itself how
+to fly this helicopter. [Helicopter whirling]
+就是使计算机自己学习如何使直升飞机飞行。
+
+55
+00:04:37,120 --> 00:04:42,880
+Handwriting recognition. It turns out one
+of the reasons it's so inexpensive today to
+手写体识别。今天,邮寄不再昂贵的原因之一,
+
+56
+00:04:42,880 --> 00:04:47,330
+route a piece of mail across the
+countries, in the US and internationally,
+无论在美国或国际上,
+
+57
+00:04:47,330 --> 00:04:51,899
+is that when you write an envelope like
+ this, it turns out there's a learning
+就是当你写好信封后,有学习算法
+
+58
+00:04:51,899 --> 00:04:56,943
+algorithm that has learned how to read your
+handwriting so that it can automatically
+来学习如何读取你的手写体,使得它能自动地
+
+59
+00:04:56,943 --> 00:05:01,749
+route this envelope on its way, and so it
+costs us a few cents to send this thing
+给你的信规划路线。因此,邮寄几千公里之外的信也只需要花费几分钱。
+
+60
+00:05:01,749 --> 00:05:06,318
+thousands of miles. And in fact if you've
+seen the fields of natural language
+事实上,假如你知道自然语言处理
+
+61
+00:05:06,318 --> 00:05:10,531
+processing or computer vision,
+these are the fields of AI pertaining to
+或计算机视觉,这些都是AI中有关
+
+62
+00:05:10,531 --> 00:05:15,321
+understanding language or understanding
+images. Most of natural language processing
+理解语言或理解图像的领域。今天,大部分的自然语言处理和
+
+63
+00:05:15,321 --> 00:05:20,689
+and most of computer vision today is
+ applied machine learning. Learning
+大部分的计算机视觉采用机器学习。
+
+64
+00:05:20,689 --> 00:05:25,576
+algorithms are also widely used for self-
+customizing programs. Every time you go to
+学习算法还应用在自我定制程序中。每次当你使用
+
+65
+00:05:25,576 --> 00:05:30,286
+Amazon or Netflix or iTunes Genius, and it
+recommends the movies or products and
+亚马逊、Netflix或iTunes Genius,它就会向你推荐电影、产品或音乐,
+
+66
+00:05:30,286 --> 00:05:35,073
+music to you, that's a learning algorithm.
+If you think about it they have million
+这就是学习算法。假如你想象一下,他们有百万用户,
+
+67
+00:05:35,073 --> 00:05:39,999
+users; there is no way to write a million
+different programs for your million users.
+不可能为百万用户编写百万个不同的程序。
+
+68
+00:05:39,999 --> 00:05:44,807
+The only way to have software give these
+customized recommendations is to have it
+让软件给出这些个性化推荐的唯一方法就是
+
+69
+00:05:44,807 --> 00:05:49,258
+learn by itself to customize itself to
+your preferences. Finally learning
+自我学习并为你定制你偏爱的东西。
+
+70
+00:05:49,258 --> 00:05:53,294
+algorithms are being used today to
+understand human learning and to
+另外,今天学习算法还被使用来理解人类的学习
+
+71
+00:05:53,294 --> 00:05:58,042
+understand the brain. We'll talk about
+how researches are using this to make
+和大脑。我们将会讨论研究者是如何使用这些来
+
+72
+00:05:58,042 --> 00:06:03,182
+progress towards the big AI dream. A few
+months ago, a student showed me an article
+朝着AI梦前进的。几个月前,一个学生给我看了篇文章,
+
+73
+00:06:03,182 --> 00:06:07,996
+on the top twelve IT skills. The skills
+that information technology hiring
+列出了最受欢迎的12项IT技能。这些技能是信息技术行业的招聘经理
+
+74
+00:06:07,996 --> 00:06:13,006
+managers cannot say no to. It was a
+ slightly older article, but at the top of
+最喜爱的。这篇文章有点旧了,但是在这张表中,
+
+75
+00:06:13,006 --> 00:06:17,988
+this list of the twelve most desirable IT
+ skills was machine learning. Here at
+机器学习位列第一。
+
+76
+00:06:17,988 --> 00:06:21,793
+Stanford, the number of recruiters
+that contact me asking if I know any
+在斯坦福,联系我询问是否认识即将毕业的机器学习学生的
+
+77
+00:06:21,793 --> 00:06:25,920
+graduating machine learning students
+is far larger than the machine learning
+招聘人员数量,远远多于我们每年培养的机器学习毕业生。
+
+78
+00:06:25,920 --> 00:06:30,047
+students we graduate each year. So I
+think there is a vast, unfulfilled demand
+因此,我认为这项技能存在着大量尚未满足的需求。
+
+79
+00:06:30,047 --> 00:06:34,280
+for this skill set, and this is a great time to
+be learning about machine learning, and I
+这也是一个学习机器学习的好时机。
+
+80
+00:06:34,280 --> 00:06:38,454
+hope to teach you a lot about machine
+learning in this class. In the next video,
+我希望能在这门课中教给你机器学习的知识。在下一个视频中,
+
+81
+00:06:38,454 --> 00:06:42,123
+we'll start to give a more formal
+definition of what is machine learning.
+我们将开始给出机器学习的更正式的定义。
+
+82
+00:06:42,123 --> 00:06:46,044
+And we'll begin to talk about the main
+types of machine learning problems and
+我们将开始谈到机器学习问题的主要类型,及算法。
+
+83
+00:06:46,044 --> 00:06:49,864
+algorithms. You'll pick up some of the
+main machine learning terminology, and
+你们将会学到一些主要的机器学习术语,
+
+84
+00:06:49,864 --> 00:06:53,684
+start to get a sense of what are the
+different algorithms, and when each one
+并开始了解各种不同的算法,以及每种算法在什么情况下
+
+85
+00:06:53,684 --> 00:06:54,740
+might be appropriate.
+可能是最合适的。
+
diff --git a/srt/1 - 2 - What is Machine Learning_ (7 min).srt b/srt/1 - 2 - What is Machine Learning_ (7 min).srt
new file mode 100644
index 00000000..21bec22d
--- /dev/null
+++ b/srt/1 - 2 - What is Machine Learning_ (7 min).srt
@@ -0,0 +1,527 @@
+1
+00:00:00,000 --> 00:00:04,904
+What is machine learning? In this video we
+will try to define what it is and also try
+什么是机器学习?本视频中将设法定义什么是机器学习,并
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,904 --> 00:00:09,520
+to give you a sense of when you want to
+use machine learning. Even among machine
+设法告诉你机器学习应该在什么情况下使用。即使在
+
+3
+00:00:09,520 --> 00:00:14,252
+learning practitioners there isn't a well
+accepted definition of what is and what
+机器学习从业者中,也没有对机器学习的统一定义。
+
+4
+00:00:14,252 --> 00:00:18,926
+isn't machine learning. But let me show
+ you a couple of examples of the ways that
+但下面我将告诉你们一些现有的对机器学习的定义。
+
+5
+00:00:18,926 --> 00:00:23,600
+people have tried to define it. Here's the
+definition of what machine learning is,
+下面是Arthur Samuel给出的机器学习的定义。
+
+6
+00:00:23,600 --> 00:00:28,520
+due to Arthur Samuel. He defined machine
+learning as the field of study that gives
+他将机器学习定义为:在没有明确编程的情况下,
+
+7
+00:00:29,037 --> 00:00:33,554
+computers the ability to learn without being
+explicitly programmed. Samuel's claim to
+教给计算机学习的能力。追溯到1950's,
+
+8
+00:00:33,554 --> 00:00:38,452
+fame was that back in the 1950's, he wrote
+a checkers playing program. And the
+Samuel成名的原因是他编写了跳棋游戏程序。
+
+9
+00:00:38,452 --> 00:00:43,603
+amazing thing about this checkers playing
+ program, was that Arthur Samuel himself,
+并且这个跳棋游戏程序的令人惊讶之处是Arthur Samuel自己并不是一个玩跳棋高手。
+
+10
+00:00:43,603 --> 00:00:48,268
+wasn't a very good checkers player. But
+what he did was, he programmed it to play
+但他所做的是编程使计算机与自己
+
+11
+00:00:48,268 --> 00:00:52,245
+10's of 1000's of games against itself.
+And by watching what sorts of board
+对弈数万局。通过观察哪些布局会赢,
+
+12
+00:00:52,245 --> 00:00:56,698
+positions tended to lead to wins, and what
+ sort of board positions tended to lead to
+哪些布局会输。
+
+13
+00:00:56,698 --> 00:01:00,725
+losses. The checkers playing program
+learns over time what are good board
+一段时间后,跳棋游戏程序就学到了什么是好的布局,
+
+14
+00:01:00,725 --> 00:01:04,713
+positions and what are bad board
+positions. And eventually learn to play
+什么是不好的布局。最终,就学会了玩跳棋,
+
+15
+00:01:04,713 --> 00:01:09,514
+checkers better than Arthur Samuel himself
+was able to. This was a remarkable result.
+比Arthur Samuel玩得更好。这是一个引人注目的结果。
+
+16
+00:01:09,514 --> 00:01:14,535
+Although Samuel himself turned out not to be a
+very good checkers player. But because the
+虽然,Samuel自己并不是一个好的跳棋玩者,但
+
+17
+00:01:14,535 --> 00:01:19,254
+computer has the patience to play tens
+ of thousands of games itself. No
+计算机有耐心与自己玩成百上千次。
+
+18
+00:01:19,254 --> 00:01:24,275
+human, has the patience to play that many
+games. By doing this the computer was able
+没有一个人有这种耐心玩这么多次游戏。因此,计算机能获得
+
+19
+00:01:24,275 --> 00:01:29,235
+to get so much checkers-playing experience that it eventually became a
+如此丰富的跳棋游戏经验,最终成为了一个
+
+20
+00:01:29,235 --> 00:01:33,817
+better checkers player than Arthur Samuel
+ himself. This is somewhat informal
+比Arthur Samuel更好的跳棋玩者。这是有点不正式的定义,
+
+21
+00:01:33,817 --> 00:01:38,547
+definition, and an older one. Here's a
+slightly more recent definition by Tom
+也是较老的一个定义。下面是一个稍近时间的定义,由Tom Mitchell定义,
+
+22
+00:01:38,547 --> 00:01:43,607
+Mitchell, who's a friend out of Carnegie
+Mellon. So Tom defines machine learning by
+他是在卡内基梅陇大学的一个朋友。Tom是这样定义机器学习的,
+
+23
+00:01:43,607 --> 00:01:48,819
+saying that, a well posed learning problem
+is defined as follows. He says, a computer
+一个良好的学习问题定义如下:
+
+24
+00:01:48,819 --> 00:01:53,843
+program is said to learn from experience
+E, with respect to some task T, and some
+计算机程序从经验E中学习任务T,
+
+25
+00:01:53,843 --> 00:01:58,678
+performance measure P, if its
+performance on T as measured by P improves
+并用度量P来衡量性能。条件是它由P定义的关于T的性能随着经验E而提高。
+
+26
+00:01:58,678 --> 00:02:03,764
+with experience E. I actually think he came
+ up with this definition just to make it
+我确实认为他如此提出定义只是想使它更押韵。
+
+27
+00:02:03,764 --> 00:02:08,346
+rhyme. For the checkers playing
+example the experience e, will be the
+对于跳棋游戏,经验E就是
+
+28
+00:02:08,346 --> 00:02:13,253
+experience of having the program play 10's
+of 1000's of games against itself. The
+计算机与自己对弈数万局跳棋;
+
+29
+00:02:13,253 --> 00:02:17,735
+task t, will be the task of playing
+ checkers. And the performance measure p,
+任务T就是玩跳棋的任务;性能度量P就是
+
+30
+00:02:17,735 --> 00:02:22,399
+will be the probability that it
+wins the next game of checkers against
+与新对手玩跳棋时赢的概率。
+
+31
+00:02:22,399 --> 00:02:27,157
+some new opponent. Throughout these
+videos, besides me trying to teach you
+本视频中,除了由我教你们学习外,
+
+32
+00:02:27,157 --> 00:02:32,291
+stuff, I will occasionally ask you a
+ question to make sure you understand the
+我将偶尔问你们一个问题以确保你们已经理解内容了。
+
+33
+00:02:32,291 --> 00:02:36,891
+content. Here's one, on top is a
+definition of machine learning by Tom
+这里有一个问题:上方是Tom Mitchell给出的机器学习定义。
+
+34
+00:02:36,891 --> 00:02:42,292
+Mitchell. Let's say your email program
+watches which emails you do or do not flag
+假设你的邮件程序观察哪封邮件你将标为垃圾邮件,哪封将不标。
+
+35
+00:02:42,292 --> 00:02:47,826
+as spam. So in an email client like this
+ you might click this spam button to report
+因此在邮件客户端,你可能点击垃圾邮件按钮标记这一些邮件为垃圾邮件,
+
+36
+00:02:47,826 --> 00:02:53,263
+some email as spam, but not other emails
+and. Based on which emails you mark as
+另外一些不是。基于此,
+
+37
+00:02:53,263 --> 00:02:59,046
+spam, so your e-mail program learns better
+how to filter spam e-mail. What is the
+你的邮件程序更好地学习如何过滤垃圾邮件。
+
+38
+00:02:59,046 --> 00:03:04,290
+task T in this setting? In a few seconds,
+the video will pause. And when it does so,
+在这个例子中,任务T是什么?接下来的几秒,视频会暂停,
+
+39
+00:03:04,290 --> 00:03:09,598
+you can use your mouse to select one of
+these four radio buttons to let, to let me
+你们可以用鼠标来选择四个按键中的一个告诉我
+
+40
+00:03:09,598 --> 00:03:40,190
+know which of these four you think is the
+right answer to this question. That might
+你认为四个中的哪个是正确答案。
+
+41
+00:03:40,190 --> 00:03:45,747
+be a performance measure P. And so, our
+system's performance on the task T, as
+它可能就是性能度量P。因此,我们的系统在任务T上的性能,
+
+42
+00:03:45,747 --> 00:03:50,529
+measured by the performance measure P,
+will improve with
+即由性能度量P所衡量的性能,将随着经验E而提高。
+
+43
+00:03:50,529 --> 00:03:55,957
+the experience E. In this class I hope to
+teach you about various different types of
+本课程中,我希望教给你们各种各样不同类型的学习算法。
+
+44
+00:03:55,957 --> 00:04:00,933
+learning algorithms. There are several
+different types of learning algorithms.
+目前也有各种不同类型的学习算法。
+
+45
+00:04:00,933 --> 00:04:05,650
+The main two types are what we call
+supervised learning and unsupervised
+主要有两类:监督学习和非监督学习。
+
+46
+00:04:05,650 --> 00:04:10,690
+learning. I'll define what these terms
+mean more in the next couple videos. But
+在下个视频中,我将定义这些名词的含义。
+
+47
+00:04:10,690 --> 00:04:16,028
+it turns out that in supervised learning,
+the idea is that we're going to teach the
+监督学习,就是我们教计算机如何做事情,
+
+48
+00:04:16,028 --> 00:04:20,513
+computer how to do something, whereas in
+unsupervised learning we're going let
+然而,在非监督学习中,我们将让计算机自己学习。
+
+49
+00:04:20,513 --> 00:04:25,016
+it learn by itself. Don't worry if these
+two terms don't make sense yet, in the
+如果你现在还不理解这两个术语,不用担心,
+
+50
+00:04:25,016 --> 00:04:29,739
+next two videos I'm going to say exactly
+what these two types of learning are. You
+在下个视频中,我将准确说明这两类学习是什么。
+
+51
+00:04:29,739 --> 00:04:34,070
+will also hear other buzz terms such as
+reinforcement learning and recommender
+你也会听到其它的名词,如增强学习和推荐系统。
+
+52
+00:04:34,070 --> 00:04:38,621
+systems. These are other types of machine
+learning algorithms that we'll talk about
+这些是其它类型的机器学习算法,我们将会在以后讨论。
+
+53
+00:04:38,621 --> 00:04:42,460
+later but the two most used types of
+learning algorithms are probably
+但是最常使用的两类算法是
+
+54
+00:04:42,460 --> 00:04:46,791
+supervised learning and unsupervised
+learning and I'll define them in the next
+监督学习和非监督学习。我将在下两个视频中定义,
+
+55
+00:04:46,791 --> 00:04:51,123
+two videos and we'll spend most of this
+class talking about these two types of
+并且我们会用大部分的课时来讨论这两类学习算法。
+
+56
+00:04:51,123 --> 00:04:55,720
+learning algorithms. It turns out one of
+the other things we'll spend a lot of time
+另外,我们将会花大量时间说明
+
+57
+00:04:55,720 --> 00:05:00,054
+on in this class is practical advice for
+applying learning algorithms. This is
+应用学习算法时的实际建议。
+
+58
+00:05:00,054 --> 00:05:04,444
+something that I feel pretty strongly
+ about, and it's actually something that I
+这也是我觉得很有用的,也是据我所知
+
+59
+00:05:04,444 --> 00:05:08,167
+don't know of any other university
+teaches. Teaching about learning
+其它大学未教的。教授学习算法
+
+60
+00:05:08,167 --> 00:05:12,509
+algorithms is like giving you a set of
+tools, and equally important or more
+就像给你一组工具,但和给你工具同样重要的或者说更重要的是
+
+61
+00:05:12,509 --> 00:05:17,616
+important to giving you the tools is to
+teach you how to apply these tools. I like
+教你如何使用这些工具。
+
+62
+00:05:17,616 --> 00:05:22,413
+to make an analogy to learning to become a
+carpenter. Imagine that someone is
+我喜欢用学习成为一个木匠做比喻。想象一下,某人
+
+63
+00:05:22,413 --> 00:05:26,959
+teaching you how to be a carpenter and
+they say here's a hammer, here's a
+在教你如何成为一个木匠,并且他说这是一个榔头,这是一个
+
+64
+00:05:26,959 --> 00:05:31,077
+screwdriver, here's a saw, good luck.
+Well, that's no good, right? You
+螺丝刀,这是一把锯子,祝你好运。这样是不行的,对吧?
+
+65
+00:05:31,077 --> 00:05:34,799
+have all these tools, but the more
+important thing, is to learn how to use
+你有了所有的工具,但更重要的是要学习如何正确地使用这些工具。
+
+66
+00:05:34,799 --> 00:05:38,927
+these tools properly. There's a huge
+ difference between people that
+知道如何使用这些机器学习算法的人,
+
+67
+00:05:38,927 --> 00:05:43,456
+know how to use these machine learning
+algorithms, versus people who don't know
+与不知道如何使用这些工具的人之间,有着巨大的差别。
+
+68
+00:05:43,456 --> 00:05:47,626
+how to use these tools well. Here in
+Silicon Valley where I live, when I go
+在硅谷,当我
+
+69
+00:05:47,626 --> 00:05:52,328
+visit different companies even at the
+top Silicon Valley companies very often I see
+访问不同的公司,即使是硅谷顶尖的公司,我也经常看到
+
+70
+00:05:52,328 --> 00:05:56,428
+people are trying to apply machine
+ learning algorithms to some problem and
+人们正试图将机器学习算法应用到某些问题上,
+
+71
+00:05:56,428 --> 00:06:00,857
+sometimes they have been going at it for
+six months. But sometimes when I look at
+有时他们已经在问题上做了6个月。
+
+72
+00:06:00,857 --> 00:06:05,121
+what they're doing, I say, you know,
+I could have told them like, gee, I could
+当我看到他们在做时,我会说,你知道,我真希望我在
+
+73
+00:06:05,121 --> 00:06:09,714
+have told you six months ago that you
+should be taking a learning algorithm and
+六个月前就告诉你们,你们应该使用学习算法,
+
+74
+00:06:09,714 --> 00:06:14,470
+applying it in like the slightly modified
+way and your chance of success would have
+并且小小修改一下,成功率就会高很多。
+
+75
+00:06:14,470 --> 00:06:19,648
+been much higher. So what we're going to
+do in this class is actually spend a lot
+因此,本课程中要做的就是花大量的时间讨论
+
+76
+00:06:19,648 --> 00:06:23,523
+of time talking about how, if you actually
+ tried to develop a machine learning
+假设你确实设法开发机器学习系统,
+
+77
+00:06:23,523 --> 00:06:27,596
+system, how to make those best practices
+type decisions about the way in which you
+那么对于你建立你的系统的方式,如何作出那些最佳的实际类型决策,
+
+78
+00:06:27,596 --> 00:06:31,321
+build your system so that when you're
+applying learning algorithm you're less
+以致于当你应用学习算法时,你不大可能
+
+79
+00:06:31,321 --> 00:06:35,394
+likely to end up one of those people who
+ end up pursuing some path for six months
+成为那些沿着某条路走了六个月的人,
+
+80
+00:06:35,394 --> 00:06:39,373
+that, you know, someone else could have
+ figured out it just wasn't gonna work at
+而其他人可能早就看出那条路根本行不通,
+
+81
+00:06:39,373 --> 00:06:43,515
+all and it's just a waste of time for six
+months. So I'm actually going to spend a
+它只是浪费了六个月的时间。因此,我将花费大量的时间
+
+82
+00:06:43,515 --> 00:06:47,707
+lot of the time teaching you those sorts
+of best practices in machine learning and
+来教你们机器学习和AI中那些最好的实践经验,
+
+83
+00:06:47,707 --> 00:06:52,052
+AI and how to get this stuff to work and
+how we do it, how the best people do it in
+以及如何使这项任务工作,我们如何来做,硅谷和世界上最优秀的人是如何做的。
+
+84
+00:06:52,052 --> 00:06:56,143
+Silicon Valley and around the world. I
+hope to make you one of the best people in
+我希望使你们成为
+
+85
+00:06:56,143 --> 00:06:59,905
+knowing how to design and build serious
+machine learning and AI systems. So,
+在知道如何设计和建立各种各样的机器学习及AI系统方面最好的人。
+
+86
+00:06:59,905 --> 00:07:04,698
+that's machine learning and these are the
+main topics I hope to teach. In the next
+这就是机器学习,以上就是我希望教授的主要内容。
+
+87
+00:07:04,698 --> 00:07:09,023
+video, I'm going to define what is
+supervised learning and after that, what
+下一个视频,我将定义什么是监督学习,之后介绍什么是非监督学习。
+
+88
+00:07:09,023 --> 00:07:13,757
+is unsupervised learning. And also, start
+to talk about when you would use each of them.
+并且,开始讨论你可以在什么时候使用它们。
+
diff --git a/srt/1 - 3 - Supervised Learning (12 min).srt b/srt/1 - 3 - Supervised Learning (12 min).srt
new file mode 100644
index 00000000..4c6aa651
--- /dev/null
+++ b/srt/1 - 3 - Supervised Learning (12 min).srt
@@ -0,0 +1,870 @@
+1
+00:00:00,000 --> 00:00:04,620
+In this video I am going to define what is
+probably the most common type of machine
+本视频中,我将定义可能是最常见的一类机器学习问题,
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,620 --> 00:00:08,910
+learning problem, which is supervised
+learning. I'll define supervised learning
+即监督学习。稍后,我将给出监督学习更正式的定义,
+
+3
+00:00:08,910 --> 00:00:13,255
+more formally later, but it's probably
+best to explain or start with an example
+但是最好还是从例子开始解释。
+
+4
+00:00:13,255 --> 00:00:17,820
+of what it is and we'll do the formal
+definition later. Let's say you want to
+假设你想要预测房屋价格。
+
+5
+00:00:17,820 --> 00:00:23,072
+predict housing prices. A while back, a
+student collected data sets from the
+不久前,一个学生从波特兰俄勒冈研究所收集数据。
+
+6
+00:00:23,072 --> 00:00:28,745
+Institute of Portland Oregon. And let's
+ say you plot a data set and it looks like
+假设你绘制了一个数据集,就像这样。
+
+7
+00:00:28,745 --> 00:00:34,347
+this. Here on the horizontal axis, the
+size of different houses in square feet,
+水平轴上,不同房屋的尺寸是平方英尺
+
+8
+00:00:34,347 --> 00:00:39,879
+and on the vertical axis, the price of
+ different houses in thousands of dollars.
+垂直轴上,是不同房屋的价格,以千美元为单位。
+
+9
+00:00:39,879 --> 00:00:45,168
+So. Given this data, let's say you have a
+friend who owns a house that is, say 750
+给定数据,假设有一个朋友有一栋房子,750平方英尺,
+
+10
+00:00:45,168 --> 00:00:50,708
+square feet and hoping to sell the house
+and they want to know how much they can
+想要卖掉这个房子,他们想知道能卖多少钱。
+
+11
+00:00:50,708 --> 00:00:56,116
+get for the house. So how can the learning
+ algorithm help you? One thing a learning
+那么,学习算法能帮你什么呢?
+
+12
+00:00:56,116 --> 00:01:01,524
+algorithm might be able to do is put a
+ straight line through the data or to fit a
+学习算法可做的一件事可能是根据数据画一条直线或者说
+
+13
+00:01:01,524 --> 00:01:07,111
+straight line to the data and, based on
+that, it looks like maybe the house can be
+用一条直线拟合数据。基于此,看上去房子可以卖大约150000美元。
+
+14
+00:01:07,111 --> 00:01:13,239
+sold for maybe about $150,000. But maybe this
+isn't the only learning algorithm you can
+但是,可能这不是你能使用的唯一的学习算法。
+
+15
+00:01:13,239 --> 00:01:18,536
+use. There might be a better one. For
+example, instead of fitting a straight
+可能有一个更好的。例如,不是用一条直线拟合数据,
+
+16
+00:01:18,536 --> 00:01:23,620
+line to the data, we might decide that
+it's better to fit a quadratic
+可能觉得用一个二次函数或二阶多项式来拟合数据更好。
+
+17
+00:01:23,620 --> 00:01:29,110
+function or a second-order polynomial to
+this data. And if you do that, and make a
+假设这么做,
+
+18
+00:01:29,110 --> 00:01:34,667
+prediction here, then it looks like, well,
+maybe we can sell the house for closer to
+在这儿做一个预测,然后看上去我们可以以接近200000美元的价格卖掉这个房子。
+
+19
+00:01:34,667 --> 00:01:39,184
+$200,000. One of the things we'll talk
+about later is how to choose and how to
+后面我们要讨论的一件事就是如何选择,如何
+
+20
+00:01:39,184 --> 00:01:43,792
+decide do you want to fit a straight line
+to the data or do you want to fit the
+决定你是用直线拟合数据还是用二次函数拟合数据。
+
+21
+00:01:43,792 --> 00:01:48,631
+quadratic function to the data and there's
+no fair picking whichever one gives your
+当然,不能不公平地专挑能让你朋友把房子卖出更高价的那个模型。
+
+22
+00:01:48,631 --> 00:01:53,182
+friend the better house to sell. But each
+ of these would be a fine example of a
+但这是一个学习算法的很好的例子。
+
+23
+00:01:53,182 --> 00:01:57,834
+learning algorithm. So this is an example
+ of a supervised learning algorithm. And
+这也是监督学习算法的例子。
+
+24
+00:01:57,834 --> 00:02:03,736
+the term supervised learning refers to the
+ fact that we gave the algorithm a data set
+监督学习是指我们给算法一个数据集,并且给定正确答案。
+
+25
+00:02:03,736 --> 00:02:09,089
+in which the "right answers" were
+given. That is, we gave it a data set of
+也就是说,我们给定一个房屋数据集,
+
+26
+00:02:09,089 --> 00:02:14,580
+houses in which for every example in this
+ data set, we told it what is the right
+在这个数据集中的每个例子,我们都给出正确的价格,
+
+27
+00:02:14,580 --> 00:02:20,002
+price so what is the actual price that,
+that house sold for and the task of the
+也即这个房子卖出的实际价格,
+
+28
+00:02:20,002 --> 00:02:25,423
+algorithm was to just produce more of
+these right answers such as for this new
+算法的目的就是给出更多的正确答案,例如对待售房子,
+
+29
+00:02:25,423 --> 00:02:30,579
+house, you know, that your friend may be
+trying to sell. To define with a bit more
+比如你朋友想要卖的房子给出估价。
+
+30
+00:02:30,579 --> 00:02:35,257
+terminology this is also called a
+regression problem and by regression
+用更专业的术语来定义,它也称为回归问题,
+
+31
+00:02:35,257 --> 00:02:40,467
+problem I mean we're trying to predict a
+continuous value output. Namely the price.
+之所以称之为回归问题,我的意思就是预测连续的输出值,也就是价格。
+
+32
+00:02:40,467 --> 00:02:44,720
+So technically I guess prices can be
+rounded off to the nearest cent. So maybe
+严格来说,价格其实可以被精确到分。
+
+33
+00:02:44,720 --> 00:02:49,246
+prices are actually discrete values, but
+usually we think of the price of a house
+因此,价格实际上是一个离散值,但通常我们认为房子的价格是一个实数,
+
+34
+00:02:49,246 --> 00:02:53,608
+as a real number, as a scalar value, as
+a continuous value number and the term
+作为一个标量,作为一个连续值,
+
+35
+00:02:53,608 --> 00:02:58,080
+regression refers to the fact that we're
+ trying to predict the sort of continuous
+回归指我们设法预测连续值的属性。
+
+36
+00:02:58,080 --> 00:03:02,060
+values attribute. Here's another
+supervised learning example, some friends
+这里是另一个监督学习的例子,我和一些朋友
+
+37
+00:03:02,060 --> 00:03:06,427
+and I were actually working on this
+ earlier. Let's say you want to look at
+早些时候做过这个问题。假设你想要看医疗记录,
+
+38
+00:03:06,427 --> 00:03:11,675
+medical records and try to predict of a
+breast cancer as malignant or benign. If
+并且想设法预测乳腺癌是恶性的还是良性的。
+
+39
+00:03:11,675 --> 00:03:16,856
+someone discovers a breast tumor, a lump
+in their breast, a malignant tumor is a
+假设某人发现了一个乳腺瘤,在乳腺上有个肿块,一个恶性瘤就是
+
+40
+00:03:16,856 --> 00:03:22,300
+tumor that is harmful and dangerous and a
+benign tumor is a tumor that is harmless.
+它是有害的且危险的;良性瘤就是它是无害的。
+
+41
+00:03:22,300 --> 00:03:27,876
+So obviously people care a lot about this.
+Let's see a collected data set and suppose
+显然,人们很关心这个。我们来看收集到的数据集,
+
+42
+00:03:27,876 --> 00:03:33,164
+in your data set you have on your
+horizontal axis the size of the tumor and
+假设,在你的数据集中,水平轴是瘤的尺寸,
+
+43
+00:03:33,164 --> 00:03:39,317
+on the vertical axis I'm going to plot one
+or zero, yes or no, whether or not these are
+垂直轴,可以是1或0,也可以是Y或N。
+
+44
+00:03:39,317 --> 00:03:45,184
+examples of tumors we've seen before are
+malignant–which is one–or zero if not malignant
+设已经看到的肿瘤例子是恶性的对应1,良性的对应0。
+
+45
+00:03:45,184 --> 00:03:50,392
+or benign. So let's say our data set looks
+like this where we saw a tumor of this
+那么,你的数据集就像这样,我们看到这些尺寸的瘤是良性的,
+
+46
+00:03:50,392 --> 00:03:56,283
+size that turned out to be benign. One of
+ this size, one of this size. And so on.
+这个尺寸,这个尺寸,等等。
+
+47
+00:03:56,283 --> 00:04:02,227
+And sadly we also saw a few malignant
+ tumors, one of that size, one of that
+很遗憾,我们也看到了一些恶性的瘤,这个尺寸,这个尺寸,
+
+48
+00:04:02,227 --> 00:04:08,572
+size, one of that size... So on. So this
+ example... I have five examples of benign
+这个尺寸,等等。因此,这个例子,这里我给出了五个良性肿瘤的例子,
+
+49
+00:04:08,572 --> 00:04:15,159
+tumors shown down here, and five examples
+of malignant tumors shown with a vertical
+五个恶性肿瘤的例子,恶性瘤对应垂直轴上的1.
+
+50
+00:04:15,159 --> 00:04:21,504
+axis value of one. And let's say we have
+a friend who tragically has a breast
+假设我们有一个朋友,很不幸,长了乳腺瘤,
+
+51
+00:04:21,504 --> 00:04:28,097
+tumor, and let's say her breast tumor size
+is maybe somewhere around this value. The
+假设她的乳腺瘤的尺寸可能是这个值附近。
+
+52
+00:04:28,097 --> 00:04:32,930
+machine learning question is, can you
+ estimate what is the probability, what is
+机器学习问题就是,你能否估计出瘤是良性的还是恶性的概率。
+
+53
+00:04:32,930 --> 00:04:37,819
+the chance that a tumor is malignant
+versus benign? To introduce a bit more
+引入一个更专业的术语,
+
+54
+00:04:37,819 --> 00:04:42,719
+terminology this is an example of a
+classification problem. The term
+这就是一个分类问题。
+
+55
+00:04:42,719 --> 00:04:47,342
+classification refers to the fact that
+here we're trying to predict a discrete
+分类是指我们设法预测一个离散的输出值,
+
+56
+00:04:47,342 --> 00:04:52,321
+value output: zero or one, malignant or
+benign. And it turns out that in
+0或1,恶性或良性。
+
+57
+00:04:52,321 --> 00:04:58,331
+classification problems sometimes you can
+have more than two possible
+在分类问题中,输出的可能取值也可以多于两个。
+
+58
+00:04:58,331 --> 00:05:03,852
+possible values for the output. As a
+ concrete example maybe there are three
+在实际例子中就是,可能有三种类型的乳腺癌,
+
+59
+00:05:03,852 --> 00:05:09,947
+types of breast cancers and so you may try
+ to predict the discrete value of zero,
+因此,你可能要设法预测离散值0,
+
+60
+00:05:09,947 --> 00:05:15,138
+one, two, or three with zero being benign.
+Benign tumor, so no cancer. And one may
+1,2或3,其中0是良性的。良性瘤就是说没有癌症。
+
+61
+00:05:15,138 --> 00:05:19,836
+mean, type one cancer, like, you have
+ three types of cancer, whatever type one
+1就是指第一种癌症,
+
+62
+00:05:19,836 --> 00:05:24,654
+means. And two may mean a second type of
+cancer, a three may mean a third type of
+2就是指第二种癌症,3就是指第三种癌症。
+
+63
+00:05:24,654 --> 00:05:29,111
+cancer. But this would also be a
+classification problem, because this other
+这也是一个分类问题,因为这些离散的输出值,
+
+64
+00:05:29,111 --> 00:05:33,929
+discrete value set of output corresponding
+to, you know, no cancer, or cancer type
+对应于你知道没有癌症,或者癌症一、
+
+65
+00:05:33,929 --> 00:05:39,094
+one, or cancer type two, or cancer type
+three. In classification problems there is
+癌症二、或癌症三。在分类问题中,
+
+66
+00:05:39,094 --> 00:05:44,413
+another way to plot this data. Let me show
+you what I mean. Let me use a slightly
+有另一种方法来绘制这些数据。
+
+67
+00:05:44,413 --> 00:05:49,206
+different set of symbols to plot this
+data. So if tumor size is going to be the
+我们用有点不同的符号集来绘制这组数据。假设瘤的尺寸是我用来
+
+68
+00:05:49,206 --> 00:05:54,303
+attribute that I'm going to use to predict
+malignancy or benignness, I can also draw
+预测恶性或良性的特征。
+
+69
+00:05:54,303 --> 00:05:58,975
+my data like this. I'm going to use
+different symbols to denote my benign and
+我也可以这样画我的数据。我将用不同的符号来表示我的良性或恶性,
+
+70
+00:05:58,975 --> 00:06:03,707
+malignant, or my negative and positive
+examples. So instead of drawing crosses,
+或者说我的反例和正例。因此,不是画叉,
+
+71
+00:06:03,707 --> 00:06:11,595
+I'm now going to draw O's for the benign
+tumors. Like so. And I'm going to keep
+这里我将用O来表示良性瘤,就像这样。
+
+72
+00:06:11,595 --> 00:06:18,655
+using X's to denote my malignant tumors.
+Okay? I hope this is beginning to make
+我还是继续用叉表示恶性瘤。我希望这是可理解的。
+
+73
+00:06:18,655 --> 00:06:23,624
+sense. All I did was I took, you know,
+these, my data set on top and I just
+我所做的就是把上面的数据集映射到下面这条实数轴上。
+
+74
+00:06:23,624 --> 00:06:30,894
+mapped it down. To this real line like so.
+And started to use different symbols,
+并且,使用不同的符号,
+
+75
+00:06:30,894 --> 00:06:35,828
+circles and crosses, to denote malignant
+versus benign examples. Now, in this
+圆圈和叉,表示恶性和良性例子。
+
+76
+00:06:35,828 --> 00:06:41,091
+example we use only one feature or one
+ attribute, namely, the tumor size in order
+在这个例子中,我们只使用了一个特征或者说一个属性,瘤的尺寸,
+
+77
+00:06:41,091 --> 00:06:46,289
+to predict whether the tumor is malignant
+or benign. In other machine learning
+来预测瘤是恶性的还是良性的。在其它的机器学习问题中,
+
+78
+00:06:46,289 --> 00:06:51,355
+problems when we have more than one
+feature, more than one attribute. Here's
+我们会有多个特征,多个属性。
+
+79
+00:06:51,355 --> 00:06:56,749
+an example. Let's say that instead of just
+knowing the tumor size, we know both the
+这儿有个例子。假设我们不仅知道瘤的尺寸,还知道病人的年龄。
+
+80
+00:06:56,749 --> 00:07:02,387
+age of the patients and the tumor size. In
+ that case maybe your data set will look
+在这种情况下,你的数据集就是这样的,
+
+81
+00:07:02,387 --> 00:07:08,562
+like this where I may have a set of patients
+with those ages and that tumor size and
+我可能有一组病人是这些年龄的,瘤的尺寸是这样的。
+
+82
+00:07:08,562 --> 00:07:14,980
+they look like this. And a different set
+of patients, they look a little different,
+另一组病人,他们看上去有点不同,
+
+83
+00:07:15,600 --> 00:07:23,968
+whose tumors turn out to be malignant, as
+ denoted by the crosses. So, let's say you
+他们的瘤是恶性的,用叉表示。
+
+84
+00:07:23,968 --> 00:07:32,027
+have a friend who tragically has a
+tumor. And maybe, their tumor size and age
+假设你有一个朋友,很不幸长了肿瘤。她的肿瘤尺寸和年龄可能落在这个位置附近。
+
+85
+00:07:32,027 --> 00:07:37,657
+falls around there. So given a data set
+like this, what the learning algorithm
+因此,给定如此的数据集,学习算法能做的
+
+86
+00:07:37,657 --> 00:07:42,462
+might do is throw the straight line
+through the data to try to separate out
+可能就是在数据上给出这样一条直线,设法将恶性瘤和良性瘤分开。
+
+87
+00:07:42,462 --> 00:07:47,710
+the malignant tumors from the benign ones
+and, so the learning algorithm may decide
+从而,学习算法可能决定
+
+88
+00:07:47,710 --> 00:07:53,004
+to throw the straight line like that to
+separate out the two classes of tumors.
+用这样的直线来分离这两类瘤。
+
+89
+00:07:53,004 --> 00:07:57,644
+And. You know, with this, hopefully you
+ can decide that your friend's tumor is
+有了这个,很幸运地,你就可以决定你朋友的瘤很有可能是什么样的,
+
+90
+00:07:57,644 --> 00:08:02,322
+more likely to if it's over there,
+that hopefully your learning algorithm
+假设它在这儿,你的学习算法将会说
+
+91
+00:08:02,322 --> 00:08:07,305
+will say that your friend's tumor falls on
+ this benign side and is therefore more
+你朋友的瘤位于良性区域,因此,是良性的可能性比恶性的大。
+
+92
+00:08:07,305 --> 00:08:12,044
+likely to be benign than malignant. In
+this example we had two features, namely,
+在这个例子中,我们有两种特征,
+
+93
+00:08:12,044 --> 00:08:17,147
+the age of the patient and the size of the
+tumor. In other machine learning problems
+病人的年龄和瘤的尺寸。在其它的机器学习问题中,我们往往会有更多的特征。
+
+94
+00:08:17,147 --> 00:08:21,454
+we will often have more features, and my
+friends that work on this problem, they
+我的朋友做了这样的问题,
+
+95
+00:08:21,454 --> 00:08:25,849
+actually use other features like these,
+which is clump thickness, the clump thickness of
+他们实际上使用了其它的特征,如肿块的厚度,
+
+96
+00:08:25,849 --> 00:08:30,299
+the breast tumor. Uniformity of cell size
+of the tumor. Uniformity of cell shape of
+还有瘤细胞的尺寸的均匀性,瘤细胞形状的均匀性等,
+
+97
+00:08:30,299 --> 00:08:34,911
+the tumor, and so on, and other features
+ as well. And it turns out one of the interes-,
+以及其它的特征。一个有趣的,
+
+98
+00:08:34,911 --> 00:08:39,907
+most interesting learning algorithms that
+we'll see in this class is a learning
+或更有趣的学习算法是,我们会看到,
+
+99
+00:08:39,907 --> 00:08:45,153
+algorithm that can deal with, not just two
+or three or five features, but an infinite
+一个学习算法可处理,不仅仅两到三个或五个特征,而是无穷多的特征。
+
+100
+00:08:45,153 --> 00:08:50,150
+number of features. On this slide, I've
+listed a total of five different features.
+在这个幻灯片中,我列出了总共五种不同的特征。
+
+101
+00:08:50,150 --> 00:08:54,482
+Right, two on the axes and three more up here.
+But it turns out that for some learning
+两个在轴上,三个在这儿。但是,这说明对某些学习问题,
+
+102
+00:08:54,482 --> 00:08:58,497
+problems, what you really want is not to
+use, like, three or five features. But
+你真正想要的不是使用像三个还是五个特征,
+
+103
+00:08:58,497 --> 00:09:02,566
+instead, you want to use an infinite
+number of features, an infinite number of
+而是,你想使用一个无穷多数量的特征集,一个无穷多数量的属性集。
+
+104
+00:09:02,566 --> 00:09:06,211
+attributes, so that your learning
+algorithm has lots of attributes or
+因此,你的学习算法有很多的属性或特征或线索来做预测。
+
+105
+00:09:06,211 --> 00:09:10,333
+features or cues with which to make those
+predictions. So how do you deal with an
+因此,你如何来处理这些特征呢。
+
+106
+00:09:10,333 --> 00:09:14,439
+infinite number of features. How do you even
+store an infinite number of
+你如何处理无穷多数量的特征,如何在计算机中存储无穷多数量的事物,
+
+107
+00:09:14,439 --> 00:09:18,290
+things on the computer when your
+computer is gonna run out of memory. It
+而你的计算机终究会耗尽内存。
+
+108
+00:09:18,290 --> 00:09:22,188
+turns out that when we talk about an
+algorithm called the Support Vector
+以支持向量机算法为例,
+
+109
+00:09:22,188 --> 00:09:26,675
+Machine, there will be a neat mathematical
+ trick that will allow a computer to deal
+就有一个灵巧的数学技巧,它允许计算机处理无穷多的特征。
+
+110
+00:09:26,675 --> 00:09:31,214
+with an infinite number of features. Imagine
+that I didn't just write down two features
+假设我没有在这儿写下两个特征,
+
+111
+00:09:31,214 --> 00:09:35,487
+here and three features on the right. But, imagine that
+I wrote down an infinitely long list, I
+也没有在这写三个特征。 假设我写下无穷长的列表,
+
+112
+00:09:35,487 --> 00:09:39,866
+just kept writing more and more and more
+features. Like an infinitely long list of
+我不停地写更多更多的特征,无限长的特征列表。
+
+113
+00:09:39,866 --> 00:09:44,192
+features. Turns out, we'll be able to come
+up with an algorithm that can deal with
+结果是,我们能提出一个算法来处理它。
+
+114
+00:09:44,192 --> 00:09:49,701
+that. So, just to recap. In this
+ class we'll talk about supervised
+概括一下,这门课中我们将讨论监督学习。
+
+115
+00:09:49,701 --> 00:09:54,167
+learning. And the idea is that, in
+ supervised learning, in every example in
+思想是,在监督学习中,数据集中的每个例子,
+
+116
+00:09:54,167 --> 00:09:58,880
+our data set, we are told what is the
+"correct answer" that we would have
+我们都被告知了所谓的"正确答案",也就是我们希望算法能预测出的结果。
+
+117
+00:09:58,880 --> 00:10:03,960
+liked the algorithms to have predicted
+on that example. Such as the price of the
+像房子的价格,
+
+118
+00:10:03,960 --> 00:10:08,428
+house, or whether a tumor is malignant or
+benign. We also talked about the
+或瘤是恶性的还是良性的。
+
+119
+00:10:08,428 --> 00:10:13,202
+regression problem. And by regression,
+that means that our goal is to predict a
+我们也讨论了回归问题。回归是指我们的目标是预测一个连续的输出值。
+
+120
+00:10:13,202 --> 00:10:17,977
+continuous valued output. And we talked
+ about the classification problem, where
+我们讨论了分类问题,
+
+121
+00:10:17,977 --> 00:10:22,690
+the goal is to predict a discrete value
+output. Just a quick wrap up question:
+目的是预测离散的输出值。
+
+122
+00:10:22,690 --> 00:10:27,541
+Suppose you're running a company and you
+want to develop learning algorithms to
+假设你运作一个公司,并且你想开发学习算法来处理两个问题中的一个。
+
+123
+00:10:27,541 --> 00:10:32,618
+address each of two problems. In the first
+problem, you have a large inventory of
+第一个问题,你有一大批相同商品的库存。
+
+124
+00:10:32,618 --> 00:10:38,113
+identical items. So imagine that you have
+thousands of copies of some identical
+假设你有成千上万件相同的商品要卖,
+
+125
+00:10:38,113 --> 00:10:43,607
+items to sell and you want to predict how
+many of these items you sell within the
+你想预测在接下来的三个月内能卖出多少件这样的商品。
+
+126
+00:10:43,607 --> 00:10:49,172
+next three months. In the second problem,
+problem two, you'd like-- you have lots of
+第二个问题是,你有很多用户,
+
+127
+00:10:49,172 --> 00:10:54,145
+users and you want to write software to
+examine each individual of your
+你想要写一个软件来检查每一个客户的账户,
+
+128
+00:10:54,145 --> 00:10:59,193
+customer's accounts, so each one of your
+customer's accounts; and for each account,
+也就是你的每一个客户账户;并且对每个账户,
+
+129
+00:10:59,193 --> 00:11:04,178
+decide whether or not the account has been
+hacked or compromised. So, for each of
+判断这个账户是否被入侵或被盗用。
+
+130
+00:11:04,178 --> 00:11:08,914
+these problems, should they be treated as
+a classification problem, or as a
+对每一个这种问题,应该被认为是分类问题还是回归问题?
+
+131
+00:11:08,914 --> 00:11:14,087
+regression problem? When the video pauses,
+please use your mouse to select whichever
+当视频暂停时,用你的鼠标选择四个选项中的正确答案。
+
+132
+00:11:14,087 --> 00:11:20,884
+of these four options on the left you
+ think is the correct answer. So hopefully,
+左边这四个选项中,你认为哪一个是正确答案。那么,希望
+
+133
+00:11:20,884 --> 00:11:25,871
+you got that this is the answer. For
+problem one, I would treat this as a
+很高兴,你选择了正确答案。对于问题一,我会把它当作回归问题处理,
+
+134
+00:11:25,871 --> 00:11:31,058
+regression problem, because if I have, you
+know, thousands of items, well, I would
+因为,假设我有成千上万件商品,我很可能把销量看成一个实数,
+
+135
+00:11:31,058 --> 00:11:36,071
+probably just treat this as a real value,
+as a continuous value. And
+即一个连续的值。
+
+136
+00:11:36,290 --> 00:11:41,837
+treat, therefore, the number of items I sell,
+as a continuous value. And for the
+也就是把我要卖出的商品数量看成一个连续的值。
+
+137
+00:11:41,837 --> 00:11:47,748
+second problem, I would treat that as a
+classification problem, because I might
+第二个问题,我将看成一个分类问题,
+
+138
+00:11:47,748 --> 00:11:53,659
+say, set the value I want to predict with
+zero, to denote the account has not been
+因为,我可能会把要预测的值设为0,表示账户没有被入侵;
+
+139
+00:11:53,659 --> 00:11:58,850
+hacked. And set the value one to denote an
+account that has been hacked into. So just
+设置值为1,表示账户已经被入侵。
+
+140
+00:11:58,850 --> 00:12:03,287
+like, you know, breast cancer, is,
+zero is benign, one is malignant. So I
+因此,就像乳腺癌的例子,0表示良性,1表示恶性。
+
+141
+00:12:03,287 --> 00:12:08,150
+might set this be zero or one depending on
+whether it's been hacked, and have an
+因此,我可能会设置它为0或1,看它是否被入侵。
+
+142
+00:12:08,150 --> 00:12:13,134
+algorithm try to predict each one of these
+two discrete values. And because there's a
+并用一个算法来预测这两个离散值中的一个。
+
+143
+00:12:13,134 --> 00:12:17,693
+small number of discrete values, I would
+ therefore treat it as a classification
+因为,只有少量的离散值,因此,我把它作为一个分类问题。
+
+144
+00:12:17,693 --> 00:12:23,075
+problem. So, that's it for supervised
+ learning and in the next video I'll talk
+这就是监督学习。下个视频中,我们将讨论非监督学习,
+
+145
+00:12:23,075 --> 00:12:28,325
+about unsupervised learning, which is the
+other major category of learning algorithms.
+它是学习问题的另一大类。
+
diff --git a/srt/1 - 4 - Unsupervised Learning (14 min).srt b/srt/1 - 4 - Unsupervised Learning (14 min).srt
new file mode 100644
index 00000000..5a867d52
--- /dev/null
+++ b/srt/1 - 4 - Unsupervised Learning (14 min).srt
@@ -0,0 +1,2060 @@
+1
+00:00:00,380 --> 00:00:01,550
+In this video, we'll talk about
+在这段视频中 我们将讨论
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,670 --> 00:00:02,690
+the second major type of machine
+第二种主要的机器学习问题
+
+3
+00:00:03,010 --> 00:00:05,030
+learning problem, called Unsupervised Learning.
+叫做无监督学习
+
+4
+00:00:06,300 --> 00:00:08,500
+In the last video, we talked about Supervised Learning.
+在上一节视频中 我们已经讲过了监督学习
+
+5
+00:00:09,250 --> 00:00:10,700
+Back then, recall data sets
+回想起上次的数据集
+
+6
+00:00:11,020 --> 00:00:12,670
+that look like this, where each
+每个样本
+
+7
+00:00:12,890 --> 00:00:15,150
+example was labeled either
+都已经被标明为
+
+8
+00:00:15,610 --> 00:00:16,900
+as a positive or negative example,
+正样本或者负样本
+
+9
+00:00:17,530 --> 00:00:19,800
+whether it was a benign or a malignant tumor.
+即良性或恶性肿瘤
+
+10
+00:00:20,850 --> 00:00:21,920
+So for each example in Supervised
+因此 对于监督学习中的每一个样本
+
+11
+00:00:22,410 --> 00:00:24,270
+Learning, we were told explicitly what
+我们已经被清楚地告知了
+
+12
+00:00:24,440 --> 00:00:25,760
+is the so-called right answer,
+什么是所谓的正确答案
+
+13
+00:00:26,490 --> 00:00:27,580
+whether it's benign or malignant.
+即它们是良性还是恶性
+
+14
+00:00:28,550 --> 00:00:30,210
+In Unsupervised Learning, we're given
+在无监督学习中
+
+15
+00:00:30,540 --> 00:00:31,720
+data that looks different
+我们用的数据会和监督学习里的看起来有些不一样
+
+16
+00:00:31,950 --> 00:00:32,910
+than data that looks like
+在无监督学习中
+
+17
+00:00:33,190 --> 00:00:34,600
+this that doesn't have
+没有属性或标签这一概念
+
+18
+00:00:34,720 --> 00:00:35,920
+any labels or that all
+也就是说所有的数据
+
+19
+00:00:36,130 --> 00:00:37,460
+has the same label or really no labels.
+都是一样的 没有区别
+
+20
+00:00:39,680 --> 00:00:40,740
+So we're given the data set and
+所以在无监督学习中 我们只有一个数据集
+
+21
+00:00:40,980 --> 00:00:42,460
+we're not told what to
+没人告诉我们该怎么做
+
+22
+00:00:42,560 --> 00:00:43,290
+do with it and we're not
+我们也不知道
+
+23
+00:00:43,640 --> 00:00:44,800
+told what each data point is.
+每个数据点究竟是什么意思
+
+24
+00:00:45,290 --> 00:00:47,190
+Instead we're just told, here is a data set.
+相反 它只告诉我们 现在有一个数据集
+
+25
+00:00:47,870 --> 00:00:49,650
+Can you find some structure in the data?
+你能在其中找到某种结构吗?
+
+26
+00:00:50,480 --> 00:00:51,670
+Given this data set, an
+对于给定的数据集
+
+27
+00:00:52,350 --> 00:00:53,940
+Unsupervised Learning algorithm might decide that
+无监督学习算法可能判定
+
+28
+00:00:54,060 --> 00:00:56,090
+the data lives in two different clusters.
+该数据集包含两个不同的聚类
+
+29
+00:00:56,800 --> 00:00:57,960
+And so there's one cluster
+你看 这是第一个聚类
+
+30
+00:00:59,120 --> 00:00:59,910
+and there's a different cluster.
+然后这是另一个聚类
+
+31
+00:01:01,110 --> 00:01:02,710
+And yes, an Unsupervised Learning algorithm may
+你猜对了 无监督学习算法
+
+32
+00:01:03,040 --> 00:01:05,070
+break these data into these two separate clusters.
+会把这些数据分成两个不同的聚类
+
+33
+00:01:06,410 --> 00:01:08,000
+So this is called a clustering algorithm.
+所以这就是所谓的聚类算法
+
+34
+00:01:08,860 --> 00:01:10,310
+And this turns out to be used in many places.
+实际上它被用在许多地方
+
+35
+00:01:11,930 --> 00:01:13,310
+One example where clustering
+我们来举一个聚类算法的栗子
+
+36
+00:01:13,530 --> 00:01:14,860
+is used is in Google
+Google 新闻的例子
+
+37
+00:01:15,060 --> 00:01:16,160
+News and if you have not
+如果你还没见过这个页面的话
+
+38
+00:01:16,360 --> 00:01:17,320
+seen this before, you can actually
+你可以到这个URL
+
+39
+00:01:18,210 --> 00:01:19,040
+go to this URL news.google.com
+news.google.com
+
+40
+00:01:19,830 --> 00:01:20,460
+to take a look.
+去看看
+
+41
+00:01:21,280 --> 00:01:22,970
+What Google News does is everyday
+谷歌新闻每天都在干什么呢?
+
+42
+00:01:23,480 --> 00:01:24,220
+it goes and looks at tens
+他们每天会去收集
+
+43
+00:01:24,470 --> 00:01:25,430
+of thousands or hundreds of
+成千上万的
+
+44
+00:01:25,720 --> 00:01:26,740
+thousands of new stories on the
+网络上的新闻
+
+45
+00:01:26,800 --> 00:01:29,410
+web and it groups them into cohesive news stories.
+然后将他们分组 组成一个个新闻专题
+
+46
+00:01:30,730 --> 00:01:31,690
+For example, let's look here.
+比如 让我们来看看这里
+
+47
+00:01:33,380 --> 00:01:35,370
+The URLs here link
+这里的URL链接
+
+48
+00:01:35,910 --> 00:01:37,260
+to different news stories
+连接着不同的
+
+49
+00:01:38,010 --> 00:01:40,110
+about the BP Oil Well story.
+有关BP油井事故的报道
+
+50
+00:01:41,300 --> 00:01:42,160
+So, let's click on
+所以 让我们点击
+
+51
+00:01:42,260 --> 00:01:43,090
+one of these URL's and we'll
+这些URL中的一个
+
+52
+00:01:43,550 --> 00:01:44,780
+click on one of these URL's.
+恩 让我们点一个
+
+53
+00:01:45,100 --> 00:01:46,970
+What I'll get to is a web page like this.
+然后我们会来到这样一个网页
+
+54
+00:01:47,210 --> 00:01:48,390
+Here's a Wall Street
+这是一篇来自华尔街日报的
+
+55
+00:01:48,590 --> 00:01:50,180
+Journal article about, you know, the BP
+有关……你懂的
+
+56
+00:01:51,110 --> 00:01:52,530
+Oil Well Spill stories of
+有关BP油井泄漏事故的报道
+
+57
+00:01:52,920 --> 00:01:54,350
+"BP Kills Macondo",
+标题为《BP杀死了Macondo》
+
+58
+00:01:54,590 --> 00:01:55,700
+which is a name of the
+Macondo 是个地名
+
+59
+00:01:55,980 --> 00:01:57,960
+spill and if you
+就是那个漏油事故的地方
+
+60
+00:01:58,020 --> 00:01:59,360
+click on a different URL
+如果你从这个组里点击一个不同的URL
+
+61
+00:02:00,690 --> 00:02:02,500
+from that group then you might get the different story.
+那么你可能会得到不同的新闻
+
+62
+00:02:02,950 --> 00:02:04,760
+Here's the CNN story about a
+这里是一则CNN的新闻
+
+63
+00:02:04,820 --> 00:02:06,090
+again, the BP Oil Spill,
+是一个有关BP石油泄漏的视频
+
+64
+00:02:07,090 --> 00:02:08,180
+and if you click on yet
+如果你再点击第三个链接
+
+65
+00:02:08,740 --> 00:02:10,990
+a third link, then you might get a different story.
+又会出现不同的新闻
+
+66
+00:02:11,440 --> 00:02:13,380
+Here's the UK Guardian story
+这边是英国卫报的报道
+
+67
+00:02:13,940 --> 00:02:15,510
+about the BP Oil Spill.
+也是关于BP石油泄漏
+
+68
+00:02:16,530 --> 00:02:17,790
+So what Google News has done
+所以 谷歌新闻所做的就是
+
+69
+00:02:17,990 --> 00:02:19,440
+is look for tens of thousands of
+去搜索成千上万条新闻
+
+70
+00:02:19,490 --> 00:02:22,170
+news stories and automatically cluster them together.
+然后自动的将他们聚合在一起
+
+71
+00:02:23,030 --> 00:02:24,660
+So, the news stories that are all
+因此 有关同一主题的
+
+72
+00:02:25,080 --> 00:02:27,010
+about the same topic get displayed together.
+新闻被显示在一起
+
+73
+00:02:27,210 --> 00:02:29,170
+It turns out that
+其实
+
+74
+00:02:29,380 --> 00:02:31,020
+clustering algorithms and Unsupervised Learning
+聚类算法和无监督学习算法
+
+75
+00:02:31,530 --> 00:02:33,550
+algorithms are used in many other problems as well.
+也可以被用于许多其他的问题
+
+76
+00:02:35,320 --> 00:02:36,690
+Here's one on understanding genomics.
+这里我们举个它在基因组学中的应用
+
+77
+00:02:38,270 --> 00:02:40,510
+Here's an example of DNA microarray data.
+下面是一个关于基因芯片的例子
+
+78
+00:02:40,990 --> 00:02:42,230
+The idea is put
+基本的思想是
+
+79
+00:02:42,430 --> 00:02:44,360
+a group of different individuals and
+给定一组不同的个体
+
+80
+00:02:44,510 --> 00:02:45,590
+for each of them, you measure
+对于每个个体
+
+81
+00:02:46,100 --> 00:02:48,580
+how much they do or do not have a certain gene.
+检测它们是否拥有某个特定的基因
+
+82
+00:02:49,050 --> 00:02:51,640
+Technically you measure how much certain genes are expressed.
+严格来讲,是检测特定基因的表达程度
+
+83
+00:02:52,000 --> 00:02:54,190
+So these colors, red, green,
+因此 这些颜色 红 绿
+
+84
+00:02:54,930 --> 00:02:56,210
+gray and so on, they
+灰 等等 它们
+
+85
+00:02:56,340 --> 00:02:57,500
+show the degree to which
+展示了这些不同的个体
+
+86
+00:02:57,780 --> 00:02:59,440
+different individuals do or
+是否拥有一个特定基因
+
+87
+00:02:59,510 --> 00:03:01,270
+do not have a specific gene.
+的不同程度
+
+88
+00:03:02,500 --> 00:03:03,400
+And what you can do is then
+然后你所能做的就是
+
+89
+00:03:03,610 --> 00:03:05,070
+run a clustering algorithm to group
+运行一个聚类算法
+
+90
+00:03:05,380 --> 00:03:07,140
+individuals into different categories
+把不同的个体归入不同的类
+
+91
+00:03:07,780 --> 00:03:08,810
+or into different types of people.
+或归为不同类型的人
+
+92
+00:03:10,230 --> 00:03:11,660
+So this is Unsupervised Learning because
+这就是无监督学习
+
+93
+00:03:11,930 --> 00:03:14,010
+we're not telling the algorithm in advance
+我们没有提前告知这个算法
+
+94
+00:03:14,590 --> 00:03:15,690
+that these are type 1 people,
+这些是第一类的人
+
+95
+00:03:16,130 --> 00:03:17,420
+those are type 2 persons, those
+这些是第二类的人
+
+96
+00:03:17,560 --> 00:03:18,650
+are type 3 persons and so
+这些是第三类的人等等
+
+97
+00:03:19,610 --> 00:03:22,390
+on, and instead what we're saying is, yeah, here's a bunch of data.
+相反我们只是告诉算法 你看 这儿有一堆数据
+
+98
+00:03:23,110 --> 00:03:24,030
+I don't know what's in this data.
+我不知道这个数据是什么东东
+
+99
+00:03:24,750 --> 00:03:25,870
+I don't know who's of what type.
+我不知道里面都有些什么类型 叫什么名字
+
+100
+00:03:26,150 --> 00:03:26,940
+I don't even know what the different
+我甚至不知道都有哪些类型
+
+101
+00:03:27,260 --> 00:03:28,480
+types of people are, but can
+但是
+
+102
+00:03:28,610 --> 00:03:30,210
+you automatically find structure in
+请问你可以自动的找到这些数据中的类型吗?
+
+103
+00:03:30,360 --> 00:03:31,260
+the data, and can you automatically
+然后自动的
+
+104
+00:03:32,180 --> 00:03:33,620
+cluster the individuals into these types
+按得到的类型把这些个体分类
+
+105
+00:03:33,870 --> 00:03:35,490
+that I don't know in advance?
+虽然事先我并不知道哪些类型
+
+106
+00:03:35,890 --> 00:03:37,610
+Because we're not giving the algorithm
+因为对于这些数据样本来说
+
+107
+00:03:38,160 --> 00:03:40,140
+the right answer for the
+我们没有给算法一个
+
+108
+00:03:40,370 --> 00:03:41,270
+examples in my data
+正确答案
+
+109
+00:03:41,590 --> 00:03:43,090
+set, this is Unsupervised Learning.
+所以 这就是无监督学习
+
+110
+00:03:44,290 --> 00:03:47,040
+Unsupervised Learning or clustering is used for a bunch of other applications.
+无监督学习或聚类算法在其他领域也有着大量的应用
+
+111
+00:03:48,340 --> 00:03:50,340
+It's used to organize large computer clusters.
+它被用来组织大型的计算机集群
+
+112
+00:03:51,390 --> 00:03:52,530
+I had some friends looking at
+我有一些朋友在管理
+
+113
+00:03:52,680 --> 00:03:53,970
+large data centers, that is
+大型数据中心 也就是
+
+114
+00:03:54,180 --> 00:03:55,970
+large computer clusters and trying
+大型计算机集群 并试图
+
+115
+00:03:56,230 --> 00:03:57,470
+to figure out which machines tend to
+找出哪些机器趋向于
+
+116
+00:03:57,590 --> 00:03:59,130
+work together and if
+协同工作
+
+117
+00:03:59,200 --> 00:04:00,270
+you can put those machines together,
+如果你把这些机器放在一起
+
+118
+00:04:01,100 --> 00:04:03,220
+you can make your data center work more efficiently.
+你就可以让你的数据中心更高效地工作
+
+119
+00:04:04,810 --> 00:04:06,820
+This second application is on social network analysis.
+第二种应用是用于社交网络的分析
+
+120
+00:04:07,890 --> 00:04:09,230
+So given knowledge about which friends
+所以 如果可以得知
+
+121
+00:04:09,630 --> 00:04:10,840
+you email the most or
+哪些朋友你用email联系的最多
+
+122
+00:04:10,880 --> 00:04:12,150
+given your Facebook friends or
+或者知道你的Facebook好友
+
+123
+00:04:12,180 --> 00:04:14,150
+your Google+ circles, can
+或者你Google+里的朋友
+
+124
+00:04:14,290 --> 00:04:16,380
+we automatically identify which are
+知道了这些之后
+
+125
+00:04:16,450 --> 00:04:17,950
+cohesive groups of friends,
+我们是否可以自动识别
+
+126
+00:04:18,460 --> 00:04:19,420
+also which are groups of people
+哪些是很要好的朋友组
+
+127
+00:04:20,230 --> 00:04:21,010
+that all know each other?
+哪些仅仅是互相认识的朋友组
+
+128
+00:04:22,540 --> 00:04:22,880
+Market segmentation.
+还有在市场分割中的应用
+
+129
+00:04:24,680 --> 00:04:26,780
+Many companies have huge databases of customer information.
+许多公司拥有庞大的客户信息数据库
+
+130
+00:04:27,700 --> 00:04:28,410
+So, can you look at this
+那么 给你一个
+
+131
+00:04:28,510 --> 00:04:30,000
+customer data set and automatically
+客户数据集 你能否
+
+132
+00:04:30,740 --> 00:04:32,340
+discover market segments and automatically
+自动找出不同的市场分割
+
+133
+00:04:33,340 --> 00:04:35,290
+group your customers into different
+并自动将你的客户分到不同的
+
+134
+00:04:35,820 --> 00:04:37,400
+market segments so that
+细分市场中
+
+135
+00:04:37,710 --> 00:04:39,490
+you can automatically and more
+从而有助于我在
+
+136
+00:04:39,650 --> 00:04:41,580
+efficiently sell or market
+不同的细分市场中
+
+137
+00:04:41,890 --> 00:04:43,250
+to your different market segments?
+进行更有效的销售
+
+138
+00:04:44,260 --> 00:04:45,580
+Again, this is Unsupervised Learning
+这也是无监督学习
+
+139
+00:04:45,820 --> 00:04:46,720
+because we have all this
+我们现在有
+
+140
+00:04:46,900 --> 00:04:48,340
+customer data, but we don't
+这些客户数据
+
+141
+00:04:48,590 --> 00:04:49,710
+know in advance what are the
+但我们预先并不知道
+
+142
+00:04:49,790 --> 00:04:51,270
+market segments and for
+有哪些细分市场
+
+143
+00:04:51,440 --> 00:04:52,570
+the customers in our data
+而且
+
+144
+00:04:52,660 --> 00:04:53,590
+set, you know, we don't know in
+对于我们数据集的某个客户
+
+145
+00:04:53,690 --> 00:04:54,700
+advance who is in
+我们也不能预先知道
+
+146
+00:04:54,800 --> 00:04:55,840
+market segment one, who is
+谁属于细分市场一
+
+147
+00:04:55,940 --> 00:04:57,800
+in market segment two, and so on.
+谁又属于细分市场二等等
+
+148
+00:04:57,930 --> 00:05:00,630
+But we have to let the algorithm discover all this just from the data.
+但我们必须让这个算法自己去从数据中发现这一切
+
+149
+00:05:01,970 --> 00:05:03,140
+Finally, it turns out that Unsupervised
+最后
+
+150
+00:05:03,690 --> 00:05:05,620
+Learning is also used for
+事实上无监督学习也被用于
+
+151
+00:05:06,090 --> 00:05:08,060
+surprisingly, astronomical data analysis
+天文数据分析
+
+152
+00:05:08,890 --> 00:05:10,390
+and these clustering algorithms give
+通过这些聚类算法 我们发现了许多
+
+153
+00:05:10,580 --> 00:05:12,440
+surprisingly interesting and useful theories
+惊人的、有趣的 以及实用的
+
+154
+00:05:12,900 --> 00:05:15,610
+of how galaxies are born.
+关于星系是如何诞生的理论
+
+155
+00:05:15,880 --> 00:05:17,620
+All of these are examples of clustering,
+所有这些都是聚类算法的例子
+
+156
+00:05:18,400 --> 00:05:20,550
+which is just one type of Unsupervised Learning.
+而聚类只是无监督学习的一种
+
+157
+00:05:21,530 --> 00:05:22,470
+Let me tell you about another one.
+现在让我来告诉你另一种
+
+158
+00:05:23,200 --> 00:05:25,020
+I'm gonna tell you about the cocktail party problem.
+我先来介绍一下鸡尾酒宴问题
+
+159
+00:05:26,310 --> 00:05:28,270
+So, you've been to cocktail parties before, right?
+恩 我想你参加过鸡尾酒会的 是吧?
+
+160
+00:05:28,440 --> 00:05:30,080
+Well, you can imagine there's a
+嗯 想象一下
+
+161
+00:05:30,300 --> 00:05:31,690
+party, room full of people, all
+有一个宴会 有一屋子的人
+
+162
+00:05:31,870 --> 00:05:32,930
+sitting around, all talking at the
+大家都坐在一起
+
+163
+00:05:32,970 --> 00:05:34,390
+same time and there are
+而且在同时说话
+
+164
+00:05:34,480 --> 00:05:36,230
+all these overlapping voices because everyone
+有许多声音混杂在一起
+
+165
+00:05:36,590 --> 00:05:37,920
+is talking at the same time, and
+因为每个人都是在同一时间说话的
+
+166
+00:05:38,070 --> 00:05:39,730
+it is almost hard to hear the person in front of you.
+在这种情况下你很难听清楚你面前的人说的话
+
+167
+00:05:40,690 --> 00:05:41,970
+So maybe at a
+因此 比如有这样一个场景
+
+168
+00:05:42,020 --> 00:05:43,990
+cocktail party with two people,
+宴会上只有两个人
+
+169
+00:05:45,690 --> 00:05:46,670
+two people talking at the same
+两个人
+
+170
+00:05:46,770 --> 00:05:48,090
+time, and it's a somewhat
+同时说话
+
+171
+00:05:48,740 --> 00:05:49,710
+small cocktail party.
+恩 这是个很小的鸡尾酒宴会
+
+172
+00:05:50,690 --> 00:05:51,630
+And we're going to put two
+我们准备好了两个麦克风
+
+173
+00:05:51,890 --> 00:05:53,080
+microphones in the room so
+把它们放在房间里
+
+174
+00:05:54,060 --> 00:05:55,640
+there are microphones, and because
+然后
+
+175
+00:05:56,050 --> 00:05:57,430
+these microphones are at two
+因为这两个麦克风距离这两个人
+
+176
+00:05:57,560 --> 00:05:58,900
+different distances from the
+的距离是不同的
+
+177
+00:05:58,990 --> 00:06:01,250
+speakers, each microphone records
+每个麦克风都记录下了
+
+178
+00:06:01,830 --> 00:06:04,720
+a different combination of these two speaker voices.
+来自两个人的声音的不同组合
+
+179
+00:06:05,810 --> 00:06:06,970
+Maybe speaker one is a
+也许A的声音
+
+180
+00:06:07,120 --> 00:06:08,320
+little louder in microphone one
+在第一个麦克风里的声音会响一点
+
+181
+00:06:09,120 --> 00:06:10,680
+and maybe speaker two is a
+也许B的声音
+
+182
+00:06:10,800 --> 00:06:12,350
+little bit louder on microphone 2
+在第二个麦克风里会比较响一些
+
+183
+00:06:12,560 --> 00:06:14,040
+because the 2 microphones are
+因为2个麦克风
+
+184
+00:06:14,230 --> 00:06:15,950
+at different positions relative to
+的位置相对于
+
+185
+00:06:16,400 --> 00:06:19,020
+the 2 speakers, but each
+2个说话者的位置是不同的
+
+186
+00:06:19,250 --> 00:06:20,390
+microphone would cause an overlapping
+但每个麦克风都会录到
+
+187
+00:06:20,970 --> 00:06:22,590
+combination of both speakers' voices.
+来自两个说话者的重叠部分的声音
+
+188
+00:06:23,960 --> 00:06:25,500
+So here's an actual recording
+这里有一个
+
+189
+00:06:26,520 --> 00:06:29,280
+of two speakers recorded by a researcher.
+来自一个研究员录下的两个说话者的声音
+
+190
+00:06:29,740 --> 00:06:30,950
+Let me play for you the
+让我先放给你听第一个
+
+191
+00:06:31,060 --> 00:06:32,760
+first, what the first microphone sounds like.
+这是第一个麦克风录到的录音:
+
+192
+00:06:33,560 --> 00:06:34,800
+One (uno), two (dos),
+一 (UNO) 二 (DOS)
+
+193
+00:06:35,070 --> 00:06:36,590
+three (tres), four (cuatro), five
+三 (TRES) 四 (CUATRO) 五 (CINCO)
+
+194
+00:06:37,060 --> 00:06:38,550
+(cinco), six (seis), seven (siete),
+六 (SEIS) 七 (SIETE)
+
+195
+00:06:38,990 --> 00:06:40,610
+eight (ocho), nine (nueve), ten (y diez).
+八 (ocho) 九 (NUEVE) 十 (Y DIEZ)
+
+196
+00:06:41,610 --> 00:06:42,650
+All right, maybe not the most interesting cocktail
+好吧 这大概不是什么有趣的酒会……
+
+197
+00:06:43,000 --> 00:06:44,270
+party, there's two people
+……在这个酒会上 有两个人
+
+198
+00:06:44,620 --> 00:06:45,670
+counting from one to ten
+各自从1数到10
+
+199
+00:06:46,010 --> 00:06:47,880
+in two languages but you know.
+但用的是两种不同语言
+
+200
+00:06:48,870 --> 00:06:49,760
+What you just heard was the
+你刚才听到的是
+
+201
+00:06:49,820 --> 00:06:52,500
+first microphone recording, here's the second recording.
+第一个麦克风的录音 这里是第二个的:
+
+202
+00:06:57,440 --> 00:06:58,040
+Uno (one), dos (two), tres (three), cuatro
+一 (UNO) 二 (DOS) 三 (TRES)
+
+203
+00:06:58,060 --> 00:06:58,730
+(four), cinco (five), seis (six), siete (seven),
+四 (CUATRO) 五 (CINCO) 六 (SEIS) 七 (SIETE)
+
+204
+00:06:59,160 --> 00:07:00,900
+ocho (eight), nueve (nine) y diez (ten).
+八 (ocho) 九 (NUEVE) 十 (Y DIEZ)
+
+205
+00:07:01,860 --> 00:07:02,850
+So what we can do is take
+所以 我们能做的就是把
+
+206
+00:07:03,380 --> 00:07:04,660
+these two microphone recordings and give
+这两个录音输入
+
+207
+00:07:04,980 --> 00:07:06,480
+them to an Unsupervised Learning algorithm
+一种无监督学习算法中
+
+208
+00:07:07,010 --> 00:07:08,560
+called the cocktail party algorithm,
+称为“鸡尾酒会算法”
+
+209
+00:07:08,780 --> 00:07:09,910
+and tell the algorithm
+让这个算法
+
+210
+00:07:10,450 --> 00:07:12,140
+- find structure in this data for you.
+帮你找出其中蕴含的分类
+
+211
+00:07:12,250 --> 00:07:14,010
+And what the algorithm will do
+然后这个算法
+
+212
+00:07:14,410 --> 00:07:15,730
+is listen to these
+就会去听这些
+
+213
+00:07:15,980 --> 00:07:17,990
+audio recordings and say, you
+录音 并且你知道
+
+214
+00:07:18,140 --> 00:07:19,020
+know it sounds like the
+这听起来像
+
+215
+00:07:19,360 --> 00:07:20,950
+two audio recordings are being
+两个音频录音
+
+216
+00:07:21,240 --> 00:07:22,450
+added together or have been
+被叠加在一起
+
+217
+00:07:22,670 --> 00:07:25,220
+summed together to produce these recordings that we had.
+所以我们才能听到这样的效果
+
+218
+00:07:25,990 --> 00:07:27,330
+Moreover, what the cocktail party
+此外 这个算法
+
+219
+00:07:27,710 --> 00:07:29,210
+algorithm will do is separate
+还会分离出
+
+220
+00:07:29,570 --> 00:07:30,810
+out these two audio sources
+这两个被
+
+221
+00:07:31,480 --> 00:07:32,700
+that were being added or being
+叠加到一起的
+
+222
+00:07:33,000 --> 00:07:34,240
+summed together to form other
+音频源
+
+223
+00:07:34,410 --> 00:07:35,600
+recordings and, in fact,
+事实上
+
+224
+00:07:36,200 --> 00:07:38,630
+here's the first output of the cocktail party algorithm.
+这是我们的鸡尾酒会算法的第一个输出
+
+225
+00:07:39,790 --> 00:07:41,910
+One, two, three, four,
+一 二 三 四
+
+226
+00:07:42,590 --> 00:07:46,270
+five, six, seven, eight, nine, ten.
+五 六 七 八 九 十
+
+227
+00:07:47,630 --> 00:07:48,780
+So, I separated out the English
+所以我在一个录音中
+
+228
+00:07:49,240 --> 00:07:51,220
+voice in one of the recordings.
+分离出了英文声音
+
+229
+00:07:52,460 --> 00:07:53,300
+And here's the second of it.
+这是第二个输出
+
+230
+00:07:53,380 --> 00:07:55,280
+Uno, dos, tres, quatro, cinco,
+Uno dos tres quatro cinco
+
+231
+00:07:55,980 --> 00:07:59,830
+seis, siete, ocho, nueve y diez.
+seis siete ocho nueve y diez
+
+232
+00:08:00,270 --> 00:08:01,180
+Not too bad, to give you
+听起来不错嘛
+
+233
+00:08:03,810 --> 00:08:05,270
+one more example, here's another
+再举一个例子 这是另一个录音
+
+234
+00:08:05,600 --> 00:08:07,370
+recording of another similar situation,
+也是在一个类似的场景下
+
+235
+00:08:08,060 --> 00:08:09,790
+here's the first microphone : One,
+这是第一个麦克风的录音:
+
+236
+00:08:10,470 --> 00:08:12,430
+two, three, four, five, six,
+一 二 三 四 五 六
+
+237
+00:08:13,370 --> 00:08:15,710
+seven, eight, nine, ten.
+七 八 九 十
+
+238
+00:08:16,980 --> 00:08:17,920
+OK so the poor guy's gone
+OK 这个可怜的家伙从
+
+239
+00:08:18,180 --> 00:08:19,350
+home from the cocktail party and
+鸡尾酒会回家了
+
+240
+00:08:19,420 --> 00:08:21,880
+he's now sitting in a room by himself talking to his radio.
+他现在独自一人坐在屋里 对着录音机自言自语
+
+241
+00:08:23,090 --> 00:08:24,130
+Here's the second microphone recording.
+这是第二个麦克风的录音
+
+242
+00:08:28,810 --> 00:08:31,800
+One, two, three, four, five, six, seven, eight, nine, ten.
+一 二 三 四 五 六 七 八 九 十
+
+243
+00:08:33,310 --> 00:08:34,160
+When you give these two microphone
+当你把这两个麦克风录音
+
+244
+00:08:34,610 --> 00:08:35,530
+recordings to the same algorithm,
+送给与刚刚相同的算法处理
+
+245
+00:08:36,360 --> 00:08:37,790
+what it does, is again say,
+它所做的还是
+
+246
+00:08:38,380 --> 00:08:39,470
+you know, it sounds like there
+告诉你 这听起来有
+
+247
+00:08:39,690 --> 00:08:41,370
+are two audio sources, and moreover,
+两种音频源 并且
+
+248
+00:08:42,410 --> 00:08:43,820
+the algorithm says, here is
+算法说
+
+249
+00:08:44,070 --> 00:08:46,010
+the first of the audio sources I found.
+这里是我找到的第一个音频源
+
+250
+00:08:47,480 --> 00:08:49,300
+One, two, three, four,
+一 二 三 四
+
+251
+00:08:49,730 --> 00:08:53,430
+five, six, seven, eight, nine, ten.
+五 六 七 八 九 十
+
+252
+00:08:54,650 --> 00:08:56,110
+So that wasn't perfect, it
+恩 不是太完美
+
+253
+00:08:56,340 --> 00:08:57,360
+got the voice, but it
+提取到了人声
+
+254
+00:08:57,570 --> 00:08:59,070
+also got a little bit of the music in there.
+但还有一点音乐没有剔除掉
+
+255
+00:08:59,890 --> 00:09:01,360
+Then here's the second output to the algorithm.
+这是算法的第二个输出
+
+256
+00:09:10,020 --> 00:09:11,310
+Not too bad, in that second
+还好 在第二个输出中
+
+257
+00:09:11,540 --> 00:09:13,300
+output it managed to get rid of the voice entirely.
+它设法剔除掉了整个人声
+
+258
+00:09:13,760 --> 00:09:14,850
+And just, you know,
+只是清理了下音乐
+
+259
+00:09:15,020 --> 00:09:17,380
+cleaned up the music, got rid of the counting from one to ten.
+剔除了从一到十的计数
+
+260
+00:09:18,840 --> 00:09:20,090
+So you might look at
+所以 你可以看到
+
+261
+00:09:20,180 --> 00:09:21,750
+an Unsupervised Learning algorithm like
+像这样的无监督学习算法
+
+262
+00:09:21,950 --> 00:09:23,050
+this and ask how
+也许你想问 要实现这样的算法
+
+263
+00:09:23,250 --> 00:09:25,110
+complicated it is to implement this, right?
+很复杂吧?
+
+264
+00:09:25,330 --> 00:09:26,560
+It seems like in order to,
+看起来 为了
+
+265
+00:09:26,970 --> 00:09:28,870
+you know, build this application, it seems
+构建这个应用程序
+
+266
+00:09:28,930 --> 00:09:30,550
+like to do this audio processing you
+做这个音频处理
+
+267
+00:09:30,670 --> 00:09:31,430
+need to write a ton of code
+似乎需要写好多代码啊
+
+268
+00:09:32,240 --> 00:09:33,580
+or maybe link into like a
+或者需要链接到
+
+269
+00:09:33,690 --> 00:09:35,380
+bunch of synthesizer Java libraries that
+一堆处理音频的Java库
+
+270
+00:09:35,470 --> 00:09:37,150
+process audio, seems like
+貌似需要一个
+
+271
+00:09:37,240 --> 00:09:38,880
+a really complicated program, to do
+非常复杂的程序
+
+272
+00:09:39,060 --> 00:09:41,040
+this audio, separating out audio and so on.
+分离出音频等
+
+273
+00:09:42,460 --> 00:09:43,860
+It turns out the algorithm, to
+实际上
+
+274
+00:09:44,070 --> 00:09:45,640
+do what you just heard, that
+要实现你刚刚听到的效果
+
+275
+00:09:45,900 --> 00:09:47,280
+can be done with one line
+只需要一行代码就可以了
+
+276
+00:09:47,530 --> 00:09:49,270
+of code - shown right here.
+写在这里呢
+
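+[Note: the one-line program itself appears on the slide rather than in these subtitles. As a hedged sketch of the kind of Octave one-liner being described — assuming the two microphone recordings are stacked as the rows of a matrix x — the separation step can be written as:
+
+  % x: assumed to be a 2-by-n matrix, one microphone recording per row
+  [W, s, v] = svd((repmat(sum(x.*x, 1), size(x, 1), 1) .* x) * x');
+
+The resulting unmixing matrix W is then applied to x to recover the separated sources; the exact line shown on the slide may differ.]
+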
+277
+00:09:50,640 --> 00:09:52,350
+It took researchers a long
+当然 研究人员
+
+278
+00:09:52,610 --> 00:09:54,060
+time to come up with this line of code.
+花了很长时间才想出这行代码的 ^-^
+
+279
+00:09:54,490 --> 00:09:56,090
+I'm not saying this is an easy problem,
+我不是说这是一个简单的问题
+
+280
+00:09:57,080 --> 00:09:57,980
+But it turns out that when you
+但事实上 如果你
+
+281
+00:09:58,180 --> 00:10:00,330
+use the right programming environment, many learning
+使用正确的编程环境 许多学习
+
+282
+00:10:00,670 --> 00:10:02,060
+algorithms can be really short programs.
+算法是用很短的代码写出来的
+
+283
+00:10:03,510 --> 00:10:04,700
+So this is also why in
+所以这也是为什么在
+
+284
+00:10:04,840 --> 00:10:05,890
+this class we're going to
+这门课中我们要
+
+285
+00:10:06,010 --> 00:10:07,430
+use the Octave programming environment.
+使用Octave的编程环境
+
+286
+00:10:08,550 --> 00:10:09,910
+Octave is free, open-source
+Octave是一个免费的
+
+287
+00:10:10,120 --> 00:10:11,620
+software, and using a
+开放源码的软件
+
+288
+00:10:11,670 --> 00:10:13,130
+tool like Octave or Matlab,
+使用Octave或Matlab这类的工具
+
+289
+00:10:14,000 --> 00:10:15,400
+many learning algorithms become just
+许多学习算法
+
+290
+00:10:15,690 --> 00:10:17,910
+a few lines of code to implement.
+都可以用几行代码就可以实现
+
+291
+00:10:18,380 --> 00:10:19,400
+Later in this class, I'll just teach
+在后续课程中
+
+292
+00:10:19,620 --> 00:10:20,570
+you a little bit about how to
+我会教你如何使用Octave
+
+293
+00:10:20,720 --> 00:10:21,920
+use Octave and you'll be
+你会学到
+
+294
+00:10:22,050 --> 00:10:24,590
+implementing some of these algorithms in Octave.
+如何在Octave中实现这些算法
+
+295
+00:10:24,980 --> 00:10:26,050
+Or if you have Matlab you can use that too.
+或者 如果你有Matlab 你可以用它
+
+296
+00:10:27,120 --> 00:10:28,500
+It turns out in Silicon Valley, for
+事实上 在硅谷
+
+297
+00:10:28,620 --> 00:10:29,470
+a lot of machine learning algorithms,
+很多的机器学习算法
+
+298
+00:10:30,290 --> 00:10:31,310
+what we do is first prototype
+我们都是先用Octave
+
+299
+00:10:32,040 --> 00:10:33,900
+our software in Octave because software
+写一个程序原型
+
+300
+00:10:34,330 --> 00:10:35,250
+in Octave makes it incredibly fast
+因为在Octave中实现这些
+
+301
+00:10:35,540 --> 00:10:36,920
+to implement these learning algorithms.
+学习算法的速度快得让你无法想象
+
+302
+00:10:38,230 --> 00:10:39,110
+Here each of these functions
+在这里 每一个函数
+
+303
+00:10:39,720 --> 00:10:41,460
+like for example the SVD
+例如 SVD
+
+304
+00:10:41,680 --> 00:10:42,920
+function that stands for singular
+意思是奇异值分解
+
+305
+00:10:43,240 --> 00:10:44,520
+value decomposition; but that turns
+但它其实是一个
+
+306
+00:10:44,640 --> 00:10:45,690
+out to be a
+线性代数例程
+
+307
+00:10:45,820 --> 00:10:48,420
+linear algebra routine, that is just built into Octave.
+它被内置在Octave软件中了
+
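+[Note: svd is a standard built-in Octave/MATLAB call. A minimal usage example, where A stands for any numeric matrix:
+
+  A = [1 2; 3 4; 5 6];     % any matrix
+  [U, S, V] = svd(A);      % singular value decomposition, so that A equals U*S*V'
+]
+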
+308
+00:10:49,500 --> 00:10:50,390
+If you were trying to do this
+如果你试图
+
+309
+00:10:50,460 --> 00:10:51,490
+in C++ or Java,
+在C++或Java中做这个
+
+310
+00:10:51,780 --> 00:10:53,040
+this would be many many lines of
+将需要写N多代码
+
+311
+00:10:53,180 --> 00:10:55,680
+code linking complex C++ or Java libraries.
+并且还要连接复杂的C++或Java库
+
+312
+00:10:56,440 --> 00:10:57,490
+So, you can implement this stuff in
+所以 你可以在C++或
+
+313
+00:10:57,680 --> 00:10:58,690
+C++ or Java
+Java或Python中
+
+314
+00:10:59,050 --> 00:11:00,090
+or Python, it's just much
+实现这个算法 只是会
+
+315
+00:11:00,290 --> 00:11:02,090
+more complicated to do so in those languages.
+更加复杂而已
+
+316
+00:11:03,750 --> 00:11:05,060
+What I've seen after having taught
+在教授机器学习
+
+317
+00:11:05,300 --> 00:11:06,980
+machine learning for almost a
+将近10年后
+
+318
+00:11:07,210 --> 00:11:08,680
+decade now, is that, you
+我得出的一个经验就是
+
+319
+00:11:08,890 --> 00:11:10,340
+learn much faster if you
+如果你使用Octave的话
+
+320
+00:11:10,480 --> 00:11:11,700
+use Octave as your
+会学的更快
+
+321
+00:11:11,790 --> 00:11:14,070
+programming environment, and if
+并且如果你用
+
+322
+00:11:14,250 --> 00:11:15,570
+you use Octave as your
+Octave作为你的学习工具
+
+323
+00:11:16,260 --> 00:11:17,110
+learning tool and as your
+和开发原型的工具
+
+324
+00:11:17,240 --> 00:11:18,640
+prototyping tool, it'll let
+你的学习和开发过程
+
+325
+00:11:19,000 --> 00:11:21,280
+you learn and prototype learning algorithms much more quickly.
+会变得更快
+
+326
+00:11:22,640 --> 00:11:23,850
+And in fact what many people will
+而事实上在硅谷
+
+327
+00:11:23,990 --> 00:11:25,390
+do in the large Silicon
+很多人会这样做
+
+328
+00:11:25,730 --> 00:11:27,360
+Valley companies is in fact, use
+他们会先用Octave
+
+329
+00:11:27,560 --> 00:11:29,020
+a tool like Octave to first
+来实现这样一个学习算法原型
+
+330
+00:11:29,370 --> 00:11:31,110
+prototype the learning algorithm, and
+只有在确定
+
+331
+00:11:31,510 --> 00:11:32,780
+only after you've gotten it
+这个算法可以工作后
+
+332
+00:11:32,860 --> 00:11:33,820
+to work, then you migrate
+才开始迁移到
+
+333
+00:11:34,390 --> 00:11:35,910
+it to C++ or Java or whatever.
+C++ Java或其它编译环境
+
+334
+00:11:36,890 --> 00:11:37,960
+It turns out that by doing
+事实证明 这样做
+
+335
+00:11:38,220 --> 00:11:39,070
+things this way, you can often
+实现的算法
+
+336
+00:11:39,400 --> 00:11:40,440
+get your algorithm to work much
+比你一开始就用C++
+
+337
+00:11:41,300 --> 00:11:43,050
+faster than if you were starting out in C++.
+实现的算法要快多了
+
+338
+00:11:44,440 --> 00:11:46,010
+So, I know that as an
+所以 我知道
+
+339
+00:11:46,100 --> 00:11:47,490
+instructor, I get to
+作为一个老师
+
+340
+00:11:47,570 --> 00:11:48,580
+say "trust me on
+我不能老是念叨:
+
+341
+00:11:48,730 --> 00:11:49,790
+this one" only a finite
+“在这个问题上相信我“
+
+342
+00:11:50,030 --> 00:11:51,420
+number of times, but for
+但对于
+
+343
+00:11:51,560 --> 00:11:52,720
+those of you who've never used these
+那些从来没有用过这种
+
+344
+00:11:53,330 --> 00:11:54,880
+Octave type programming environments before,
+类似Octave的编程环境的童鞋
+
+345
+00:11:55,240 --> 00:11:56,070
+I am going to ask you
+我还是要请你
+
+346
+00:11:56,130 --> 00:11:56,970
+to trust me on this one,
+相信我这一次
+
+347
+00:11:57,570 --> 00:11:58,950
+and say that you, you will,
+我认为
+
+348
+00:11:59,700 --> 00:12:01,180
+I think your time, your development
+你的时间 研发时间
+
+349
+00:12:01,700 --> 00:12:03,100
+time is one of the most valuable resources.
+是你最宝贵的资源之一
+
+350
+00:12:04,210 --> 00:12:05,570
+And having seen lots
+当见过很多的人这样做以后
+
+351
+00:12:05,800 --> 00:12:06,850
+of people do this, I think
+我觉得如果你也这样做
+
+352
+00:12:07,190 --> 00:12:08,460
+you as a machine learning
+作为一个机器学习的
+
+353
+00:12:08,850 --> 00:12:09,990
+researcher, or machine learning developer
+研究者和开发者
+
+354
+00:12:10,830 --> 00:12:12,080
+will be much more productive if
+你会更有效率
+
+355
+00:12:12,220 --> 00:12:13,010
+you learn to start by prototyping
+如果你学会先用Octave开发原型
+
+356
+00:12:13,580 --> 00:12:15,250
+in Octave, rather than in some other language.
+而不是先用其他的编程语言来开发
+
+357
+00:12:17,570 --> 00:12:19,790
+Finally, to wrap
+最后 总结一下
+
+358
+00:12:20,090 --> 00:12:22,890
+up this video, I have one quick review question for you.
+这里有一个问题需要你来解答
+
+359
+00:12:24,400 --> 00:12:26,400
+We talked about Unsupervised Learning, which
+我们谈到了无监督学习
+
+360
+00:12:26,700 --> 00:12:27,670
+is a learning setting where you
+它是一种学习机制
+
+361
+00:12:27,760 --> 00:12:28,730
+give the algorithm a ton
+你给算法大量的数据
+
+362
+00:12:28,840 --> 00:12:30,120
+of data and just ask it
+要求它找出数据中
+
+363
+00:12:30,240 --> 00:12:32,900
+to find structure in the data for us.
+蕴含的类型结构
+
+364
+00:12:33,160 --> 00:12:35,170
+Of the following four examples, which
+以下的四个例子中
+
+365
+00:12:35,490 --> 00:12:36,410
+ones, which of these four
+哪一个
+
+366
+00:12:36,870 --> 00:12:37,630
+do you think would be
+您认为是
+
+367
+00:12:37,720 --> 00:12:39,520
+an Unsupervised Learning algorithm as
+无监督学习算法
+
+368
+00:12:40,220 --> 00:12:41,950
+opposed to a Supervised Learning problem?
+而不是监督学习问题
+
+369
+00:12:42,730 --> 00:12:43,590
+For each of the four
+对于每一个选项
+
+370
+00:12:43,860 --> 00:12:44,850
+check boxes on the left,
+在左边的复选框
+
+371
+00:12:45,640 --> 00:12:46,900
+check the ones for which
+选中你认为
+
+372
+00:12:47,210 --> 00:12:49,400
+you think Unsupervised Learning
+属于无监督学习的
+
+373
+00:12:49,700 --> 00:12:51,300
+algorithm would be appropriate and
+选项
+
+374
+00:12:51,440 --> 00:12:53,930
+then click the button on the lower right to check your answer.
+然后按一下右下角的按钮 提交你的答案
+
+375
+00:12:54,690 --> 00:12:57,030
+So when the video pauses, please
+所以 当视频暂停时
+
+376
+00:12:57,370 --> 00:12:58,750
+answer the question on the slide.
+请回答幻灯片上的这个问题
+
+377
+00:13:01,860 --> 00:13:03,950
+So, hopefully, you've remembered the spam folder problem.
+恩 没忘记垃圾邮件文件夹问题吧?
+
+378
+00:13:04,710 --> 00:13:06,310
+If you have labeled data, you
+如果你已经标记过数据
+
+379
+00:13:06,450 --> 00:13:07,680
+know, with spam and
+那么就有垃圾邮件和
+
+380
+00:13:07,800 --> 00:13:10,470
+non-spam e-mail, we'd treat this as a Supervised Learning problem.
+非垃圾邮件的区别 我们会将此视为一个监督学习问题
+
+381
+00:13:11,620 --> 00:13:13,870
+The news story example, that's
+新闻故事的例子
+
+382
+00:13:14,100 --> 00:13:15,370
+exactly the Google News example
+正是我们在本课中讲到的
+
+383
+00:13:15,910 --> 00:13:16,600
+that we saw in this video,
+谷歌新闻的例子
+
+384
+00:13:17,090 --> 00:13:17,950
+we saw how you can use
+我们介绍了你可以如何使用
+
+385
+00:13:18,080 --> 00:13:19,460
+a clustering algorithm to cluster
+聚类算法将这些文章聚合在一起
+
+386
+00:13:19,880 --> 00:13:21,980
+these articles together so that's Unsupervised Learning.
+所以这是无监督学习问题
+
+387
+00:13:23,250 --> 00:13:25,440
+The market segmentation example I
+市场细分的例子
+
+388
+00:13:25,510 --> 00:13:27,120
+talked a little bit earlier, you
+我之前有说过
+
+389
+00:13:27,220 --> 00:13:29,110
+can do that as an Unsupervised Learning problem
+这也是一个无监督学习问题
+
+390
+00:13:29,970 --> 00:13:30,860
+because I am just gonna
+因为我是要
+
+391
+00:13:30,930 --> 00:13:32,340
+get my algorithm data and ask
+拿到数据 然后要求
+
+392
+00:13:32,500 --> 00:13:34,340
+it to discover market segments automatically.
+它自动发现细分市场
+
+393
+00:13:35,610 --> 00:13:37,930
+And the final example, diabetes, well,
+最后一个例子 糖尿病
+
+394
+00:13:38,070 --> 00:13:39,080
+that's actually just like our
+这实际上就像我们
+
+395
+00:13:39,350 --> 00:13:41,480
+breast cancer example from the last video.
+上节课讲到的乳腺癌的例子
+
+396
+00:13:42,190 --> 00:13:43,320
+Only instead of, you know,
+只不过这里不是
+
+397
+00:13:43,600 --> 00:13:45,280
+good and bad cancer tumors or
+好的或坏的癌细胞
+
+398
+00:13:45,550 --> 00:13:47,390
+benign or malignant tumors we
+良性或恶性肿瘤我们
+
+399
+00:13:47,550 --> 00:13:49,270
+instead have diabetes or
+现在是有糖尿病或
+
+400
+00:13:49,330 --> 00:13:50,440
+not and so we will
+没有糖尿病 所以这是
+
+401
+00:13:50,700 --> 00:13:51,830
+use that as a supervised,
+有监督的学习问题
+
+402
+00:13:52,370 --> 00:13:53,740
+we will solve that as
+像处理那个乳腺癌的问题一样
+
+403
+00:13:53,870 --> 00:13:54,670
+a Supervised Learning problem just like
+我们会把它作为一个
+
+404
+00:13:54,730 --> 00:13:56,450
+we did for the breast tumor data.
+有监督的学习问题来处理
+
+405
+00:13:58,270 --> 00:13:59,400
+So, that's it for Unsupervised
+好了 关于无监督学习问题
+
+406
+00:14:00,100 --> 00:14:01,580
+Learning and in the
+就讲这么多了
+
+407
+00:14:01,650 --> 00:14:02,940
+next video, we'll delve more
+下一节课中我们
+
+408
+00:14:03,270 --> 00:14:04,600
+into specific learning algorithms
+会涉及到更具体的学习算法
+
+409
+00:14:05,550 --> 00:14:06,590
+and start to talk about
+并开始讨论
+
+410
+00:14:07,220 --> 00:14:08,750
+just how these algorithms work and
+这些算法是如何工作的
+
+411
+00:14:08,920 --> 00:14:11,270
+how we can, how you can go about implementing them.
+以及我们如何来实现它们
+
diff --git a/srt/10 - 1 - Deciding What to Try Next (6 min).srt b/srt/10 - 1 - Deciding What to Try Next (6 min).srt
new file mode 100644
index 00000000..a2ce9342
--- /dev/null
+++ b/srt/10 - 1 - Deciding What to Try Next (6 min).srt
@@ -0,0 +1,890 @@
+1
+00:00:00,300 --> 00:00:02,290
+By now you have seen a lot of different learning algorithms.
+到目前为止 我们已经介绍了许多不同的学习算法
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:03,330 --> 00:00:04,450
+And if you've been following along
+如果你一直跟着这些视频的进度学习
+
+3
+00:00:04,770 --> 00:00:06,030
+these videos you should consider
+你会发现自己已经不知不觉地
+
+4
+00:00:06,770 --> 00:00:09,530
+yourself an expert on many state-of-the-art machine learning techniques.
+成为一个了解许多先进机器学习技术的专家了
+
+5
+00:00:09,730 --> 00:00:12,310
+But even among
+然而 在懂机器学习的人当中
+
+6
+00:00:12,560 --> 00:00:14,460
+people who know a certain learning algorithm,
+依然存在着很大的差距
+
+7
+00:00:15,250 --> 00:00:16,830
+There's often a huge difference between
+一部分人确实掌握了
+
+8
+00:00:17,090 --> 00:00:18,240
+someone that really knows how
+怎样高效有力地
+
+9
+00:00:18,410 --> 00:00:20,130
+to powerfully and effectively apply
+运用这些学习算法
+
+10
+00:00:20,450 --> 00:00:22,270
+that algorithm, versus someone that's
+而另一些人
+
+11
+00:00:22,950 --> 00:00:24,090
+less familiar with some of
+他们可能对我马上要讲的东西
+
+12
+00:00:24,160 --> 00:00:25,080
+the material that I'm about
+就不是那么熟悉了
+
+13
+00:00:25,420 --> 00:00:26,900
+to teach and who doesn't really
+他们可能没有完全理解
+
+14
+00:00:27,080 --> 00:00:28,090
+understand how to apply these
+怎样运用这些算法
+
+15
+00:00:28,250 --> 00:00:29,180
+algorithms and can end up
+因此总是
+
+16
+00:00:29,570 --> 00:00:30,760
+wasting a lot of
+把时间浪费在
+
+17
+00:00:30,870 --> 00:00:33,320
+their time trying things out that don't really make sense.
+毫无意义的尝试上
+
+18
+00:00:34,380 --> 00:00:35,180
+What I would like to do is
+我想做的是
+
+19
+00:00:35,340 --> 00:00:36,350
+make sure that if you
+确保你在设计
+
+20
+00:00:36,560 --> 00:00:37,830
+are developing machine learning systems,
+机器学习的系统时
+
+21
+00:00:38,600 --> 00:00:39,780
+that you know how to choose
+你能够明白怎样选择
+
+22
+00:00:40,400 --> 00:00:42,900
+one of the most promising avenues to spend your time pursuing.
+一条最合适 最正确的道路
+
+23
+00:00:43,890 --> 00:00:45,050
+And on this and the next
+因此 在这节视频
+
+24
+00:00:45,190 --> 00:00:46,530
+few videos I'm going to
+和之后的几段视频中
+
+25
+00:00:46,750 --> 00:00:47,890
+give a number of practical
+我将向你介绍一些实用的
+
+26
+00:00:48,380 --> 00:00:51,150
+suggestions, advice, guidelines on how to do that.
+建议和指导 帮助你明白怎样进行选择
+
+27
+00:00:51,520 --> 00:00:53,410
+And concretely what we'd
+具体来讲
+
+28
+00:00:53,600 --> 00:00:54,460
+focus on is the problem
+我将重点关注的问题是
+
+29
+00:00:54,940 --> 00:00:56,380
+of, suppose you are
+假如你在开发
+
+30
+00:00:56,580 --> 00:00:57,760
+developing a machine learning system
+一个机器学习系统
+
+31
+00:00:58,390 --> 00:00:59,390
+or trying to improve the performance
+或者想试着改进
+
+32
+00:00:59,950 --> 00:01:01,810
+of a machine learning system, how
+一个机器学习系统的性能
+
+33
+00:01:02,000 --> 00:01:03,630
+do you go about deciding what are
+你应如何决定
+
+34
+00:01:03,700 --> 00:01:05,260
+the promising avenues to try
+接下来应该
+
+35
+00:01:07,620 --> 00:01:07,620
+next?
+选择哪条道路?
+
+36
+00:01:09,300 --> 00:01:11,200
+To explain this, let's continue using
+为了解释这一问题
+
+37
+00:01:11,670 --> 00:01:13,210
+our example of learning to
+我想仍然使用
+
+38
+00:01:13,350 --> 00:01:15,280
+predict housing prices.
+预测房价的学习例子
+
+39
+00:01:15,570 --> 00:01:17,760
+And let's say you've implemented regularized linear regression.
+假如你已经完成了正则化线性回归
+
+40
+00:01:18,700 --> 00:01:20,090
+Thus minimizing that cost function
+也就是最小化代价函数J的值
+
+41
+00:01:20,520 --> 00:01:22,870
+j. Now suppose that
+假如
+
+42
+00:01:23,130 --> 00:01:24,310
+after you take your learned parameters,
+在你得到你的学习参数以后
+
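+[Note: for reference, the regularized linear-regression cost J(theta) referred to here, as defined in the earlier regularization videos, is
+
+  J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]
+]
+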
+43
+00:01:24,820 --> 00:01:26,570
+if you test your hypothesis on
+如果你要将你的假设函数
+
+44
+00:01:26,720 --> 00:01:28,360
+the new set of houses, suppose you
+放到一组新的房屋样本上进行测试
+
+45
+00:01:28,540 --> 00:01:29,470
+find that this is making huge
+假如说你发现在预测房价时
+
+46
+00:01:29,860 --> 00:01:31,770
+errors in its predictions of the housing prices.
+产生了巨大的误差
+
+47
+00:01:33,220 --> 00:01:34,490
+The question is what should
+现在你的问题是 要想改进这个算法
+
+48
+00:01:34,670 --> 00:01:37,600
+you then try next in order to improve the learning algorithm?
+接下来应该怎么办?
+
+49
+00:01:39,000 --> 00:01:40,000
+There are many things that one
+实际上你可以想出
+
+50
+00:01:40,210 --> 00:01:41,460
+can think of that could improve
+很多种方法来改进
+
+51
+00:01:41,950 --> 00:01:43,660
+the performance of the learning algorithm.
+这个算法的性能
+
+52
+00:01:44,800 --> 00:01:47,510
+One thing you could try is to get more training examples.
+其中一种办法是 使用更多的训练样本
+
+53
+00:01:48,060 --> 00:01:49,240
+And concretely, you can imagine, maybe, you
+具体来讲 也许你能想到
+
+54
+00:01:49,600 --> 00:01:51,150
+know, setting up phone surveys, going
+通过电话调查
+
+55
+00:01:51,570 --> 00:01:52,820
+door to door, to try to
+或上门调查
+
+56
+00:01:52,930 --> 00:01:54,050
+get more data on how much
+来获取更多的
+
+57
+00:01:54,310 --> 00:01:56,660
+different houses sell for.
+不同的房屋出售数据
+
+58
+00:01:57,730 --> 00:01:58,770
+And the sad thing is I've seen
+遗憾的是
+
+59
+00:01:59,010 --> 00:02:00,060
+a lot of people spend a
+我看到好多人花费了好多时间
+
+60
+00:02:00,200 --> 00:02:01,400
+lot of time collecting more training
+想收集更多的训练样本
+
+61
+00:02:01,760 --> 00:02:03,270
+examples, thinking oh, if we have
+他们总认为 噢 要是我有
+
+62
+00:02:03,760 --> 00:02:04,780
+twice as much or ten times
+两倍甚至十倍数量的训练数据
+
+63
+00:02:05,050 --> 00:02:07,250
+as much training data, that is certainly going to help, right?
+那就一定会解决问题的 是吧?
+
+64
+00:02:07,990 --> 00:02:09,020
+But sometimes getting more training
+但有时候 获得更多的训练数据
+
+65
+00:02:09,380 --> 00:02:10,680
+data doesn't actually help
+实际上并没有作用
+
+66
+00:02:11,240 --> 00:02:11,920
+and in the next few videos
+在接下来的几段视频中
+
+67
+00:02:12,430 --> 00:02:13,670
+we will see why, and we
+我们将解释原因
+
+68
+00:02:13,720 --> 00:02:15,270
+will see how you
+我们也将知道
+
+69
+00:02:15,500 --> 00:02:16,780
+can avoid spending a lot
+怎样避免把过多的时间
+
+70
+00:02:16,950 --> 00:02:18,160
+of time collecting more training data
+浪费在收集更多的训练数据上
+
+71
+00:02:18,910 --> 00:02:20,660
+in settings where it is just not going to help.
+这实际上是于事无补的
+
+72
+00:02:22,370 --> 00:02:23,510
+Other things you might try are
+另一个方法 你也许能想到的
+
+73
+00:02:23,730 --> 00:02:25,830
+to well maybe try a smaller set of features.
+是尝试选用更少的特征集
+
+74
+00:02:26,470 --> 00:02:27,270
+So if you have some set
+因此如果你有一系列特征
+
+75
+00:02:27,450 --> 00:02:29,030
+of features such as x1,
+比如x1
+
+76
+00:02:29,270 --> 00:02:30,330
+x2, x3 and so on,
+x2 x3等等
+
+77
+00:02:30,680 --> 00:02:31,840
+maybe a large number of features.
+也许有很多特征
+
+78
+00:02:32,570 --> 00:02:33,460
+Maybe you want to spend time
+也许你可以花一点时间
+
+79
+00:02:33,860 --> 00:02:35,240
+carefully selecting some small
+从这些特征中
+
+80
+00:02:35,590 --> 00:02:37,410
+subset of them to prevent overfitting.
+仔细挑选一小部分来防止过拟合
+
+81
+00:02:38,670 --> 00:02:40,730
+Or maybe you need to get additional features.
+或者也许你需要用更多的特征
+
+82
+00:02:41,330 --> 00:02:42,390
+Maybe the current set of features
+也许目前的特征集
+
+83
+00:02:42,570 --> 00:02:44,740
+aren't informative enough and you
+对你来讲并不是很有帮助
+
+84
+00:02:44,840 --> 00:02:47,460
+want to collect more data in the sense of getting more features.
+你希望从获取更多特征的角度 来收集更多的数据
+
+85
+00:02:48,510 --> 00:02:49,590
+And once again this is the
+同样地
+
+86
+00:02:49,730 --> 00:02:50,900
+sort of project that can scale
+你可以把这个问题
+
+87
+00:02:51,180 --> 00:02:52,260
+up the huge projects can you
+扩展为一个很大的项目
+
+88
+00:02:52,580 --> 00:02:54,110
+imagine getting phone
+比如使用电话调查
+
+89
+00:02:54,350 --> 00:02:55,280
+surveys to find out more
+来得到更多的房屋案例
+
+90
+00:02:55,490 --> 00:02:57,230
+houses, or extra land
+或者再进行土地测量
+
+91
+00:02:57,640 --> 00:02:58,620
+surveys to find out more
+来获得更多有关
+
+92
+00:02:58,800 --> 00:03:01,130
+about the pieces of land and so on, so a huge project.
+这块土地的信息等等 因此这是一个复杂的问题
+
+93
+00:03:01,690 --> 00:03:02,820
+And once again it would be
+同样的道理
+
+94
+00:03:02,930 --> 00:03:04,140
+nice to know in advance if
+我们非常希望
+
+95
+00:03:04,330 --> 00:03:05,210
+this is going to help before we
+在花费大量时间完成这些工作之前
+
+96
+00:03:05,760 --> 00:03:07,690
+spend a lot of time doing something like this.
+我们就能知道其效果如何
+
+97
+00:03:07,920 --> 00:03:09,390
+We can also try
+我们也可以尝试
+
+98
+00:03:10,360 --> 00:03:12,100
+adding polynomial features things like
+增加多项式特征的方法
+
+99
+00:03:12,180 --> 00:03:13,100
+x2 square x2 square and product
+比如x1的平方 x2的平方
+
+100
+00:03:13,860 --> 00:03:14,700
+features x1, x2.
+x1 x2的乘积
+
+101
+00:03:14,930 --> 00:03:16,040
+We can still spend quite a
+我们可以花很多时间
+
+102
+00:03:16,180 --> 00:03:17,830
+lot of time thinking about that and
+来考虑这一方法
+
+103
+00:03:18,270 --> 00:03:19,340
+we can also try other things like
+我们也可以考虑其他方法
+
+104
+00:03:19,540 --> 00:03:21,390
+decreasing lambda, the regularization parameter or increasing lambda.
+减小或增大正则化参数lambda的值
+
+105
+00:03:23,840 --> 00:03:25,160
+Given a menu of options
+我们列出的这个单子
+
+106
+00:03:25,520 --> 00:03:26,680
+like these, some of which
+上面的很多方法
+
+107
+00:03:26,970 --> 00:03:28,240
+can easily scale up to
+都可以扩展开来
+
+108
+00:03:28,950 --> 00:03:30,000
+six month or longer projects.
+扩展成一个六个月或更长时间的项目
+
+109
+00:03:31,310 --> 00:03:32,660
+Unfortunately, the most common
+遗憾的是
+
+110
+00:03:32,760 --> 00:03:34,010
+method that people use to
+大多数人用来选择这些方法的标准
+
+111
+00:03:34,170 --> 00:03:36,040
+pick one of these is to go by gut feeling.
+是凭感觉
+
+112
+00:03:36,520 --> 00:03:37,670
+In which what many people
+也就是说
+
+113
+00:03:38,170 --> 00:03:39,520
+will do is sort of randomly
+大多数人的选择方法是
+
+114
+00:03:39,940 --> 00:03:41,100
+pick one of these options and
+随便从这些方法中选择一种
+
+115
+00:03:41,250 --> 00:03:43,050
+maybe say, "Oh, lets go and get more training data."
+比如他们会说 “噢 我们来多找点数据吧”
+
+116
+00:03:43,980 --> 00:03:45,480
+And easily spend six months collecting
+然后花上六个月的时间
+
+117
+00:03:45,880 --> 00:03:47,540
+more training data or maybe someone
+收集了一大堆数据
+
+118
+00:03:47,780 --> 00:03:48,860
+else would rather be saying, "Well,
+然后也许另一个人说
+
+119
+00:03:49,430 --> 00:03:51,810
+let's go collect a lot more features on these houses in our data set."
+“好吧 让我们来从这些房子的数据中多找点特征吧”
+
+120
+00:03:52,780 --> 00:03:54,010
+And I have a lot
+我很遗憾不止一次地看到
+
+121
+00:03:54,220 --> 00:03:55,870
+of times, sadly seen people spend, you know,
+很多人花了
+
+122
+00:03:56,630 --> 00:03:58,360
+literally 6 months doing one
+不夸张地说 至少六个月时间
+
+123
+00:03:58,530 --> 00:03:59,680
+of these avenues that they have
+来完成他们随便选择的一种方法
+
+124
+00:04:00,240 --> 00:04:01,810
+sort of at random only to
+而在六个月或者更长时间后
+
+125
+00:04:01,920 --> 00:04:03,220
+discover six months later that
+他们很遗憾地发现
+
+126
+00:04:03,460 --> 00:04:05,610
+that really wasn't a promising avenue to pursue.
+自己选择的是一条不归路
+
+127
+00:04:07,090 --> 00:04:08,170
+Fortunately, there is a
+幸运的是
+
+128
+00:04:08,310 --> 00:04:10,650
+pretty simple technique that can
+有一系列简单的方法
+
+129
+00:04:10,930 --> 00:04:12,640
+let you very quickly rule
+能让你事半功倍
+
+130
+00:04:12,900 --> 00:04:14,190
+out half of the things
+排除掉单子上的
+
+131
+00:04:14,500 --> 00:04:16,440
+on this list as being potentially
+至少一半的方法
+
+132
+00:04:16,970 --> 00:04:17,990
+promising things to pursue.
+留下那些确实有前途的方法
+
+133
+00:04:18,390 --> 00:04:19,310
+And there is a very simple technique,
+同时也有一种很简单的方法
+
+134
+00:04:19,830 --> 00:04:21,080
+that if you run, can easily
+只要你使用
+
+135
+00:04:21,710 --> 00:04:22,820
+rule out many of these options,
+就能很轻松地排除掉很多选择
+
+136
+00:04:24,120 --> 00:04:25,470
+and potentially save you
+从而为你节省
+
+137
+00:04:25,580 --> 00:04:28,600
+a lot of time pursuing something that just is not going to work.
+大量不必要花费的时间
+
+138
+00:04:29,610 --> 00:04:30,950
+In the next two videos
+在接下来的两段视频中
+
+139
+00:04:31,320 --> 00:04:32,450
+after this, I'm going to
+我首先介绍
+
+140
+00:04:32,560 --> 00:04:35,420
+first talk about how to evaluate learning algorithms.
+怎样评估机器学习算法的性能
+
+141
+00:04:36,540 --> 00:04:37,810
+And in the next few
+然后在之后的几段视频中
+
+142
+00:04:38,080 --> 00:04:39,770
+videos after that, I'm
+我将开始
+
+143
+00:04:40,070 --> 00:04:41,130
+going to talk about these techniques,
+讨论这些方法
+
+144
+00:04:42,470 --> 00:04:44,270
+which are called the machine learning diagnostics.
+它们也被称为"机器学习诊断法"
+
+145
+00:04:46,690 --> 00:04:47,980
+And what a diagnostic is, is
+“诊断法”的意思是
+
+146
+00:04:48,120 --> 00:04:49,080
+a test you can run,
+这是一种测试法
+
+147
+00:04:49,900 --> 00:04:52,240
+to get insight into what
+你通过执行这种测试
+
+148
+00:04:52,430 --> 00:04:53,740
+is or isn't working with
+能够深入了解
+
+149
+00:04:54,130 --> 00:04:55,810
+an algorithm, and which will
+某种算法到底是否有用
+
+150
+00:04:56,070 --> 00:04:57,720
+often give you insight as to
+这通常也能够告诉你
+
+151
+00:04:57,940 --> 00:04:59,360
+what are promising things to try
+要想改进一种算法的效果
+
+152
+00:04:59,920 --> 00:05:01,100
+to improve a learning algorithm's
+什么样的尝试
+
+153
+00:05:03,910 --> 00:05:03,910
+performance.
+才是有意义的
+
+154
+00:05:04,730 --> 00:05:07,140
+We'll talk about specific diagnostics later in this video sequence.
+在这一系列的视频中我们将介绍具体的诊断法
+
+155
+00:05:08,050 --> 00:05:09,230
+But I should mention in advance
+但我要提前说明一点的是
+
+156
+00:05:09,440 --> 00:05:10,780
+that diagnostics can take
+这些诊断法的执行和实现
+
+157
+00:05:11,100 --> 00:05:12,280
+time to implement and can sometimes,
+是需要花些时间的
+
+158
+00:05:12,820 --> 00:05:14,300
+you know, take quite a
+有时候
+
+159
+00:05:14,340 --> 00:05:15,610
+lot of time to implement and
+确实需要花很多时间
+
+160
+00:05:15,740 --> 00:05:17,120
+understand but doing so
+来理解和实现
+
+161
+00:05:17,410 --> 00:05:18,330
+can be a very good use
+但这样做的确是
+
+162
+00:05:18,610 --> 00:05:19,380
+of your time when you are
+把时间用在了刀刃上
+
+163
+00:05:19,660 --> 00:05:21,460
+developing learning algorithms because they
+因为这些方法
+
+164
+00:05:21,560 --> 00:05:22,660
+can often save you from
+让你在开发学习算法时
+
+165
+00:05:22,880 --> 00:05:24,670
+spending many months pursuing an
+节省了几个月的时间
+
+166
+00:05:24,840 --> 00:05:26,580
+avenue that you could
+早点从不必要的尝试中解脱出来
+
+167
+00:05:26,870 --> 00:05:29,460
+have found out much earlier just was not going to be fruitful.
+早日脱离苦海
+
+168
+00:05:32,220 --> 00:05:33,070
+So in the next few
+因此
+
+169
+00:05:33,250 --> 00:05:34,250
+videos, I'm going to first
+在接下来几节课中
+
+170
+00:05:34,570 --> 00:05:36,220
+talk about how to evaluate your
+我将先来介绍
+
+171
+00:05:36,450 --> 00:05:38,210
+learning algorithms and after
+如何评价你的学习算法
+
+172
+00:05:38,410 --> 00:05:39,210
+that I'm going to talk
+在此之后
+
+173
+00:05:39,300 --> 00:05:41,490
+about some of these diagnostics which will hopefully
+我将介绍一些诊断法
+
+174
+00:05:41,810 --> 00:05:42,950
+let you much more
+希望能让你更清楚
+
+175
+00:05:43,110 --> 00:05:44,470
+effectively select more of the
+在接下来的尝试中
+
+176
+00:05:44,770 --> 00:05:45,880
+useful things to try next
+如何选择更有意义的方法
+
+177
+00:05:46,560 --> 00:05:48,200
+if your goal is to improve
+最终达到改进机器学习系统性能的目的
+
+178
+00:05:48,760 --> 00:05:50,430
+the machine learning system.
+
diff --git a/srt/10 - 2 - Evaluating a Hypothesis (8 min).srt b/srt/10 - 2 - Evaluating a Hypothesis (8 min).srt
new file mode 100644
index 00000000..3fd0f9da
--- /dev/null
+++ b/srt/10 - 2 - Evaluating a Hypothesis (8 min).srt
@@ -0,0 +1,711 @@
+1
+00:00:00,146 --> 00:00:02,515
+In this video, I would like to talk about how to
+在本节视频中我想介绍一下
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,523 --> 00:00:06,662
+evaluate a hypothesis that has been learned by your algorithm.
+怎样评价通过你的学习算法得到的一个假设
+
+3
+00:00:06,685 --> 00:00:09,200
+In later videos, we will build on this
+基于这节课的讨论 在之后的视频中
+
+4
+00:00:09,231 --> 00:00:11,846
+to talk about how to prevent in the problems of
+我们还将讨论如何防止
+
+5
+00:00:11,869 --> 00:00:14,908
+overfitting and underfitting as well.
+过拟合和欠拟合的问题
+
+6
+00:00:15,615 --> 00:00:19,023
+When we fit the parameters of our learning algorithm
+当我们确定学习算法的参数时
+
+7
+00:00:19,038 --> 00:00:23,154
+we think about choosing the parameters to minimize the training error.
+我们考虑的是选择参数来使训练误差最小化
+
+8
+00:00:23,169 --> 00:00:26,077
+One might think that getting a really low value of
+有人认为 得到一个很小的训练误差
+
+9
+00:00:26,100 --> 00:00:28,108
+training error might be a good thing,
+一定是一件好事
+
+10
+00:00:28,108 --> 00:00:29,562
+but we have already seen that
+但我们已经知道
+
+11
+00:00:29,562 --> 00:00:32,400
+just because a hypothesis has low training error,
+仅仅是因为这个假设具有很小的训练误差
+
+12
+00:00:32,400 --> 00:00:35,254
+that doesn't mean it is necessarily a good hypothesis.
+并不能说明它一定是一个好的假设
+
+13
+00:00:35,254 --> 00:00:40,223
+And we've already seen the example of how a hypothesis can overfit.
+我们也学习了过拟合假设的例子
+
+14
+00:00:40,415 --> 00:00:45,785
+And therefore fail to generalize the new examples not in the training set.
+这时推广到新的训练样本上就不灵了
+
+15
+00:00:45,962 --> 00:00:50,000
+So how do you tell if the hypothesis might be overfitting.
+那么 你怎样判断一个假设是否是过拟合的呢
+
+16
+00:00:50,015 --> 00:00:54,346
+In this simple example we could plot the hypothesis h of x
+对于这个简单的例子 我们可以
+
+17
+00:00:54,365 --> 00:00:56,338
+and just see what was going on.
+画出假设函数h(x) 然后观察
+
+18
+00:00:56,346 --> 00:01:00,538
+But in general for problems with more features than just one feature,
+但对于更一般的情况 特征不止一个的例子
+
+19
+00:01:00,554 --> 00:01:03,531
+for problems with a large number of features like these
+就像这样有很多特征的问题
+
+20
+00:01:03,546 --> 00:01:06,692
+it becomes hard or may be impossible
+想要通过画出假设函数来观察
+
+21
+00:01:06,708 --> 00:01:09,515
+to plot what the hypothesis looks like
+就变得很难甚至不可能了
+
+22
+00:01:09,531 --> 00:01:13,046
+and so we need some other way to evaluate our hypothesis.
+因此 我们需要另一种评价假设函数的方法
+
+23
+00:01:13,062 --> 00:01:17,315
+The standard way to evaluate a learned hypothesis is as follows.
+如下给出了一种评价假设的标准方法
+
+24
+00:01:17,331 --> 00:01:19,308
+Suppose we have a data set like this.
+假如我们有这样一组数据组
+
+25
+00:01:19,323 --> 00:01:21,977
+Here I have just shown 10 training examples,
+在这里我只展示了10组训练样本
+
+26
+00:01:21,992 --> 00:01:23,969
+but of course usually we may have
+当然我们通常可以有
+
+27
+00:01:23,985 --> 00:01:27,254
+dozens or hundreds or maybe thousands of training examples.
+成百上千组训练样本
+
+28
+00:01:27,269 --> 00:01:30,246
+In order to make sure we can evaluate our hypothesis,
+为了确保我们可以评价我们的假设函数
+
+29
+00:01:30,262 --> 00:01:32,808
+what we are going to do is split
+我要做的是
+
+30
+00:01:32,823 --> 00:01:35,554
+the data we have into two portions.
+将这些数据分成两部分
+
+31
+00:01:35,569 --> 00:01:40,723
+The first portion is going to be our usual training set
+第一部分将成为我们的训练集
+
+32
+00:01:42,638 --> 00:01:47,446
+and the second portion is going to be our test set,
+第二部分将成为我们的测试集
+
+33
+00:01:47,462 --> 00:01:50,398
+and a pretty typical split of this
+将所有数据分成训练集和测试集
+
+34
+00:01:50,413 --> 00:01:53,482
+all the data we have into a training set and test set
+其中一种典型的分割方法是
+
+35
+00:01:53,498 --> 00:01:57,936
+might be around say a 70%, 30% split.
+按照7:3的比例
+
+36
+00:01:57,952 --> 00:02:00,052
+So more of the data goes to the training set
+将70%的数据作为训练集
+
+37
+00:02:00,067 --> 00:02:02,367
+and relatively less to the test set.
+30%的数据作为测试集
+
+38
+00:02:02,382 --> 00:02:05,782
+And so now, if we have some data set,
+因此 现在如果我们有了一些数据
+
+39
+00:02:05,790 --> 00:02:08,459
+we will assign, say, 70%
+我们只用其中的70%
+
+40
+00:02:08,475 --> 00:02:11,529
+of the data to be our training set where here "m"
+作为我们的训练集
+
+41
+00:02:11,544 --> 00:02:14,336
+is as usual our number of training examples
+这里的m依然表示训练样本的总数
+
+42
+00:02:14,352 --> 00:02:16,913
+and the remainder of our data
+而剩下的那部分数据
+
+43
+00:02:16,929 --> 00:02:19,310
+might then be assigned to become our test set.
+将被用作测试集
+
+44
+00:02:19,325 --> 00:02:23,410
+And here, I'm going to use the notation m subscript test
+在这里 我使用m下标test
+
+45
+00:02:23,425 --> 00:02:27,187
+to denote the number of test examples.
+来表示测试样本的总数
+
+46
+00:02:27,202 --> 00:02:32,225
+And so in general, this subscript test is going to denote
+因此 这里的下标test将表示
+
+47
+00:02:32,241 --> 00:02:34,987
+examples that come from a test set so that
+这些样本是来自测试集
+
+48
+00:02:35,002 --> 00:02:40,810
+x1 subscript test, y1 subscript test is my first
+因此x(1)test y(1)test将成为我的
+
+49
+00:02:40,825 --> 00:02:43,648
+test example which I guess in this example
+第一组测试样本
+
+50
+00:02:43,664 --> 00:02:45,656
+might be this example over here.
+我想应该是这里的这一组样本
+
+51
+00:02:45,671 --> 00:02:47,495
+Finally, one last detail
+最后再提醒一点
+
+52
+00:02:47,510 --> 00:02:50,795
+whereas here I've drawn this as though the first 70%
+在这里我是选择了前70%的数据作为训练集
+
+53
+00:02:50,810 --> 00:02:54,479
+goes to the training set and the last 30% to the test set.
+后30%的数据作为测试集
+
+54
+00:02:54,495 --> 00:02:57,518
+If there is any sort of ordering to the data,
+但如果这组数据有某种规律或顺序的话
+
+55
+00:02:57,533 --> 00:03:01,048
+it would be better to send a random 70%
+那么最好是
+
+56
+00:03:01,048 --> 00:03:02,948
+of your data to the training set and a
+随机选择70%作为训练集
+
+57
+00:03:02,964 --> 00:03:05,556
+random 30% of your data to the test set.
+剩下的30%作为测试集
+
+58
+00:03:05,571 --> 00:03:08,579
+So if your data were already randomly sorted,
+当然如果你的数据已经随机分布了
+
+59
+00:03:08,595 --> 00:03:12,110
+you could just take the first 70% and last 30%
+那你可以选择前70%和后30%
+
+60
+00:03:12,125 --> 00:03:14,718
+but if your data were not randomly ordered,
+但如果你的数据不是随机排列的
+
+61
+00:03:14,733 --> 00:03:16,756
+it would be better to randomly shuffle or
+最好还是打乱顺序
+
+62
+00:03:16,771 --> 00:03:19,718
+to randomly reorder the examples in your training set.
+或者使用一种随机的顺序来构建你的数据
+
+63
+00:03:19,733 --> 00:03:23,310
+before, you know, sending the first 70% to the training set
+然后再取出前70%作为训练集
+
+64
+00:03:23,325 --> 00:03:26,669
+and the last 30% to the test set.
+后30%作为测试集
+
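+[Note: a minimal Octave sketch of the random 70/30 split being described; the variable names X, y, Xtrain, ytrain, Xtest, ytest are illustrative assumptions, not names used in the course:
+
+  m = size(X, 1);                     % number of examples
+  idx = randperm(m);                  % randomly shuffle the example indices
+  mtrain = floor(0.7 * m);
+  Xtrain = X(idx(1:mtrain), :);       ytrain = y(idx(1:mtrain));
+  Xtest  = X(idx(mtrain+1:end), :);   ytest  = y(idx(mtrain+1:end));
+]
+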
+65
+00:03:27,054 --> 00:03:30,169
+Here then is a fairly typical procedure
+接下来 这里展示了一种典型的方法
+
+66
+00:03:30,185 --> 00:03:32,008
+for how you would train and test
+你可以按照这些步骤训练和测试你的学习算法
+
+67
+00:03:32,023 --> 00:03:34,492
+the learning algorithm, say, linear regression.
+比如线性回归算法
+
+68
+00:03:34,508 --> 00:03:38,115
+First, you learn the parameters theta from the training set
+首先 你需要对训练集进行学习得到参数theta
+
+69
+00:03:38,131 --> 00:03:41,798
+so you minimize the usual training error objective j of theta,
+具体来讲就是最小化训练误差J(θ)
+
+70
+00:03:41,813 --> 00:03:44,713
+where j of theta here was defined using that
+这里的J(θ)是使用那70%数据
+
+71
+00:03:44,729 --> 00:03:47,059
+70% of all the data you have.
+来定义得到的
+
+72
+00:03:47,075 --> 00:03:49,759
+That is, only the training data.
+也就是仅仅是训练数据
+
+73
+00:03:49,882 --> 00:03:52,167
+And then you would compute the test error.
+接下来 你要计算出测试误差
+
+74
+00:03:52,182 --> 00:03:56,298
+And I am going to denote the test error as j subscript test.
+我将用J下标test来表示测试误差
+
+75
+00:03:56,313 --> 00:03:59,229
+And so what you do is take your parameter theta
+那么你要做的就是
+
+76
+00:03:59,259 --> 00:04:02,190
+that you have learned from the training set, and plug it in here
+取出你之前从训练集中学习得到的参数theta放在这里
+
+77
+00:04:02,205 --> 00:04:04,875
+and compute your test set error.
+来计算你的测试误差
+
+78
+00:04:04,890 --> 00:04:08,529
+Which I am going to write as follows.
+可以写成如下的形式
+
+79
+00:04:08,698 --> 00:04:11,275
+So this is basically
+这实际上是测试集
+
+80
+00:04:11,290 --> 00:04:15,244
+the average squared error
+平方误差的
+
+81
+00:04:15,269 --> 00:04:18,154
+as measured on your test set.
+平均值
+
+82
+00:04:18,169 --> 00:04:19,915
+It's pretty much what you'd expect.
+这就是你期望得到的值
+
+83
+00:04:19,931 --> 00:04:23,415
+So if we run every test example through your hypothesis
+因此 我们使用包含参数theta的假设函数对每一个测试样本进行测试
+
+84
+00:04:23,431 --> 00:04:28,008
+with parameter theta and just measure the squared error
+然后通过假设函数和测试样本
+
+85
+00:04:28,023 --> 00:04:33,338
+that your hypothesis has on your m subscript test, test examples.
+计算出mtest个平方误差
+
+86
+00:04:33,354 --> 00:04:37,054
+And of course, this is the definition of the
+当然 这是当我们使用线性回归
+
+87
+00:04:37,069 --> 00:04:40,815
+test set error if we are using linear regression
+和平方误差标准时
+
+88
+00:04:40,831 --> 00:04:44,362
+and using the squared error metric.
+测试误差的定义
+
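+[Note: in symbols, the test error just described is
+
+  J_{test}(\theta) = \frac{1}{2 m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2
+
+and a hedged Octave sketch, assuming Xtest already contains the column of ones for the intercept term:
+
+  Jtest = (1 / (2 * mtest)) * sum((Xtest * theta - ytest) .^ 2);
+]
+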
+89
+00:04:44,377 --> 00:04:47,477
+How about if we were doing a classification problem
+那么如果是考虑分类问题
+
+90
+00:04:47,492 --> 00:04:50,654
+and say using logistic regression instead.
+比如说使用逻辑回归的时候呢
+
+91
+00:04:50,669 --> 00:04:53,877
+In that case, the procedure for training
+训练和测试逻辑回归的步骤
+
+92
+00:04:53,892 --> 00:04:57,085
+and testing say logistic regression is pretty similar
+与之前所说的非常类似
+
+93
+00:04:57,100 --> 00:04:59,985
+first we will learn the parameters theta from the training data,
+首先我们要从训练数据 也就是所有数据的70%中
+
+94
+00:05:00,000 --> 00:05:02,331
+that first 70% of the data.
+学习得到参数theta
+
+95
+00:05:02,346 --> 00:05:05,115
+And then we will compute the test error as follows.
+然后用如下的方式计算测试误差
+
+96
+00:05:05,131 --> 00:05:07,015
+It's the same objective function
+目标函数和我们平常
+
+97
+00:05:07,031 --> 00:05:09,592
+as we always use for logistic regression,
+做逻辑回归的一样
+
+98
+00:05:09,608 --> 00:05:11,569
+except that now it is defined using
+唯一的区别是
+
+99
+00:05:11,585 --> 00:05:15,115
+our m subscript test, test examples.
+现在我们使用的是mtest个测试样本
+
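+[Note: written out, the logistic-regression test error referred to here is the usual logistic cost evaluated on the m_test test examples:
+
+  J_{test}(\theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left[ y_{test}^{(i)} \log h_\theta(x_{test}^{(i)}) + (1 - y_{test}^{(i)}) \log \left( 1 - h_\theta(x_{test}^{(i)}) \right) \right]
+]
+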
+100
+00:05:15,131 --> 00:05:17,600
+While this definition of the test set error
+这里的测试误差Jtest(θ)
+
+101
+00:05:17,631 --> 00:05:20,238
+j subscript test is perfectly reasonable,
+其实不难理解
+
+102
+00:05:20,254 --> 00:05:22,231
+sometimes there is an alternative
+有时这是另一种形式的测试集
+
+103
+00:05:22,246 --> 00:05:25,469
+test set metric that might be easier to interpret,
+更易于理解
+
+104
+00:05:25,485 --> 00:05:27,877
+and that's the misclassification error.
+这里的误差其实叫误分类率
+
+105
+00:05:27,892 --> 00:05:30,792
+It's also called the zero one misclassification error,
+也被称为0/1错分率
+
+106
+00:05:30,808 --> 00:05:32,692
+with zero one denoting that
+0/1表示了
+
+107
+00:05:32,708 --> 00:05:36,146
+you either get an example right or you get an example wrong.
+你预测到的正确或错误样本的情况
+
+108
+00:05:36,162 --> 00:05:37,910
+Here's what I mean.
+我想说的是这个意思
+
+109
+00:05:37,925 --> 00:05:41,795
+Let me define the error of a prediction.
+可以这样定义一次预测的误差
+
+110
+00:05:41,825 --> 00:05:44,202
+That is h of x.
+关于假设h(x)
+
+111
+00:05:44,218 --> 00:05:47,518
+And given the label y as
+和标签y的误差
+
+112
+00:05:47,533 --> 00:05:51,848
+equal to one if my hypothesis
+那么这个误差等于1
+
+113
+00:05:51,864 --> 00:05:54,633
+outputs a value greater than or equal to 0.5
+当你的假设函数h(x)的值大于等于0.5
+
+114
+00:05:54,641 --> 00:05:57,510
+and Y is equal to zero
+并且y的值等于0
+
+115
+00:05:57,525 --> 00:06:03,718
+or if my hypothesis outputs a value of less than 0.5
+或者当h(x)小于0.5
+
+116
+00:06:03,733 --> 00:06:05,402
+and y is equal to one,
+并且y的值等于1
+
+117
+00:06:05,418 --> 00:06:08,118
+right, so both of these cases basically correspond
+因此 这两种情况都表明
+
+118
+00:06:08,133 --> 00:06:11,833
+to if your hypothesis mislabeled the example
+你的假设对样本进行了误判
+
+119
+00:06:11,833 --> 00:06:14,518
+assuming you threshold at 0.5.
+这里定义阈值为0.5
+
+120
+00:06:14,533 --> 00:06:18,171
+So either your hypothesis thought it was more likely to be 1, but it was actually 0,
+那么也就是说 假设结果更趋向于1 但实际是0
+
+121
+00:06:18,187 --> 00:06:20,733
+or your hypothesis thought it was more likely
+或者说假设更趋向于0
+
+122
+00:06:20,748 --> 00:06:23,556
+to be 0, but the label was actually 1.
+但实际的标签却是1
+
+123
+00:06:23,571 --> 00:06:28,471
+And otherwise, we define this error function to be zero.
+否则 我们将误差值定义为0
+
+124
+00:06:28,487 --> 00:06:34,841
+If your hypothesis basically classified the example y correctly.
+此时你的假设值能够正确对样本y进行分类
+
+125
+00:06:34,864 --> 00:06:38,841
+We could then define the test error,
+然后 我们就能应用错分率误差
+
+126
+00:06:38,856 --> 00:06:42,371
+using the misclassification error metric to be
+来定义测试误差
+
+127
+00:06:42,387 --> 00:06:46,779
+1 over m subscript test of the sum from i equals one
+也就是1/mtest 乘以
+
+128
+00:06:46,795 --> 00:06:49,941
+to m subscript test of the
+h(i)(xtest)和y(i)的错分率误差
+
+129
+00:06:49,956 --> 00:06:55,164
+error of h of x(i) test
+从i=1到mtest
+
+130
+00:06:55,179 --> 00:06:57,971
+comma y(i).
+的求和
+
+131
+00:06:57,987 --> 00:07:02,010
+And so that's just my way of writing out that this is exactly
+这样我就写出了我的定义方式
+
+132
+00:07:02,025 --> 00:07:05,587
+the fraction of the examples in my test set
+这实际上就是我的假设函数误标记的
+
+133
+00:07:05,602 --> 00:07:08,864
+that my hypothesis has mislabeled.
+那部分测试集中的样本
+
+134
+00:07:08,871 --> 00:07:10,602
+And so that's the definition of
+这也就是使用
+
+135
+00:07:10,618 --> 00:07:13,687
+the test set error using the misclassification error
+0/1错分率或误分类率
+
+136
+00:07:13,718 --> 00:07:16,948
+or the 0/1 misclassification metric.
+的准则来定义的测试误差
+
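+[Note: a minimal Octave sketch of this 0/1 misclassification test error; sigmoid, theta, Xtest, and ytest are assumed to be defined as in the course exercises, with Xtest containing the intercept column:
+
+  predictions = sigmoid(Xtest * theta) >= 0.5;      % threshold the hypothesis at 0.5
+  Jtest_01 = mean(double(predictions ~= ytest));    % fraction of test examples mislabeled
+]
+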
+137
+00:07:16,971 --> 00:07:19,995
+So that's the standard technique for evaluating
+以上我们介绍了一套标准技术
+
+138
+00:07:20,010 --> 00:07:22,833
+how good a learned hypothesis is.
+来评价一个已经学习过的假设
+
+139
+00:07:22,848 --> 00:07:25,579
+In the next video, we will adapt these ideas
+在下一段视频中我们要应用这些方法
+
+140
+00:07:25,595 --> 00:07:28,525
+to helping us do things like choose what features
+来帮助我们进行诸如特征选择一类的问题
+
+141
+00:07:28,541 --> 00:07:31,641
+like the degree of polynomial to use with the learning algorithm
+比如多项式次数的选择
+
+142
+00:07:31,656 --> 00:07:34,964
+or choose the regularization parameter for learning algorithm.
+或者正则化参数的选择
+
diff --git a/srt/10 - 3 - Model Selection and Traination.srt b/srt/10 - 3 - Model Selection and Traination.srt
new file mode 100644
index 00000000..3034ea3e
--- /dev/null
+++ b/srt/10 - 3 - Model Selection and Traination.srt
@@ -0,0 +1,913 @@
+1
+00:00:00,160 --> 00:00:04,570
+Suppose you'd like to decide what degree of polynomial to fit to a data set.
+假如你想要确定对于某组数据最合适的多项式次数是几次(字幕翻译:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,570 --> 00:00:08,750
+That is, what features to include to give you a learning algorithm.
+怎样选用正确的特征来构造学习算法
+
+3
+00:00:08,750 --> 00:00:13,160
+Or suppose you'd like to choose the regularization parameter lambda for
+或者假如你需要正确选择学习算法中的正则化参数lambda
+
+4
+00:00:13,160 --> 00:00:14,550
+learning algorithm.
+
+5
+00:00:14,550 --> 00:00:15,830
+How do you do that?
+你应该怎样做呢?
+
+6
+00:00:15,830 --> 00:00:17,510
+These are called model selection problems.
+这些问题我们称之为模型选择问题
+
+7
+00:00:17,510 --> 00:00:22,411
+And in our discussion of how to do this, we'll talk about not just how to
+在我们对于这一问题的讨论中,我们还将提到不仅
+
+8
+00:00:22,411 --> 00:00:27,031
+split your data into the train and test sets, but how to split your data into what we
+是把你的数据分为:训练集和测试集,而且是如何将数据分为三个数据组:
+
+9
+00:00:27,031 --> 00:00:31,020
+discover is called the training, validation, and test sets.
+也就是训练集、验证集和测试集
+
+10
+00:00:31,020 --> 00:00:33,860
+We'll see in this video just what these things are, and
+在这节视频中我们将会介绍这些内容的含义
+
+11
+00:00:33,860 --> 00:00:36,890
+how to use them to do model selection.
+以及如何使用它们进行模型选择。
+
+12
+00:00:36,890 --> 00:00:40,350
+We've already seen a lot of times the problem of overfitting,
+在前面的学习中,我们已经多次接触到过拟合现象
+
+13
+00:00:40,350 --> 00:00:44,380
+in which just because a learning algorithm fits a training set well,
+在过拟合的情况中学习算法在适用于训练集时表现非常完美,
+
+14
+00:00:44,380 --> 00:00:47,490
+that doesn't mean it's a good hypothesis.
+但这并不代表此时的假设也很完美
+
+15
+00:00:47,490 --> 00:00:52,290
+More generally, this is why the training set's error is not a good predictor for
+更普遍地说,这也是为什么训练集误差通常不能正确预测出
+
+16
+00:00:52,290 --> 00:00:56,000
+how well the hypothesis will do on new example.
+该假设是否能很好地拟合新样本的原因。
+
+17
+00:00:56,000 --> 00:00:59,150
+Concretely, if you fit some set of parameters.
+具体来讲,如果你把这些参数集
+
+18
+00:00:59,150 --> 00:01:02,810
+Theta0, theta1, theta2, and so on, to your training set.
+比如theta1 theta2等等,调整到非常拟合你的训练集
+
+19
+00:01:02,810 --> 00:01:07,210
+Then the fact that your hypothesis does well on the training set.
+那么结果就是你的假设会在训练集上表现地很好。
+
+20
+00:01:07,210 --> 00:01:11,630
+Well, this doesn't mean much in terms of predicting how well your hypothesis will
+但这并不能确定当你的假设推广到训练集之外的新的样本上时
+
+21
+00:01:11,630 --> 00:01:15,970
+generalize to new examples not seen in the training set.
+预测的结果是怎样的
+
+22
+00:01:15,970 --> 00:01:18,730
+And a more general principle is that once your
+而更为普遍的规律是只要
+
+23
+00:01:18,730 --> 00:01:21,290
+parameters were fit to some set of data.
+你的参数非常拟合某个数据组。
+
+24
+00:01:21,290 --> 00:01:23,720
+Maybe the training set,maybe something else.
+比如说非常拟合训练集,当然也可以是其他数据集。
+
+25
+00:01:23,720 --> 00:01:27,900
+Then the error of your hypothesis as measured on that same data set,
+那么你的假设对于相同数据组的预测其预测误差
+
+26
+00:01:27,900 --> 00:01:29,600
+such as the training error,
+比如说训练误差,
+
+27
+00:01:29,600 --> 00:01:33,380
+that's unlikely to be a good estimate of your actual generalization error.
+是不能够用来推广到一般情况的,或者说是不能作为实际的泛化误差的。
+
+28
+00:01:33,380 --> 00:01:38,330
+That is how well the hypothesis will generalize to new examples.
+也就是说,不能说明你的假设对于新样本的效果。
+
+29
+00:01:38,330 --> 00:01:41,170
+Now let's consider the model selection problem.
+下面我们来考虑模型选择问题。
+
+30
+00:01:41,170 --> 00:01:45,100
+Let's say you're trying to choose what degree polynomial to fit to data.
+假如说你现在要选择能最好地拟合你数据的多项式次数。
+
+31
+00:01:45,100 --> 00:01:48,800
+So, should you choose a linear function, a quadratic function, a cubic function?
+换句话说 你应该选择一次函数、二次函数还是三次函数呢?
+
+32
+00:01:48,800 --> 00:01:50,740
+All the way up to a 10th-order polynomial.
+等等一直到十次函数
+
+33
+00:01:51,940 --> 00:01:55,810
+So it's as if there's one extra parameter in this algorithm,
+所以似乎在这个算法里应该有这样一个参数,
+
+34
+00:01:55,810 --> 00:02:01,210
+which I'm going to denote d, which is, what degree of polynomial.
+这里我用d来表示你应该选择的多项式次数。
+
+35
+00:02:01,210 --> 00:02:02,310
+Do you want to pick.
+
+36
+00:02:02,310 --> 00:02:06,610
+So it's as if, in addition to the theta parameters, it's as if
+所以,似乎除了你要确定的参数theta之外,你还要考虑确定一个参数
+
+37
+00:02:06,610 --> 00:02:10,680
+there's one more parameter, d, that you're trying to determine using your data set.
+你同样需要用你的数据组来确定这个多项式的次数d。
+
+38
+00:02:10,680 --> 00:02:14,940
+So, the first option is d equals one, if you fit a linear function.
+第一个选择是d=1,也就表示线性(一次)方程
+
+39
+00:02:14,940 --> 00:02:19,760
+We can choose d equals two, d equals three, all the way up to d equals 10.
+我们也可以选择d=2或者3等等,一直到d=10
+
+40
+00:02:19,760 --> 00:02:24,830
+So, we'd like to fit this extra sort of parameter which I'm denoting by d.
+因此 我们想确定这个多出来的参数d最适当的取值
+
+41
+00:02:24,830 --> 00:02:28,880
+And concretely let's say that you want to choose a model,
+具体地说,比如你想要选择一个模型,
+
+42
+00:02:28,880 --> 00:02:33,110
+that is choose a degree of polynomial, choose one of these 10 models.
+那就从这10个模型中选择一个最适当的多项式次数
+
+43
+00:02:33,110 --> 00:02:37,920
+And fit that model and also get some estimate of how well your
+并且用这个模型进行估测
+
+44
+00:02:37,920 --> 00:02:42,570
+fitted hypothesis was generalize to new examples.
+预测你的假设能否很好地推广到新的样本上
+
+45
+00:02:42,570 --> 00:02:44,130
+Here's one thing you could do.
+那么你可以这样做
+
+46
+00:02:44,130 --> 00:02:49,790
+What you could, first take your first model and minimize the training error.
+你可以先选择第一个模型,然后求训练误差的最小值
+
+47
+00:02:49,790 --> 00:02:53,260
+And this would give you some parameter vector theta.
+这样你就会得到一个参数向量 theta
+
+48
+00:02:53,260 --> 00:02:58,600
+And you could then take your second model,the quadratic function, and
+然后你再选择第二个模型,二次函数模型
+
+49
+00:02:58,600 --> 00:03:01,290
+fit that to your training set and this will give you some other.
+进行同样的过程这样你会得到另一个
+
+50
+00:03:01,290 --> 00:03:03,040
+Parameter vector theta.
+参数向量 theta
+
+51
+00:03:03,040 --> 00:03:06,790
+In order to distinguish between these different parameter vectors, I'm going
+为了区别这些不同的参数向量theta,
+
+52
+00:03:06,790 --> 00:03:11,550
+to use a superscript one superscript two there where theta superscript one
+我想用上标(1) 上标(2)来表示,这里的上标(1)表示的是
+
+53
+00:03:11,550 --> 00:03:16,170
+just means the parameters I get by fitting this model to my training data.
+在调整第一个模型使其拟合训练数据时得到的参数theta
+
+54
+00:03:16,170 --> 00:03:19,130
+And theta superscript two just means the parameters I
+同样地 theta上标(2)表示的是
+
+55
+00:03:19,130 --> 00:03:23,940
+get by fitting this quadratic function to my training data and so on.
+二次函数在和训练数据拟合的过程中得到的参数,以此类推
+
+56
+00:03:23,940 --> 00:03:30,500
+By fitting a cubic model I get a parameter vector theta 3, and so on, up to, well, say theta 10.
+在拟合三次函数模型时我又得到一个参数theta(3)等等,直到theta(10)
+
+57
+00:03:30,500 --> 00:03:36,200
+And one thing we could do is take these parameters and look at the test set error.
+接下来我们要做的是对所有这些模型求出测试集误差
+
+58
+00:03:36,200 --> 00:03:38,800
+So I can compute on my test set J test of theta one,
+因此 我可以算出Jtest(θ(1))
+
+59
+00:03:38,800 --> 00:03:46,330
+J test of theta two, and so on.
+Jtest(θ(2)),等等
+
+60
+00:03:47,410 --> 00:03:51,930
+J test of theta three, and so on.
+Jtest(θ(3)),以此类推
+
+61
+00:03:53,050 --> 00:03:57,546
+So I'm going to take each of my hypotheses with the corresponding parameters and
+也就是对于每一个模型对应的假设
+
+62
+00:03:57,546 --> 00:04:00,270
+just measure the performance of on the test set.
+都计算出其作用于测试集的表现如何
+
+63
+00:04:00,270 --> 00:04:05,010
+Now, one thing I could do then is, in order to select one of these models,
+接下来为了确定选择哪一个模型最好
+
+64
+00:04:05,010 --> 00:04:09,160
+I could then see which model has the lowest test set error.
+我要做的是
+
+65
+00:04:09,160 --> 00:04:09,930
+And let's just say for
+那么我们假设
+
+66
+00:04:09,930 --> 00:04:14,480
+this example that I ended up choosing the fifth order polynomial.
+对于这一个例子最终选择了五次多项式模型
+
+67
+00:04:13,880 --> 00:04:15,430
+看看这些模型中哪一个对应的测试集误差最小
+
+68
+00:04:14,480 --> 00:04:16,940
+So, this seems reasonable so far.
+这一过程目前看来还比较合理
+
+69
+00:04:16,940 --> 00:04:21,060
+But now let's say I want to take my fifth hypothesis, this, this,
+那么现在 我确定了我使用我的假设也就是这个五次函数模型
+
+70
+00:04:21,060 --> 00:04:26,080
+fifth order model, and let's say I want to ask, how well does this model generalize?
+这个五次函数模型,现在我想知道这个模型能不能推广到新的样本
+
+71
+00:04:27,190 --> 00:04:30,560
+One thing I could do is look at how well my fifth order
+我们可以观察这个五次多项式假设模型
+
+72
+00:04:30,560 --> 00:04:34,710
+polynomial hypothesis had done on my test set.
+对测试集的拟合情况
+
+73
+00:04:34,710 --> 00:04:39,450
+But the problem is this will not be a fair estimate of how well my
+但这里有一个问题是这样做仍然不能公平地说明
+
+74
+00:04:39,450 --> 00:04:42,360
+hypothesis generalizes.
+我的假设推广到一般时的效果
+
+75
+00:04:42,360 --> 00:04:48,140
+And the reason is what we've done is we've fit this extra parameter d,
+其原因在于我们选择了一个能够最好地拟合测试集的参数d的值
+
+76
+00:04:48,140 --> 00:04:50,870
+that is this degree of polynomial.
+及多项式的度
+
+77
+00:04:50,870 --> 00:04:54,720
+And we fit that parameter d using the test set, namely,
+即我们选择了一个参数d的值
+
+78
+00:04:54,720 --> 00:05:00,310
+we chose the value of d that gave us the best possible performance on the test set.
+我们选择了一个能够最好地拟合测试集的参数d的值
+
+79
+00:05:00,310 --> 00:05:06,340
+And so, the performance of my parameter vector theta5, on the test set,
+因此 我们的参数向量theta(5)在拟合测试集时的结果
+
+80
+00:05:06,340 --> 00:05:11,160
+that's likely to be an overly optimistic estimate of generalization error.
+也就是对测试样本预测误差时,很可能导致一个比实际泛化误差更完美的预测结果
+
+81
+00:05:11,160 --> 00:05:15,640
+Right, so, because I had fit this parameter d to my test set, it is no longer
+对吧?因为我是找了一个最能拟合测试集的参数d
+
+82
+00:05:15,640 --> 00:05:21,410
+fair to evaluate my hypothesis on this test set, because I fit my parameters
+因此我再用测试集来评价我的假设,就显得不公平了因为我已经选了一个能够最拟合测试集的参数
+
+83
+00:05:21,410 --> 00:05:25,900
+to this test set, I've chosen the degree d of polynomial using the test set.
+我选择的多项式次数d本身就是按照最拟合测试集来选择的
+
+84
+00:05:25,900 --> 00:05:29,430
+And so my hypothesis is likely to do better on
+因此我的假设很可能很好地
+
+85
+00:05:29,430 --> 00:05:33,650
+this test set than it would on new examples that it hasn't seen before, and
+很可能很好地拟合测试集而且这种拟合的效果很可能会比对那些没见过的新样本拟合得更好
+
+86
+00:05:33,650 --> 00:05:36,140
+and that's what I really care about.
+而我们其实是更关心对新样本的拟合效果的
+
+87
+00:05:36,140 --> 00:05:41,050
+So just to reiterate, on the previous slide, we saw that if we fit some set of
+所以,再回过头来说在前面的幻灯片中我们看到,如果我们
+
+88
+00:05:41,050 --> 00:05:45,210
+parameters, you know, say theta0, theta1, and so on, to some training set,
+用训练集来拟合参数,theta0 theta1 等等参数时
+
+89
+00:05:45,210 --> 00:05:50,300
+then the performance of the fitted model on the training set is not predictive of
+那么 拟合后的模型在作用于训练集上的效果,是不能预测出
+
+90
+00:05:50,300 --> 00:05:53,500
+how well the hypothesis will generalize to new examples.
+我们将这个假设推广到新样本上时其效果如何的
+
+91
+00:05:53,500 --> 00:05:56,770
+It's because these parameters were fit to the training set,
+这是因为这些参数能够很好地拟合训练集,
+
+92
+00:05:56,770 --> 00:05:59,110
+so they're likely to do well on the training set,
+因此它们很有可能在对训练集的预测中表现地很好,
+
+93
+00:05:59,110 --> 00:06:03,200
+even if the parameters don't do well on other examples.
+但对其他的新样本来说 就不一定那么好了
+
+94
+00:06:03,200 --> 00:06:07,460
+And, in the procedure I just described on this line, we just did the same thing.
+而在刚才这一页幻灯片上我讲到的步骤,实际上是在做相同的工作
+
+95
+00:06:07,460 --> 00:06:13,060
+And specifically, what we did was,we fit this parameter d to the test set.
+具体地说,我们是在对测试集进行拟合
+
+96
+00:06:13,060 --> 00:06:16,770
+And by having fit the parameter to the test set, this means that
+而通过拟合测试集得到的参数
+
+97
+00:06:16,770 --> 00:06:22,010
+the performance of the hypothesis on that test set may not be a fair estimate of how
+其假设对于测试样本的预测效果不能公平地估计出
+
+98
+00:06:22,010 --> 00:06:26,670
+well the hypothesis is, is likely to do on examples we haven't seen before.
+这个假设对于未知的新样本的预测效果如何
+
+99
+00:06:26,670 --> 00:06:30,630
+To address this problem, in a model selection setting,
+为了调整这个评价假设时模型选择的问题,
+
+100
+00:06:30,630 --> 00:06:35,550
+if we want to evaluate a hypothesis, this is what we usually do instead.
+我们通常会采用如下的方法来解决
+
+101
+00:06:35,550 --> 00:06:40,200
+Given the data set, instead of just splitting into a training test set,
+如果我们有这样的数据集,我们不要将其仅仅分为训练集和测试集两部分
+
+102
+00:06:40,200 --> 00:06:43,930
+what we're going to do is then split it into three pieces.
+而是分割为三个部分
+
+103
+00:06:43,930 --> 00:06:49,130
+And the first piece is going to be called the training set as usual.
+第一部分和之前一样,也是训练集
+
+104
+00:06:50,130 --> 00:06:53,300
+So let me call this first part the training set.
+那么我们把第一部分还是称为训练集
+
+105
+00:06:54,780 --> 00:07:00,056
+And the second piece of this data, I'm going to call the cross validation set.
+然后第二部分数据,我们将其称为交叉验证集
+
+106
+00:07:00,056 --> 00:07:04,711
+[SOUND] Cross validation.
+交叉验证
+
+107
+00:07:04,711 --> 00:07:08,860
+And I'll abbreviate cross validation as CV.
+我用CV 来作为交叉验证的缩写
+
+108
+00:07:08,860 --> 00:07:13,520
+Sometimes it's also called the validation set instead of cross validation set.
+有时也把交叉验证直接称为验证集
+
+109
+00:07:13,520 --> 00:07:18,060
+And then the last part is going to be called the usual test set.
+然后最后这部分数据我们依然称之为测试集
+
+110
+00:07:18,060 --> 00:07:21,930
+And the pretty, pretty typical ratio at which to split these things will be
+那么最典型的比例是分配三组数据
+
+111
+00:07:21,930 --> 00:07:24,990
+to send 60% of your data to your training set,
+将整个数据的60%分给训练集
+
+112
+00:07:24,990 --> 00:07:29,290
+maybe 20% to your cross validation set, and 20% to your test set.
+然后20%作为验证集,20%作为测试集
+
+113
+00:07:29,290 --> 00:07:33,600
+And these numbers can vary a little bit but this sort of split will be pretty typical.
+当然这些比值可以稍微进行调整,但这种分法是最典型的比例
+
+114
+00:07:33,600 --> 00:07:38,922
+And so our training sets will now be only maybe 60% of the data, and our
+因此现在我们的训练集就是大约60%的总数据,
+
+115
+00:07:38,922 --> 00:07:44,860
+cross-validation set, or our validation set, will have some number of examples.
+我们的交叉验证集或者叫验证集 就会有某个数量的样本
+
+116
+00:07:44,860 --> 00:07:47,290
+I'm going to denote that m subscript cv.
+我用m下标cv来表示这个数量
+
+117
+00:07:47,290 --> 00:07:50,860
+So that's the number of cross-validation examples.
+因此它表示的是交叉验证样本的总数
+
+118
+00:07:52,040 --> 00:07:59,110
+Following our early notational convention I'm going to use xi cv comma y i cv,
+沿用我们之前使用的标记法则,我们还是使用(x(i)cv, y(i)cv)
+
+119
+00:07:59,110 --> 00:08:02,720
+to denote the ith cross validation example.
+来表示交叉验证样本
+
+120
+00:08:02,720 --> 00:08:07,290
+And finally we also have a test set over here with our
+最后,我们同样有一个测试集
+
+121
+00:08:07,290 --> 00:08:11,420
+m subscript test being the number of test examples.
+我们用m下标test来表示测试样本的总数
+
+122
+00:08:11,420 --> 00:08:14,570
+So, now that we've defined the training validation or
+所以 这样我们就定义了训练集或
+
+123
+00:08:14,570 --> 00:08:16,740
+cross validation and test sets.
+交叉验证和以及测试集
+
+124
+00:08:16,740 --> 00:08:21,420
+We can also define the training error, cross validation error, and test error.
+同样地我们可以定义训练误差 交叉验证误差和测试误差
+
+125
+00:08:21,420 --> 00:08:23,790
+So here's my training error, and
+因此训练误差可以这样定义
+
+126
+00:08:23,790 --> 00:08:26,820
+I'm just writing this as J subscript train of theta.
+这里我将训练误差写作J下标train(θ)
+
+127
+00:08:26,820 --> 00:08:29,030
+This is pretty much the same things.
+这和前面完全是一个意思
+
+128
+00:08:29,030 --> 00:08:32,260
+These are the same thing as the J of theta that I've been writing so
+这就是我们通常写的J(θ)
+
+129
+00:08:32,260 --> 00:08:37,110
+far; this is just the training set error, you know, as measured on the training set, and then
+这就是当你对训练集进行预测时,得到的训练集误差
+
+130
+00:08:37,110 --> 00:08:41,470
+J subscript cv is my cross validation error; this is pretty much what you'd expect,
+然后J下标cv表示的是验证集误差,
+
+131
+00:08:41,470 --> 00:08:45,970
+just like the training error, except you measure it on the cross validation data set,
+就像训练集误差一样,只不过是对验证样本进行预测得到的结果而已
+
+132
+00:08:45,970 --> 00:08:48,450
+and here's my test set error same as before.
+另外这是我的测试误差也是一样的
+
+133
+00:08:49,530 --> 00:08:53,410
+So when faced with a model selection problem like this, what we're going to
+因此 在考虑像这样的模型选择问题时
+
+134
+00:08:53,410 --> 00:08:58,709
+do is, instead of using the test set to select the model, we're instead
+我们不再使用测试集来进行模型选择
+
+135
+00:08:58,709 --> 00:09:04,580
+going to use the validation set, or the cross validation set, to select the model.
+取而代之的是我们将使用验证集,也叫交叉验证集来选择模型
+
+136
+00:09:04,580 --> 00:09:10,570
+Concretely, we're going to first take our first hypothesis, take this first model,
+具体来讲,我们首先要选取第一种假设,或者说第一个模型
+
+137
+00:09:10,570 --> 00:09:13,580
+and say, minimize the cost function, and
+使得代价函数取最小值
+
+138
+00:09:13,580 --> 00:09:17,520
+this would give me some parameter vector theta for the linear model.
+这样我们可以得到对应一次模型的一个参数向量theta
+
+139
+00:09:17,520 --> 00:09:20,300
+And, as before, I'm going to put a superscript 1,
+然后像之前一样,我们也用上标(1)
+
+140
+00:09:20,300 --> 00:09:23,560
+just to denote that this is the parameter for the linear model.
+来表示它是一次模型的参数
+
+141
+00:09:23,560 --> 00:09:25,660
+We do the same thing for the quadratic model.
+然后 我们再对二次函数模型进行同样的步骤
+
+142
+00:09:25,660 --> 00:09:27,927
+Get some parameter vector theta two.
+这样我们又得到一个参数向量theta(2)
+
+143
+00:09:27,927 --> 00:09:31,601
+Get some parameter vector theta three, and so
+然后是theta(3),等等
+
+144
+00:09:31,601 --> 00:09:35,470
+on, down to theta ten for the polynomial.
+一直到多项式次数为10的情况
+
+145
+00:09:35,470 --> 00:09:40,440
+And what I'm going to do is, instead of testing these hypotheses on the test set,
+接下来我的做法,与之前不同的是,我不是使用测试集来测试这些假设的表现如何
+
+146
+00:09:40,440 --> 00:09:43,130
+I'm instead going to test them on the cross validation set.
+而是使用交叉验证集来测试其预测效果
+
+147
+00:09:43,130 --> 00:09:46,600
+And measure J subscript cv,
+因此我要对每一个模型都算出其对应的Jcv
+
+148
+00:09:46,600 --> 00:09:52,180
+to see how well each of these hypotheses do on my cross validation set.
+来观察这些假设模型中,哪一个能最好地对交叉验证集进行预测
+
+149
+00:09:53,250 --> 00:09:57,180
+And then I'm going to pick the hypothesis with the lowest cross validation error.
+我将选择交叉验证误差最小的那一组假设作为我们的模型
+
+150
+00:09:57,180 --> 00:10:00,180
+So for this example, let's say for the sake of argument,
+因此对于这个例子,为了讨论的方便
+
+151
+00:10:00,180 --> 00:10:06,550
+that it was my 4th order polynomial, that had the lowest cross validation error.
+我们假设四次多项式,对应的交叉验证误差最小
+
+152
+00:10:06,550 --> 00:10:11,070
+So in that case I'm going to pick this fourth order polynomial model.
+那么在这种情况下,我就选择四次多项式模型
+
+153
+00:10:11,070 --> 00:10:15,250
+And finally, what this means is that the parameter d,
+作为我们最终的选择。这表示的是参数d
+
+154
+00:10:15,250 --> 00:10:17,200
+remember d was the degree of polynomial, right?
+别忘了参数d指的是多项式次数
+
+155
+00:10:17,200 --> 00:10:20,270
+So d equals two, d equals three, all the way up to d equals 10.
+d等于2 等于3,一直到d等于10
+
+156
+00:10:20,270 --> 00:10:25,040
+What we've done is we'll fit that parameter d and we'll say d equals four.
+我们做的是我们选择了最合适的d=4
+
+157
+00:10:25,040 --> 00:10:27,290
+And we did so using the cross-validation set.
+我们是使用交叉验证集来确定的这个参数
+
+158
+00:10:27,290 --> 00:10:32,320
+And so this degree of polynomial, so the parameter, is no longer fit to the test
+因此,这个参数d,即这个多项式次数,就没有跟测试集拟合过了
+
+159
+00:10:32,320 --> 00:10:39,260
+set, and so we've now saved aside the test set, and we can use the test set to
+这样我们就为测试集留出了一条路,现在我们就能使用
+
+160
+00:10:39,260 --> 00:10:44,325
+measure, or to estimate the generalization error of the model that was selected.
+测试集来预测或者估计,通过学习算法得出的模型的泛化误差了
+
+161
+00:10:44,325 --> 00:10:47,680
+by the algorithm.
+
+162
+00:10:47,680 --> 00:10:51,140
+So, that was model selection and how you can take your data,
+因此,这是模型选择问题,包括你应该如何将数据
+
+163
+00:10:51,140 --> 00:10:54,310
+split it into a training, validation, and test set.
+分成训练集、验证集和测试集
+
+164
+00:10:54,310 --> 00:10:57,310
+And use your cross validation data to select the model and
+以及使用你的交叉验证集,来选择最合适的模型,并且
+
+165
+00:10:57,310 --> 00:10:58,570
+evaluate it on the test set.
+使用测试集进行预测
+
+166
+00:10:59,630 --> 00:11:02,860
+One final note, I should say that in.
+最后我要说明的一点是
+
+167
+00:11:02,860 --> 00:11:06,210
+machine learning, as it's practiced today, there aren't many
+在如今的机器学习应用中,很少有人
+
+168
+00:11:06,210 --> 00:11:10,470
+people that will do that earlier thing that
+I talked about, and said that, you know,
+按照我最开始介绍的那个步骤做的,就像我们前面说过的(使用测试集来做模型选择,然后再使用同样的测试集,来测出误差,评价假设的预测效果)
+
+169
+00:11:10,470 --> 00:11:15,360
+it isn't such a good idea, of selecting your model using this test set.
+这实在不是一个好主意,用测试集来选择你的模型
+
+170
+00:11:15,360 --> 00:11:19,590
+And then using the same test set to report the error as though
+然后使用相同的测试集来报告误差
+
+171
+00:11:19,590 --> 00:11:24,120
+selecting your degree of polynomial on the test set, and then reporting the error on
+通过使用用测试集来选择多项式的次数,然后求测试集的预测误差
+
+172
+00:11:24,120 --> 00:11:27,840
+the test set as though that were a good estimate of generalization error.
+尽管这样做的确能得到一个很理想的泛化误差
+
+173
+00:11:27,840 --> 00:11:31,160
+That sort of practice is unfortunately something that many, many people do.
+但很遗憾确实有很多人就是这样做的
+
+174
+00:11:31,160 --> 00:11:35,550
+If you have a massive, massive test set, that is maybe not a terrible thing to do,
+如果你有一个很大很大的测试集,也许情况稍微好些
+
+175
+00:11:35,550 --> 00:11:38,090
+but many practitioners,
+但大多数的机器学习
+
+176
+00:11:38,090 --> 00:11:42,360
+most practitioners of machine
+learning tend to advise against that.
+实践者一般还是建议最好不要这样做
+
+177
+00:11:42,360 --> 00:11:45,760
+And it's considered better practice to have separate training, validation, and
+最好还是分成训练集、验证集和
+
+178
+00:11:45,760 --> 00:11:46,728
+test sets.
+测试集
+
+179
+00:11:46,728 --> 00:11:50,620
+I just want to warn you that sometimes people do, you know, use the same data for
+但就像我之前说的,总有很多人使用同一组数据
+
+180
+00:11:50,620 --> 00:11:54,320
+the purpose of the validation set, and for the purpose of the test set.
+既用来做验证集,也同时用来做测试集
+
+181
+00:11:54,320 --> 00:11:57,430
+So they just have a training set and a test set, and that's,
+也就是说他们只把数据分成训练集和测试集
+
+182
+00:11:57,430 --> 00:12:00,020
+that's less good practice, though you will see some people do it.
+所以你会看到很多人是这么做的
+
+183
+00:12:00,020 --> 00:12:03,090
+But, if possible, I would recommend against doing that yourself.
+但我还是建议尽量不要这样做
+
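A small Python/numpy sketch of the model selection procedure described in this transcript: split the data 60/20/20 into training, cross validation, and test sets, fit each candidate polynomial degree d on the training set, pick d by the cross validation error, and only then report the test error for the chosen model. The synthetic data and helper functions are assumptions for illustration, not the course's own code.

import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 100))
y = 1.5 * x**3 - x + 0.1 * rng.standard_normal(100)  # made-up data set

idx = rng.permutation(len(x))                          # 60% / 20% / 20% split
train, cv, test = idx[:60], idx[60:80], idx[80:]

def j(theta, xs, ys):
    # average squared error (1/2m) * sum((h(x) - y)^2), with no regularization term
    return np.mean((np.polyval(theta, xs) - ys) ** 2) / 2

# theta(d) is obtained by minimizing the training error for each degree d = 1..10
thetas = {d: np.polyfit(x[train], y[train], d) for d in range(1, 11)}

# choose the degree with the lowest cross validation error (not the lowest test error)
best_d = min(thetas, key=lambda d: j(thetas[d], x[cv], y[cv]))

# the test set was never used to pick d, so this is a fairer estimate of generalization error
print(best_d, j(thetas[best_d], x[test], y[test]))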
diff --git a/srt/10 - 4 - Diagnosing Bias vs. Variance (8 min).srt b/srt/10 - 4 - Diagnosing Bias vs. Variance (8 min).srt
new file mode 100644
index 00000000..8f9ae1bd
--- /dev/null
+++ b/srt/10 - 4 - Diagnosing Bias vs. Variance (8 min).srt
@@ -0,0 +1,615 @@
+1
+00:00:00,120 --> 00:00:01,220
+If you run the learning algorithm
+当你运行一个学习算法时(字幕整理,中国海洋大学,黄海广,haiguang2000@qq.com)
+
+2
+00:00:01,710 --> 00:00:02,640
+and it doesn't do as well
+如果这个算法的表现不理想
+
+3
+00:00:02,840 --> 00:00:04,520
+as you are hoping, almost all
+那么多半是出现
+
+4
+00:00:04,740 --> 00:00:05,670
+the time it will be because
+两种情况
+
+5
+00:00:06,100 --> 00:00:07,650
+you have either a high bias
+要么是偏差比较大
+
+6
+00:00:08,010 --> 00:00:09,530
+problem or a high variance problem.
+要么是方差比较大
+
+7
+00:00:09,860 --> 00:00:10,940
+In other words they're either an
+换句话说 出现的情况要么是欠拟合
+
+8
+00:00:11,130 --> 00:00:13,140
+underfitting problem or an overfitting problem.
+要么是过拟合问题
+
+9
+00:00:14,260 --> 00:00:15,090
+And in this case it's very
+那么这两种情况
+
+10
+00:00:15,350 --> 00:00:16,580
+important to figure out
+哪个和偏差有关
+
+11
+00:00:16,790 --> 00:00:17,970
+which of these two problems is
+哪个和方差有关
+
+12
+00:00:18,280 --> 00:00:19,500
+bias or variance or a bit of both that you
+或者是不是和两个都有关
+
+13
+00:00:20,210 --> 00:00:20,430
+actually have.
+搞清楚这一点非常重要
+
+14
+00:00:21,050 --> 00:00:21,980
+Because knowing which of these
+因为能判断出现的情况
+
+15
+00:00:22,440 --> 00:00:23,890
+two things is happening would give
+是这两种情况中的哪一种
+
+16
+00:00:24,060 --> 00:00:25,940
+a very strong indicator for whether
+其实是一个很有效的指示器
+
+17
+00:00:26,180 --> 00:00:27,490
+the useful and promising ways
+指引着可以改进算法的
+
+18
+00:00:27,770 --> 00:00:29,030
+to try to improve your algorithm.
+最有效的方法和途径
+
+19
+00:00:30,230 --> 00:00:31,270
+In this video, I would like
+在这段视频中
+
+20
+00:00:31,380 --> 00:00:33,030
+to delve more deeply into
+我想更深入地探讨一下
+
+21
+00:00:33,220 --> 00:00:34,850
+this bias and various issue and
+有关偏差和方差的问题
+
+22
+00:00:35,180 --> 00:00:36,530
+understand them better as well
+希望你能对它们有一个更深入的理解
+
+23
+00:00:36,790 --> 00:00:38,470
+as figure out how to look
+并且也能弄清楚怎样评价一个学习算法
+
+24
+00:00:38,610 --> 00:00:42,910
+at and evaluate knows whether or not we might have a bias problem or a variance problem.
+能够判断一个算法是偏差还是方差有问题
+
+25
+00:00:43,030 --> 00:00:45,750
+Since this would be critical to
+因为这个问题对于弄清
+
+26
+00:00:45,900 --> 00:00:48,180
+figuring out how to improve the performance of learning algorithm that you implement.
+如何改进学习算法的效果非常重要
+
+27
+00:00:48,640 --> 00:00:52,270
+So you've already
+好的 这几幅图
+
+28
+00:00:52,680 --> 00:00:53,690
+seen this figure a few times,
+你已经见过很多次了
+
+29
+00:00:54,190 --> 00:00:55,230
+where if you fit too simple a
+如果你用过于简单的假设来拟合数据
+
+30
+00:00:55,710 --> 00:00:57,900
+hypothesis, like a straight line, then that underfits the data.
+比如说用一条直线 那么不足以拟合这组数据(欠拟合)
+
+31
+00:00:59,660 --> 00:01:00,720
+If you fit too complex a
+而如果你用过于复杂的假设来拟合时
+
+32
+00:01:01,250 --> 00:01:02,870
+hypothesis, then that might
+那么对训练集来说
+
+33
+00:01:03,400 --> 00:01:05,050
+fit the training set perfectly but
+则会拟合得很好
+
+34
+00:01:05,270 --> 00:01:06,810
+overfit the data and this
+但又过于完美(过拟合)
+
+35
+00:01:06,930 --> 00:01:09,000
+may be hypothesis of some
+而像这样的
+
+36
+00:01:09,340 --> 00:01:11,000
+intermediate level of complexity,
+中等复杂度的假设
+
+37
+00:01:11,810 --> 00:01:13,120
+of some, maybe degree two
+比如某种二次多项式的假设
+
+38
+00:01:13,390 --> 00:01:15,770
+polynomial, that's not too low and not too high a degree.
+次数既不高也不低
+
+39
+00:01:16,560 --> 00:01:17,340
+That's just right.
+这种假设对数据拟合得刚刚好
+
+40
+00:01:17,560 --> 00:01:18,480
+And gives you the best
+此时对应的的泛化误差
+
+41
+00:01:19,100 --> 00:01:20,740
+generalization error out of these options.
+也是三种情况中最小的
+
+42
+00:01:21,770 --> 00:01:22,960
+Now that we're armed with the
+现在我们已经掌握了
+
+43
+00:01:23,030 --> 00:01:25,130
+notion of training and validation
+训练集 验证集和测试集的概念
+
+44
+00:01:26,100 --> 00:01:27,550
+and test sets, we can understand
+我们就能更好地理解
+
+45
+00:01:28,290 --> 00:01:30,530
+the concepts of bias and variance a little bit better.
+偏差和方差的问题
+
+46
+00:01:31,310 --> 00:01:33,140
+Concretely, let our
+具体来说
+
+47
+00:01:33,370 --> 00:01:34,920
+training error and cross
+我们沿用之前所使用的
+
+48
+00:01:35,050 --> 00:01:36,620
+validation error be defined as
+训练集误差和验证集
+
+49
+00:01:36,850 --> 00:01:38,440
+in the previous videos, just say,
+误差的定义
+
+50
+00:01:38,680 --> 00:01:40,110
+the squared error, the average
+也就是平方误差
+
+51
+00:01:40,450 --> 00:01:41,420
+squared error as measured
+即对训练集数据进行预测
+
+52
+00:01:41,830 --> 00:01:42,810
+on the training set or as
+或对验证集数据进行预测
+
+53
+00:01:42,930 --> 00:01:44,710
+measured on the cross validation set.
+所产生的平均平方误差
+
+54
+00:01:46,560 --> 00:01:47,690
+Now let's plot the following figure.
+下面我们来画出如下这个示意图
+
+55
+00:01:48,470 --> 00:01:49,930
+On the horizontal axis I am
+横坐标上表示的是
+
+56
+00:01:50,010 --> 00:01:52,000
+going to plot the degree of polynomial,
+多项式的次数
+
+57
+00:01:52,400 --> 00:01:53,380
+so as I go the right
+因此横坐标越往右的位置
+
+58
+00:01:54,810 --> 00:01:57,050
+I'm going to be fitting higher and higher order polynomials.
+表示多项式的次数越大
+
+59
+00:01:58,000 --> 00:02:02,687
+So towards the left of this figure, where maybe D equals one, we're
+那么我们来画这幅图对应的情况,d可能等于1的情况,
+
+60
+00:02:02,687 --> 00:02:07,500
+going to be fitting very simple functions. Whereas way here on the right of the
+是用很简单的函数来进行拟合,而在右边的这个图中
+
+61
+00:02:07,500 --> 00:02:12,187
+horizontal axis, we have much larger values of D. So a much higher degree of
+水平横坐标表示有更多更大的d值.表示更高次数的
+
+62
+00:02:12,187 --> 00:02:17,487
+polynomial. And so here that's going to correspond to fitting much more complex
+多项式.因此这些位置对应着使用更复杂的
+
+63
+00:02:17,487 --> 00:02:22,836
+functions to your training set. Let's look at the training error and the cross
+函数来拟合你的训练集时所需要的d值。让我们来把训练集误差和交叉
+
+64
+00:02:22,836 --> 00:02:27,342
+validation error and plot them on this figure. Let's start with the training
+验证集误差画在这个坐标中。我们先来画训练集误差
+
+65
+00:02:27,342 --> 00:02:32,086
+error. As we increase the degree of the polynomial, we're going to be able to fit
+随着我们增大多项式的次数,我们将对训练集拟合得
+
+66
+00:02:32,086 --> 00:02:36,533
+our training set better and better. And so, if D=1 then it's a relatively high
+越来越好。所以如果d等于1时,对应着一个比较大的
+
+67
+00:02:36,533 --> 00:02:41,157
+training error. If we have a very high degree polynomial our training error is
+训练误差。而如果我们的多项式次数很高时,我们的训练误差
+
+68
+00:02:41,157 --> 00:02:45,960
+going to be really low maybe even zero because we'll fit the training set really well.
+就会很小,甚至可能等于0,因为可能非常拟合训练集
+
+69
+00:02:45,960 --> 00:02:50,644
+And so as we increase the degree of polynomial, we find typically that the
+所以,当我们增大多项式次数时,我们不难发现
+
+70
+00:02:50,644 --> 00:02:55,776
+training error decreases. So, I'm going to write J subscript train of theta there.
+训练误差明显下降。这里我写上J下标train(θ)来表示训练集误差
+
+71
+00:02:55,776 --> 00:03:01,565
+Because our training error tends to decrease with the degree of polynomial
+因为随着我们对数据拟合所需多项式次数的增大
+
+72
+00:03:01,565 --> 00:03:07,652
+that we fit to the data. Next let's look at the cross validation error. Or for that
+训练误差是趋于下降的。接下来我们再看交叉验证误差
+
+73
+00:03:07,652 --> 00:03:13,812
+matter, if we look at the test set error we'll get a pretty similar result as if we
+事实上如果我们观察测试集误差的话,我们会得到一个和交叉验证误差
+
+74
+00:03:13,812 --> 00:03:19,292
+were to plot the cross validation error. So, we know that if D=1. We're fitting a
+非常接近的结果。所以,我们知道如果d等于1的话
+
+75
+00:03:19,292 --> 00:03:23,758
+very simple function. And so we may be under fitting the training set. And so
+意味着用一个很简单的函数来拟合数据。也就是说
+
+76
+00:03:23,758 --> 00:03:28,343
+we're going to have a very high cross validation error. If we fit, you know, an
+ 我们会得到一个较大的交叉验证误差。而如果我们用
+
+77
+00:03:28,343 --> 00:03:33,107
+intermediate degree polynomial, there's, we have a D equals two in our example on
+一个中等大小的多项式次数来拟合时,在前一张幻灯片中我们用的d等于2
+
+78
+00:03:33,107 --> 00:03:38,049
+the previous slide, we're going to have a much lower cross validation error, because
+那么我们会得到一个更小的交叉验证误差
+
+79
+00:03:38,049 --> 00:03:43,110
+we're just fitting, finding a much better fit to the data. And conversely, if D were
+因为我们找了一个能够更好拟合数据的次数,同样地,反过来,如果次数d
+
+80
+00:03:43,110 --> 00:03:47,874
+too high, so if D took on, say, a value of four, then we're getting over fitting, and
+太大,比如说d的值取为4,那么我们又过拟合了
+
+81
+00:03:47,874 --> 00:03:52,669
+so we end up with a high value for cross validation error. So, if you were to vary
+我们又会得到一个较大的交叉验证误差
+
+82
+00:03:52,669 --> 00:03:57,854
+this smoothly and plot the curve. You might end up with a curve like that. Where,
+因此 如果你平稳地过渡这几个点,你可以绘制出一条平滑的曲线,就像这样
+
+83
+00:03:57,854 --> 00:04:03,737
+that's Jcv of theta. And similarly, if you plot J test of theta, you get
+我用Jcv(θ)来表示。同样地,如果你画出Jtest(θ),你也将得到
+
+84
+00:04:03,737 --> 00:04:09,465
+something very similar. And so, this sort of plot also helps us to better understand
+一条类似的曲线,这样一幅图也同时能帮助我们更好地理解
+
+85
+00:04:09,465 --> 00:04:13,829
+the notions of bias and variance. Concretely, suppose you've applied a
+偏差和方差的概念。具体来说,假设你得出了一个
+
+86
+00:04:13,829 --> 00:04:18,447
+learning algorithm, and it's not performing as well as you were hoping.
+学习算法,而这个算法并没有表现地如你期望那么好。
+
+87
+00:04:18,447 --> 00:04:23,507
+So, if your cross-validation set error, or your test set error is high. How can we
+所以你的交叉验证误差或者测试集误差都很大。我们应该
+
+88
+00:04:23,507 --> 00:04:28,758
+figure out if the learning algorithm is suffering from high bias or if it's suffering
+如何判断此时的学习算法正处于高偏差的问题还是
+
+89
+00:04:28,758 --> 00:04:34,130
+from high variance? So the setting of the cross validation error being high,
+高方差的问题?交叉验证误差比较大的情况,
+
+90
+00:04:34,130 --> 00:04:39,789
+corresponds to either this regime or this regime. So this regime on the left
+对应着曲线中的这一端,或者这一端,那么左边的这一端,
+
+91
+00:04:39,789 --> 00:04:45,747
+corresponds to a high bias problem. That is, if you're fitting a overly low order
+对应的就是高偏差的问题。也就是你使用了一个过于小的
+
+92
+00:04:45,747 --> 00:04:51,556
+polynomial, such as a D=1. When we really needed a higher order polynomial to fit
+多项式次数,比如d等于1。但实际上我们需要一个较高的多项式次数来拟合数据
+
+93
+00:04:51,556 --> 00:04:57,513
+the data. Whereas in contrast, this regime corresponds to a high variance problem.
+相反地,右边这一端对应的是高方差问题。
+
+94
+00:04:57,513 --> 00:05:03,247
+That is, if D, the degree of polynomial was too large for the data set that we
+也就是说,多项式次数d对于我们的数据来讲太大了,
+
+95
+00:05:03,247 --> 00:05:08,535
+have. And this figure just has a clue for how to distinguish between these two
+这幅图也提示了我们怎样区分这两种情况。
+
+96
+00:05:08,535 --> 00:05:15,790
+cases. Concretely, for the high bias case, that is, the case of underfitting, what we
+具体地说,对于高偏差的情况,也就是对应欠拟合的情况,我们
+
+97
+00:05:15,790 --> 00:05:21,790
+find is that both the cross validation error and the training error are going to
+发现交叉验证误差和训练误差都会
+
+98
+00:05:21,790 --> 00:05:29,770
+be high. So if your algorithm is suffering from a bias problem. The training set
+很大。因此,如果你的算法有偏差问题的话,那么训练集
+
+99
+00:05:29,770 --> 00:05:38,162
+error, will be high. And you might find that the cross validation error will also
+误差将会比较大。同时你可能会发现交叉验证集误差
+
+100
+00:05:38,162 --> 00:05:45,271
+be high. It might be close to, maybe just slightly higher than, the training error. And
+也很大。两个误差可能很接近,或者可能验证误差稍大一点
+
+101
+00:05:45,271 --> 00:05:52,211
+so, if you see this combination that's a sign your algorithm may be suffering from
+所以如果你看到这样的组合情况,那就表示你的算法正处于
+
+102
+00:05:52,211 --> 00:05:58,981
+high bias. In contrast if your algorithm is suffering from high variance then if
+高偏差的问题。反过来,如果你的算法处于高方差的问题,
+
+103
+00:05:58,981 --> 00:06:05,921
+you look here. We'll notice that J-train that is the training error is going to be
+那么如果你观察这里,我们会发现 Jtrain就是训练误差,
+
+104
+00:06:05,921 --> 00:06:13,626
+low. That is, you're fitting the training set very well. Whereas your cross validation
+会很小。也就意味着,你对训练集数据拟合得非常好。而你的交叉验证误差也一样。
+
+105
+00:06:13,626 --> 00:06:18,575
+error, assuming that this is, say, the squared error, which we're trying to
+假设此时我们最小化的是平方误差,
+
+106
+00:06:18,575 --> 00:06:22,757
+minimize. Whereas, in contrast, your error on the cross
+而反过来,你的交叉验证集误差
+
+107
+00:06:22,757 --> 00:06:28,080
+validation set, or your cost function on the cross validation set, will be much bigger
+或者说你的交叉验证集对应的代价函数的值,将会远远大于
+
+108
+00:06:28,080 --> 00:06:32,884
+than your training set error. So, there's a double greater-than sign. That's the
+训练集误差。这里的双大于符号是一个
+
+109
+00:06:32,884 --> 00:06:37,935
+math symbol for much greater than, denoted by two greater than signs. And so, if you
+数学符号,表示远远大于,用两个大于符号表示。因此如果
+
+110
+00:06:37,935 --> 00:06:43,047
+see this combination of values then that might give you, that's a clue that your
+你看见这种组合的情况,这就预示着你的
+
+111
+00:06:43,047 --> 00:06:47,482
+learning algorithm may be suffering from high variance and might be over
+学习算法可能正处于高方差和过拟合的情况
+
+112
+00:06:47,482 --> 00:06:51,732
+fitting. And the key that distinguishes these two cases is if you
+同时,区分这两种不同情形的关键依据是
+
+113
+00:06:51,732 --> 00:06:56,844
+have a high bias problem your training set error will also be high. Your hypothesis
+如果你的算法处于高偏差的情况,那么你的训练集误差会很大。因为你的假设
+
+114
+00:06:56,844 --> 00:07:02,080
+is just not fitting the training set well. And if you have a high variance problem.
+不能很好地拟合训练集数据。而当你处于高方差的问题时
+
+115
+00:07:02,080 --> 00:07:06,852
+Your training set error will usually be low. That is much lower than your cross
+你的训练误差通常都会很小,并且远远小于
+
+116
+00:07:06,852 --> 00:07:11,684
+validation error. So hopefully that gives you a somewhat better understanding of the
+交叉验证误差。好的,但愿这节课能让你更清楚地理解
+
+117
+00:07:11,684 --> 00:07:15,845
+two problems of bias and variance. I still have a lot more to say about bias and
+偏差和方差这两种问题。在之后几段视频中,我还将对偏差和方差做更多的解释
+
+118
+00:07:15,845 --> 00:07:19,954
+variance in the next few videos. But what we'll see later is that by diagnosing
+但我们之后要关注的是诊断一个学习算法
+
+119
+00:07:19,954 --> 00:07:24,011
+whether a learning algorithm may be suffering from high bias or high variance,
+是否处于高偏差或高方差的情况。
+
+120
+00:07:24,011 --> 00:07:28,432
+we'll show you even more details of how to do that in later videos. We'll see that by
+在后面几段视频中我还将向你展示更多细节,我们将会看到
+
+121
+00:07:28,432 --> 00:07:32,697
+figuring out whether a learning algorithm may be suffering from high bias or high
+通过分清一个学习算法是处于高偏差还是高方差,
+
+122
+00:07:32,697 --> 00:07:36,806
+variance, or a combination of both, that, that would give us much better guidance
+还是两种情况的结合,这能够更好地指引我们
+
+123
+00:07:36,806 --> 00:07:42,367
+promising things to try in order to improve the performance of a learning algorithm.
+应该采取什么样的措施,来提高学习算法的性能表现.
+
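A short Python/numpy sketch of the diagnostic described in this transcript: compute the training error and the cross validation error for a range of polynomial degrees and compare them. High bias (underfitting) shows up as both errors being high and close together; high variance (overfitting) shows up as a low training error with a much larger cross validation error. The data and split are made up for illustration and this is not the course's own code.

import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 100))
y = np.sin(3 * x) + 0.2 * rng.standard_normal(100)  # made-up data set
idx = rng.permutation(len(x))
train, cv = idx[:60], idx[60:80]                     # training and cross validation indices

def j(theta, xs, ys):
    return np.mean((np.polyval(theta, xs) - ys) ** 2) / 2  # average squared error

for d in range(1, 11):
    theta = np.polyfit(x[train], y[train], d)   # fit a degree-d polynomial to the training set
    j_train = j(theta, x[train], y[train])      # tends to decrease as d grows
    j_cv = j(theta, x[cv], y[cv])               # high for very small d (bias) and very large d (variance)
    print(f"d={d:2d}  J_train={j_train:.4f}  J_cv={j_cv:.4f}")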
diff --git a/srt/10 - 5 - Regularization and Bias_Variance (11 min).srt b/srt/10 - 5 - Regularization and Bias_Variance (11 min).srt
new file mode 100644
index 00000000..646e08bb
--- /dev/null
+++ b/srt/10 - 5 - Regularization and Bias_Variance (11 min).srt
@@ -0,0 +1,1625 @@
+1
+00:00:00,390 --> 00:00:02,440
+You've seen how regularization can help
+现在你应该已经知道(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,610 --> 00:00:04,660
+prevent overfitting, but how
+算法正则化可以有效地防止过拟合
+
+3
+00:00:04,960 --> 00:00:06,230
+does it affect the bias and
+但正则化跟算法的偏差和方差
+
+4
+00:00:06,460 --> 00:00:08,070
+variance of a learning algorithm?
+又有什么关系呢?
+
+5
+00:00:08,630 --> 00:00:09,890
+In this video, I like to
+在这段视频中
+
+6
+00:00:10,020 --> 00:00:11,180
+go deeper into the issue
+我想更深入地
+
+7
+00:00:11,550 --> 00:00:13,300
+of bias and variance, and
+探讨一下偏差和方差的问题
+
+8
+00:00:13,520 --> 00:00:14,450
+talk about how it interacts
+讨论一下两者之间
+
+9
+00:00:15,070 --> 00:00:15,880
+with, and is effected by,
+是如何相互影响的
+
+10
+00:00:16,070 --> 00:00:18,470
+the regularization of your learning algorithm.
+以及和算法的正则化之间的相互关系
+
+11
+00:00:22,484 --> 00:00:23,704
+Suppose we're fitting a high
+假如我们要对这样一个
+
+12
+00:00:25,814 --> 00:00:27,314
+order polynomial like that shown
+高阶的多项式
+
+13
+00:00:27,744 --> 00:00:28,944
+here, but to prevent
+进行拟合
+
+14
+00:00:29,384 --> 00:00:30,394
+overfitting, we're going to
+为了防止过拟合现象
+
+15
+00:00:30,534 --> 00:00:32,164
+use regularization, like that shown
+我们要使用如幻灯片所示的正则化
+
+16
+00:00:32,534 --> 00:00:33,694
+here, so we have this regularization
+因此我们试图通过这样一个正则化项
+
+17
+00:00:34,504 --> 00:00:35,674
+term to try to
+来让参数的值
+
+18
+00:00:35,714 --> 00:00:37,604
+keep the values of the parameters small.
+尽可能小
+
+19
+00:00:38,044 --> 00:00:39,724
+And as usual, the regularization sums
+正则化项的求和范围
+
+20
+00:00:40,094 --> 00:00:41,514
+from j equals 1 to
+照例取为j等于1到m
+
+21
+00:00:41,614 --> 00:00:42,804
+m rather than j equals 0
+而非j等于0到m
+
+22
+00:00:42,924 --> 00:00:45,454
+to m. Let's consider three cases.
+然后我们来分析以下三种情形
+
+23
+00:00:46,064 --> 00:00:47,914
+The first is the case of
+第一种情形是
+
+24
+00:00:47,984 --> 00:00:49,224
+a very large value of the
+正则化参数lambda
+
+25
+00:00:49,284 --> 00:00:51,574
+regularization parameter lambda, such
+取一个比较大的值
+
+26
+00:00:51,814 --> 00:00:52,964
+as if lambda were
+比如lambda的值
+
+27
+00:00:53,114 --> 00:00:53,924
+equal to 10,000, some huge value.
+取为10000甚至更大
+
+28
+00:00:53,704 --> 00:00:56,024
+In this
+在这种情况下
+
+29
+00:00:56,094 --> 00:00:57,234
+case, all of these
+所有这些参数
+
+30
+00:00:57,384 --> 00:00:58,974
+parameters, theta 1, theta 2,
+包括theta1 theta2
+
+31
+00:00:59,304 --> 00:01:00,034
+theta 3 and so on will
+theta3 等等
+
+32
+00:01:00,214 --> 00:01:02,114
+be heavily penalized and
+将被大大惩罚
+
+33
+00:01:02,294 --> 00:01:04,384
+so, what ends up with most
+其结果是
+
+34
+00:01:04,834 --> 00:01:06,164
+of these parameter values being close
+这些参数的值
+
+35
+00:01:06,514 --> 00:01:08,724
+to 0 and the hypothesis will be
+将近似等于0
+
+36
+00:01:08,904 --> 00:01:09,664
+roughly h or x
+并且假设模型
+
+37
+00:01:10,004 --> 00:01:11,704
+just equal or approximately equal
+h(x)的值将等于或者近似等于
+
+38
+00:01:11,994 --> 00:01:13,254
+to theta 0, and so we
+theta0的值
+
+39
+00:01:13,414 --> 00:01:15,284
+end up a hypothesis that more
+因此我们最终得到的假设函数
+
+40
+00:01:15,524 --> 00:01:16,974
+or less looks like that. This is more or
+应该是这个样子
+
+41
+00:01:17,094 --> 00:01:19,854
+less a flat, constant straight line.
+近似是一条平滑的直线
+
+42
+00:01:20,000 --> 00:01:21,910
+And so this hypothesis has high
+因此这个假设处于高偏差
+
+43
+00:01:22,250 --> 00:01:24,220
+bias and it underfits this data set.
+对数据集欠拟合
+
+44
+00:01:24,560 --> 00:01:26,110
+So the horizontal straight
+因此这条水平直线
+
+45
+00:01:26,430 --> 00:01:27,400
+line is just not a very
+对这个数据集来讲
+
+46
+00:01:27,530 --> 00:01:29,690
+good model for this data set.
+不是一个好的假设模型
+
+47
+00:01:30,290 --> 00:01:31,460
+At the other extreme is if we have
+与之对应的另一种情况是
+
+48
+00:01:31,840 --> 00:01:33,150
+a very small value of
+如果我们的lambda值很小
+
+49
+00:01:33,440 --> 00:01:34,900
+lambda, such as if lambda
+比如说lambda的值
+
+50
+00:01:35,300 --> 00:01:37,220
+were equal to 0.
+等于0的时候
+
+51
+00:01:37,310 --> 00:01:38,530
+In that case, given that we're
+在这种情况下
+
+52
+00:01:38,670 --> 00:01:39,830
+fitting a high order polynomial,
+如果我们要拟合一个高阶多项式的话
+
+53
+00:01:39,980 --> 00:01:41,280
+this is a
+那么此时我们通常会处于
+
+54
+00:01:41,530 --> 00:01:43,180
+usual overfitting setting.
+过拟合的情况
+
+55
+00:01:44,340 --> 00:01:45,580
+In that case, given that we're
+在这种情况时
+
+56
+00:01:45,780 --> 00:01:46,830
+fitting a high order polynomial,
+在拟合一个高阶多项式时
+
+57
+00:01:47,760 --> 00:01:49,640
+basically without regularization or with
+如果没有进行正则化
+
+58
+00:01:49,820 --> 00:01:51,760
+very minimal regularization, we end
+或者正则化程度很微小的话
+
+59
+00:01:51,940 --> 00:01:53,770
+up with our usual high variance, overfitting
+我们通常会得到高方差和过拟合的结果
+
+60
+00:01:54,400 --> 00:01:55,490
+setting, because basically if lambda is
+因为通常来说
+
+61
+00:01:56,220 --> 00:01:57,240
+equal to zero, we are just
+lambda的值等于0
+
+62
+00:01:57,380 --> 00:01:59,900
+fitting without regularization, so
+相当于没有正则化项
+
+63
+00:02:00,030 --> 00:02:06,050
+that overfits the hypothesis
+因此是过拟合假设
+
+64
+00:02:07,290 --> 00:02:08,160
+and it's only if we have some
+只有当我们取一个中间大小的
+
+65
+00:02:08,320 --> 00:02:10,310
+intermediate value of lambda that is neither too large nor too small that we end up with parameters theta
+既不大也不小的lambda值时
+
+66
+00:02:10,810 --> 00:02:11,970
+that give us a reasonable
+我们才会得到一组合理的
+
+67
+00:02:12,360 --> 00:02:13,640
+fit to this data.
+对数据刚好拟合的theta参数值
+
+68
+00:02:14,000 --> 00:02:14,920
+So how can we automatically
+那么我们应该怎样
+
+69
+00:02:15,720 --> 00:02:17,190
+choose a good value
+自动地选择出一个最合适的
+
+70
+00:02:17,690 --> 00:02:19,200
+for the regularization parameter lambda?
+正则化参数lambda呢
+
+71
+00:02:20,210 --> 00:02:22,480
+Just to reiterate, here is our model and here is our learning algorithm's objective.
+重申一下 我们的模型和学习目标是这样的
+
+72
+00:02:24,780 --> 00:02:27,690
+For the setting where we're using regularization, let me define
+让我们假设在使用正则化的情形中
+
+73
+00:02:28,520 --> 00:02:30,650
+j train of theta to be something different
+定义Jtrain(θ)为另一种不同的形式
+
+74
+00:02:31,520 --> 00:02:33,480
+to be the optimization objective
+目标同样是最优化
+
+75
+00:02:34,280 --> 00:02:35,910
+but without the regularization term.
+但不使用正则化项
+
+76
+00:02:36,650 --> 00:02:38,510
+Previously, in earlier video
+在先前的授课视频中
+
+77
+00:02:38,860 --> 00:02:39,780
+when we are not using
+当我们没有使用正则化时
+
+78
+00:02:40,150 --> 00:02:41,910
+regularization, I define j train of theta to
+我们定义的Jtrain(θ)
+
+79
+00:02:42,760 --> 00:02:45,890
+be the same as j of theta as the cost function but
+就是代价函数J(θ)
+
+80
+00:02:46,140 --> 00:02:48,550
+when we are using regularization with this extra lambda term
+但当我们使用正则化多出这个lambda项时
+
+81
+00:02:49,590 --> 00:02:51,950
+we're going to
+我们就将训练集误差
+
+82
+00:02:52,190 --> 00:02:53,340
+define j train my training set error,
+也就是Jtrain 定义为
+
+83
+00:02:53,610 --> 00:02:54,720
+to be just my sum of
+训练集数据预测的平方误差的求和
+
+84
+00:02:54,940 --> 00:02:56,180
+squared errors on the training
+或者准确的说
+
+85
+00:02:56,520 --> 00:02:58,010
+set, or my average squared error
+是训练集的平均误差平方和
+
+86
+00:02:58,230 --> 00:03:01,170
+on the training set without taking into account that regularization term.
+但不考虑正则化项
+
+87
+00:03:02,050 --> 00:03:03,360
+And similarly, I'm then
+与此类似
+
+88
+00:03:03,520 --> 00:03:04,800
+also going to define the
+我们来定义交叉验证集误差
+
+89
+00:03:05,320 --> 00:03:07,280
+cross-validation set error when the
+以及测试集误差
+
+90
+00:03:07,380 --> 00:03:08,480
+test set error, as before
+和之前一样定义为
+
+91
+00:03:08,940 --> 00:03:10,830
+to be the average sum of squared errors
+对交叉验证集和测试集进行预测
+
+92
+00:03:11,430 --> 00:03:13,100
+on the cross-validation and the test sets.
+取平均误差平方和的形式
+
+93
+00:03:14,350 --> 00:03:16,380
+So just to summarize,
+总结一下
+
+94
+00:03:16,930 --> 00:03:18,170
+my definitions of J train and
+我们对于训练误差Jtrain
+
+95
+00:03:18,600 --> 00:03:19,520
+J C V and J
+交叉验证集误差Jcv
+
+96
+00:03:19,730 --> 00:03:20,930
+Test are just the
+和测试集误差Jtest的定义
+
+97
+00:03:21,160 --> 00:03:22,120
+average squared error, or one
+都是平均误差平方和
+
+98
+00:03:22,520 --> 00:03:23,720
+half of the average
+或者准确地说
+
+99
+00:03:24,100 --> 00:03:25,710
+squared error on my training validation and
+是训练集 验证集和测试集进行预测
+
+100
+00:03:25,950 --> 00:03:27,880
+test sets without the extra
+在不使用正则化项时
+
+101
+00:03:29,420 --> 00:03:30,400
+regularization term.
+平均误差平方和的二分之一
+
+102
+00:03:30,470 --> 00:03:32,610
+So, this is how we can automatically choose the regularization parameter lambda.
+这就是我们自动选取正则化参数lambda的方法
+
+103
+00:03:35,060 --> 00:03:36,710
+What I usually do is may
+通常我的做法是
+
+104
+00:03:36,830 --> 00:03:39,150
+be have some range of values of lambda I want to try it.
+选取一系列我想要尝试的lambda值
+
+105
+00:03:39,330 --> 00:03:40,850
+So I might be
+因此首先我可能考虑
+
+106
+00:03:40,990 --> 00:03:42,160
+considering not using regularization,
+不使用正则化的情形
+
+107
+00:03:43,540 --> 00:03:44,670
+or here are a few values I might try.
+以及一系列我可能会试的值
+
+108
+00:03:44,890 --> 00:03:45,850
+I might be considering lambda values
+比如说我可能从0.01 0.02
+
+109
+00:03:46,320 --> 00:03:48,500
+of 0.01, 0.02, 0.04 and so on.
+0.04开始 一直试下去
+
+110
+00:03:49,090 --> 00:03:50,510
+And you know, I usually step these
+通常来讲
+
+111
+00:03:50,770 --> 00:03:53,220
+up in multiples of
+我一般将步长设为2倍速度增长
+
+112
+00:03:53,420 --> 00:03:55,960
+two until some maybe larger value
+一直到一个比较大的值
+
+113
+00:03:56,070 --> 00:03:57,250
+if I do this in multiples of two, you know,
+在本例中以两倍步长递增的话
+
+114
+00:03:57,480 --> 00:03:59,000
+I actually end up with 10.24;
+我们最终取值10.24
+
+115
+00:03:59,270 --> 00:04:01,810
+it's ten exactly, but you
+实际上我们取的是10
+
+116
+00:04:01,980 --> 00:04:03,240
+know, this is close enough and
+但取10已经非常接近了
+
+117
+00:04:03,860 --> 00:04:05,320
+the .24 decimal
+因为小数点后的24
+
+118
+00:04:05,610 --> 00:04:07,830
+places won't affect your result that much.
+对最终的结果不会有太大影响
+
+119
+00:04:08,140 --> 00:04:09,920
+So, this gives me, maybe
+因此 这样我就得到了
+
+120
+00:04:10,640 --> 00:04:12,470
+twelve different models, that I'm
+12个不同的正则化参数lambda
+
+121
+00:04:12,610 --> 00:04:14,350
+trying to select amongst, corresponding to
+对应的12个不同的模型
+
+122
+00:04:14,540 --> 00:04:16,210
+12 different values of the
+我将从这个12个模型中
+
+123
+00:04:16,520 --> 00:04:20,900
+regularization parameter lambda and of course, you can also go
+选出一个最合适的模型,当然了,你也可以试
+
+124
+00:04:20,910 --> 00:04:23,560
+to values less than 0.01
+小于0.01的值
+
+125
+00:04:23,620 --> 00:04:25,110
+or values larger than 10,
+或者大于10的值
+
+126
+00:04:25,210 --> 00:04:27,380
+but I've just truncated it here for convenience.
+但在这里我就不讨论这些情况了
+
+127
+00:04:27,710 --> 00:04:28,570
+Given each of these 12
+得到这12组模型后
+
+128
+00:04:28,900 --> 00:04:30,050
+models, what we can
+接下来
+
+129
+00:04:30,280 --> 00:04:31,080
+do is then the following:
+我们要做的事情是
+
+130
+00:04:32,110 --> 00:04:33,410
+we take this first
+选用第一个模型
+
+131
+00:04:33,790 --> 00:04:35,160
+model with lambda equals 0,
+也就是lambda等于0
+
+132
+00:04:35,360 --> 00:04:37,420
+and minimize my cost
+然后最小化我们的
+
+133
+00:04:37,700 --> 00:04:39,860
+function j of theta and this
+代价函数J(θ)
+
+134
+00:04:40,090 --> 00:04:41,620
+would give me some parameter vector theta
+这样我们就得到了某个参数向量theta
+
+135
+00:04:42,160 --> 00:04:43,310
+and similar to the earlier video,
+与之前视频的做法类似
+
+136
+00:04:43,510 --> 00:04:45,370
+let me just denote this as
+我使用theta上标(1)
+
+137
+00:04:46,860 --> 00:04:47,960
+theta superscript 1.
+来表示第一个参数向量theta
+
+138
+00:04:49,890 --> 00:04:50,750
+And then I can take my
+然后我再取第二个模型
+
+139
+00:04:50,930 --> 00:04:52,520
+second model, with lambda
+也就是
+
+140
+00:04:53,000 --> 00:04:54,530
+set to 0.01 and
+lambda等于0.01的模型
+
+141
+00:04:55,160 --> 00:04:57,120
+minimize my cost function, now
+最小化代价方差
+
+142
+00:04:57,250 --> 00:04:58,870
+using lambda equals 0.01
+当然 现在lambda等于0.01
+
+143
+00:04:58,970 --> 00:05:00,080
+of course, to get some
+那么会得到一个
+
+144
+00:05:00,270 --> 00:05:01,290
+different parameter vector theta,
+完全不同的参数向量theta
+
+145
+00:05:01,840 --> 00:05:02,730
+which I'm going to denote theta 2,
+用theta(2)来表示
+
+146
+00:05:02,860 --> 00:05:04,000
+and for that I
+同理 接下来
+
+147
+00:05:04,240 --> 00:05:05,520
+end up with theta 3
+我会得到theta(3)
+
+148
+00:05:05,720 --> 00:05:06,590
+so that this corresponds to my
+这对应于我的第三个模型
+
+149
+00:05:06,660 --> 00:05:08,400
+third model, and so on,
+以此类推
+
+150
+00:05:08,930 --> 00:05:10,290
+until for for my final model
+一直到最后一个
+
+151
+00:05:10,760 --> 00:05:13,360
+with lambda set to 10,
+lambda等于10或10.24的模型
+
+152
+00:05:13,360 --> 00:05:16,460
+or 10.24, or I end up with this theta 12.
+对应theta(12)
+
+153
+00:05:17,650 --> 00:05:19,120
+Next I can take
+接下来我就可以用
+
+154
+00:05:19,360 --> 00:05:21,020
+all of these hypotheses, all of
+所有这些假设
+
+155
+00:05:21,100 --> 00:05:23,160
+these parameters, and use
+所有这些参数
+
+156
+00:05:23,470 --> 00:05:25,510
+my cross-validation set to evaluate them.
+以及交叉验证集来评价它们了
+
+157
+00:05:26,250 --> 00:05:27,750
+So I can look at my
+因此我可以从第一个模型
+
+158
+00:05:28,430 --> 00:05:29,730
+first model, my second
+第二个模型等等开始
+
+159
+00:05:30,080 --> 00:05:30,680
+model, fits with these different values
+对每一个不同的正则化参数lambda
+
+160
+00:05:30,710 --> 00:05:41,600
+of the regularization parameter and
+进行拟合
+
+161
+00:05:41,750 --> 00:05:42,630
+evaluate them on my cross-validation
+然后用交叉验证集来评价每一个模型
+
+162
+00:05:42,880 --> 00:05:43,460
+set - basically measure the average squared error of each of these parameter
+也即测出每一个参数thata在交叉验证集上的
+
+163
+00:05:43,550 --> 00:05:45,220
+vectors theta on my cross-validation set.
+平均误差平方和
+
+164
+00:05:45,560 --> 00:05:47,110
+And I would then pick whichever one
+然后我就选取这12个模型中
+
+165
+00:05:47,270 --> 00:05:48,710
+of these 12 models gives me
+交叉验证集误差最小的
+
+166
+00:05:48,880 --> 00:05:51,360
+the lowest error on the cross-validation set.
+那个模型作为最终选择
+
+167
+00:05:52,360 --> 00:05:53,100
+And let's say, for the sake
+对于本例而言
+
+168
+00:05:53,380 --> 00:05:54,970
+of this example, that I
+假如说
+
+169
+00:05:55,260 --> 00:05:56,880
+end up picking theta 5,
+最终我选择了theta(5)
+
+170
+00:05:56,960 --> 00:05:59,570
+the fifth order polynomial, because
+也就是五次多项式
+
+171
+00:05:59,960 --> 00:06:02,550
+that has the lowest cross-validation error.
+因为此时的交叉验证集误差最小
+
+172
+00:06:03,320 --> 00:06:05,530
+Having done that, finally, what
+做完这些 最后
+
+173
+00:06:05,700 --> 00:06:06,530
+I would do if I want
+如果我想看看该模型
+
+174
+00:06:06,800 --> 00:06:07,940
+to report a test set error
+在测试集上的表现
+
+175
+00:06:08,680 --> 00:06:10,000
+is to take the parameter theta
+我可以用经过学习
+
+176
+00:06:10,310 --> 00:06:12,200
+5 that I've
+得到的模型theta(5)
+
+177
+00:06:12,350 --> 00:06:13,860
+selected and look at
+来测出它对测试集的
+
+178
+00:06:13,980 --> 00:06:16,020
+how well it does on my test set.
+预测效果如何
+
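A compact Python/numpy sketch of the lambda selection procedure just described: fit theta for each candidate lambda (0, then 0.01 stepping up in multiples of two to 10.24, giving 12 models), pick the lambda whose fit has the lowest cross validation error, and only then estimate generalization error on the test set. The data, the polynomial features, and the regularized normal equation used here are illustrative assumptions, not the course's own implementation.

import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1, 1, 100))
y = np.sin(3 * x) + 0.2 * rng.standard_normal(100)  # made-up data set
X = np.vander(x, 9, increasing=True)                 # degree-8 polynomial features; column 0 is the intercept

idx = rng.permutation(len(x))                         # 60% / 20% / 20% split
Xtr, ytr = X[idx[:60]], y[idx[:60]]
Xcv, ycv = X[idx[60:80]], y[idx[60:80]]
Xte, yte = X[idx[80:]], y[idx[80:]]

def fit(Xm, ym, lam):
    # regularized normal equation; theta_0 (the intercept) is not penalized
    reg = lam * np.eye(Xm.shape[1])
    reg[0, 0] = 0.0
    return np.linalg.solve(Xm.T @ Xm + reg, Xm.T @ ym)

def j(theta, Xm, ym):
    # J_train / J_cv / J_test as defined in the lecture: average squared error, no lambda term
    return np.mean((Xm @ theta - ym) ** 2) / 2

lambdas = [0.0] + [0.01 * 2**k for k in range(11)]    # 0, 0.01, 0.02, 0.04, ..., 10.24
thetas = [fit(Xtr, ytr, lam) for lam in lambdas]
best = min(range(len(lambdas)), key=lambda i: j(thetas[i], Xcv, ycv))
print(lambdas[best], j(thetas[best], Xte, yte))       # test error for the chosen lambda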
+179
+00:06:16,150 --> 00:06:17,620
+And once again here is as
+再次重申一下
+
+180
+00:06:17,790 --> 00:06:18,980
+if we fit this parameter
+这里我们依然是用
+
+181
+00:06:19,540 --> 00:06:21,750
+theta to my cross-validation
+交叉验证集来拟合模型
+
+182
+00:06:22,580 --> 00:06:23,770
+set, which is why I
+这也是为什么我之前
+
+183
+00:06:23,970 --> 00:06:25,250
+am saving aside a separate
+预留了一部分数据
+
+184
+00:06:25,730 --> 00:06:27,120
+test set that I
+作为测试集的原因
+
+185
+00:06:27,170 --> 00:06:28,370
+am going to use to get
+这样我就可以用这部分测试集
+
+186
+00:06:28,660 --> 00:06:29,780
+a better estimate of how
+比较准确地估计出
+
+187
+00:06:30,040 --> 00:06:31,250
+well my a parameter vector
+我的参数向量theta
+
+188
+00:06:31,500 --> 00:06:33,000
+theta will generalize to previously unseen examples.
+对于新样本的泛化能力
+
+189
+00:06:35,430 --> 00:06:37,180
+So that's model selection applied
+这就是模型选择在选取
+
+190
+00:06:37,570 --> 00:06:39,620
+to selecting the regularization parameter
+正则化参数lambda时的应用
+
+191
+00:06:40,570 --> 00:06:41,660
+lambda. The last thing
+在这段视频中
+
+192
+00:06:41,800 --> 00:06:42,830
+I'd like to do in this
+我想讲的最后一个问题是
+
+193
+00:06:43,080 --> 00:06:44,200
+video, is get a
+当我们改变
+
+194
+00:06:44,280 --> 00:06:46,390
+better understanding of how
+正则化参数lambda的值时
+
+195
+00:06:46,960 --> 00:06:48,650
+cross-validation and training error
+交叉验证集误差
+
+196
+00:06:48,990 --> 00:06:51,000
+vary as we as
+和训练集误差会随之
+
+197
+00:06:51,840 --> 00:06:54,140
+we vary the regularization parameter lambda.
+发生怎样的变化
+
+198
+00:06:54,770 --> 00:06:56,370
+And so just a reminder, that
+我想提醒一下
+
+199
+00:06:56,670 --> 00:06:58,070
+was our original cost function J of
+我们最初的代价函数J(θ)
+
+200
+00:06:58,150 --> 00:06:59,540
+theta, but for this
+原来是这样的形式
+
+201
+00:06:59,710 --> 00:07:00,660
+purpose we're going to define
+但在这里我们把训练误差
+
+202
+00:07:01,760 --> 00:07:03,140
+training error without using
+定义为不包括正则化项
+
+203
+00:07:03,150 --> 00:07:05,090
+the regularization parameter, and cross-validation
+交叉验证集误差
+
+204
+00:07:05,770 --> 00:07:07,060
+error without using the
+也定义为不包括
+
+205
+00:07:07,270 --> 00:07:09,720
+regularization parameter and what I'd like
+正则化项
+
+206
+00:07:10,120 --> 00:07:11,680
+to do is plot this J train
+我要做的是
+
+207
+00:07:12,660 --> 00:07:15,330
+and plot this Jcv, meaning just
+绘制出Jtrain和Jcv的曲线
+
+208
+00:07:15,610 --> 00:07:16,730
+how well does my
+随着我增大正则化项参数
+
+209
+00:07:16,830 --> 00:07:19,160
+hypothesis do for on
+lambda的值
+
+210
+00:07:19,490 --> 00:07:20,670
+the training set and how well
+看看我的假设
+
+211
+00:07:20,830 --> 00:07:22,190
+does my hypothesis do on the
+在训练集上的表现如何变化
+
+212
+00:07:22,250 --> 00:07:24,000
+cross-validation set as I
+以及在交叉验证集上
+
+213
+00:07:24,010 --> 00:07:25,920
+vary my regularization parameter
+表现如何变化
+
+214
+00:07:26,390 --> 00:07:29,860
+lambda so as
+就像我们之前看到的
+
+215
+00:07:29,920 --> 00:07:32,340
+we saw earlier, if lambda
+如果正则化项参数
+
+216
+00:07:32,670 --> 00:07:34,330
+is small, then we're
+lambda的值很小
+
+217
+00:07:34,520 --> 00:07:36,920
+not using much regularization and
+那也就是说我们几乎没有使用正则化
+
+218
+00:07:37,370 --> 00:07:39,460
+we run a larger risk of overfitting.
+因此我们有很大可能处于过拟合
+
+219
+00:07:39,550 --> 00:07:41,280
+Where as if lambda is
+而如果lambda的值
+
+220
+00:07:41,530 --> 00:07:42,690
+large, that is if we
+取的很大的时候
+
+221
+00:07:42,910 --> 00:07:43,810
+were on the right part
+也就是说取值在
+
+222
+00:07:44,790 --> 00:07:47,000
+of this horizontal axis, then
+横坐标的右端
+
+223
+00:07:47,290 --> 00:07:48,370
+with a large value of lambda
+那么由于lambda的值很大
+
+224
+00:07:49,160 --> 00:07:51,660
+we run the high risk of having a bias problem.
+我们很有可能处于高偏差的问题
+
+225
+00:07:52,640 --> 00:07:54,250
+So if you plot J train
+所以 如果你画出
+
+226
+00:07:54,880 --> 00:07:56,500
+and Jcv, what you
+Jtrain和Jcv的曲线
+
+227
+00:07:56,580 --> 00:07:58,330
+find is that for small
+你就会发现
+
+228
+00:07:58,700 --> 00:08:00,770
+values of lambda you can
+当lambda的值取得很小时
+
+229
+00:08:01,610 --> 00:08:02,640
+fit the training set relatively
+对训练集的拟合相对较好
+
+230
+00:08:03,240 --> 00:08:04,290
+well because you're not regularizing.
+因为没有使用正则化
+
+231
+00:08:05,200 --> 00:08:06,490
+So, for small values of
+因此 对于lambda值很小的情况
+
+232
+00:08:06,590 --> 00:08:08,350
+lambda, the regularization term basically
+正则化项基本可以忽略
+
+233
+00:08:08,560 --> 00:08:09,700
+goes away and you're just
+你只需要对平方误差
+
+234
+00:08:10,020 --> 00:08:12,060
+minimizing pretty much your squared error.
+做最小化处理即可
+
+235
+00:08:12,470 --> 00:08:14,090
+So when lambda is small, you
+所以当lambda值很小时
+
+236
+00:08:14,230 --> 00:08:15,180
+end up with a small value
+你最终能得到一个
+
+237
+00:08:15,770 --> 00:08:17,390
+for J train, whereas if
+值很小的Jtrain
+
+238
+00:08:17,500 --> 00:08:18,780
+lambda is large, then you
+而如果lambda的值很大时
+
+239
+00:08:19,340 --> 00:08:22,080
+have a high bias problem and you might not fit your training set so well.
+你将处于高偏差问题 不能对训练集很好地拟合
+
+240
+00:08:22,240 --> 00:08:23,400
+So you end up with a value up there.
+因此你的误差值可能位于这个位置
+
+241
+00:08:24,150 --> 00:08:28,400
+So, J train of
+因此 当lambda增大时
+
+242
+00:08:28,530 --> 00:08:29,730
+theta will tend to
+训练集误差Jtrain的值
+
+243
+00:08:29,920 --> 00:08:31,890
+increase when lambda increases
+会趋于上升
+
+244
+00:08:32,650 --> 00:08:34,320
+because a large value of
+因为lambda的值比较大时
+
+245
+00:08:34,520 --> 00:08:35,450
+lambda corresponds a high bias
+对应着高偏差的问题
+
+246
+00:08:36,000 --> 00:08:37,000
+where you might not even fit your
+此时你连训练集都不能很好地拟合
+
+247
+00:08:37,190 --> 00:08:38,760
+training set well, whereas a
+反过来 当lambda的值
+
+248
+00:08:38,890 --> 00:08:40,980
+small value of lambda corresponds to,
+取得很小的时候
+
+249
+00:08:41,250 --> 00:08:43,100
+being able to, you know, freely
+你的数据能随意地与高次多项式
+
+250
+00:08:43,450 --> 00:08:46,290
+fit very high degree polynomials to your data, let's say.
+很好地拟合
+
+251
+00:08:46,520 --> 00:08:50,460
+As for the cross-validation error, we end up with a figure like this.
+交叉验证集误差的曲线是这样的
+
+252
+00:08:51,680 --> 00:08:53,200
+Where, over here on
+在曲线的右端
+
+253
+00:08:53,530 --> 00:08:55,060
+the right, if we
+当lambda的值
+
+254
+00:08:55,130 --> 00:08:56,070
+have a large value of lambda,
+取得很大时
+
+255
+00:08:57,040 --> 00:08:58,200
+we may end up underfitting.
+我们会处于欠拟合问题
+
+256
+00:08:58,900 --> 00:09:00,280
+And so, this is the bias regime
+也对应着偏差问题
+
+257
+00:09:01,950 --> 00:09:04,750
+whereas our cross
+那么此时交叉验证集误差
+
+258
+00:09:05,030 --> 00:09:06,680
+validation error will be
+将会很大
+
+259
+00:09:06,920 --> 00:09:08,060
+high, and let me just label
+我写在这里
+
+260
+00:09:08,250 --> 00:09:10,760
+that. So, that's Jcv of theta, because with
+这是交叉验证集误差Jcv
+
+261
+00:09:11,270 --> 00:09:12,440
+high bias we won't be fitting.
+由于高偏差的原因我们不能很好地拟合
+
+262
+00:09:13,430 --> 00:09:15,580
+We won't be doing well on the cross-validation set.
+我们的假设不能在交叉验证集上表现地比较好
+
+263
+00:09:17,050 --> 00:09:20,000
+Whereas here on the left, this is the high-variance regime.
+而曲线的左端对应的是高方差问题
+
+264
+00:09:21,120 --> 00:09:22,620
+Where if we have too small a
+此时我们的lambda值
+
+265
+00:09:23,020 --> 00:09:24,910
+value of lambda, then we
+取得很小很小
+
+266
+00:09:25,070 --> 00:09:26,190
+may be overfitting the data
+因此我们会对数据过度拟合
+
+267
+00:09:26,870 --> 00:09:28,140
+and so by overfitting the
+所以由于过拟合的原因
+
+268
+00:09:28,230 --> 00:09:30,320
+data, the cross-validation error
+交叉验证集误差Jcv
+
+269
+00:09:30,710 --> 00:09:31,610
+will also be high.
+结果也会很大
+
+270
+00:09:32,700 --> 00:09:34,380
+And so, this is what the
+好的 这就是
+
+271
+00:09:35,620 --> 00:09:37,270
+cross-validation error and what
+当我们改变正则化参数
+
+272
+00:09:37,510 --> 00:09:38,860
+the training error may look
+lambda的值时
+
+273
+00:09:39,130 --> 00:09:40,410
+like on a training set
+交叉验证集误差
+
+274
+00:09:40,820 --> 00:09:43,270
+as we vary the parameter
+和训练集误差
+
+275
+00:09:43,950 --> 00:09:45,920
+lambda, as we vary the regularization parameter lambda.
+随之发生的变化
+
+276
+00:09:46,110 --> 00:09:47,220
+And so, once again, it will
+当然 在中间取的某个
+
+277
+00:09:47,430 --> 00:09:49,100
+often be some intermediate value
+lambda的值
+
+278
+00:09:49,790 --> 00:09:52,220
+of lambda that, you know, ends up just right
+表现得刚好合适
+
+279
+00:09:52,720 --> 00:09:53,990
+or that works best in
+这种情况下表现最好
+
+280
+00:09:54,120 --> 00:09:55,470
+terms of having a small
+交叉验证集误差
+
+281
+00:09:55,570 --> 00:09:58,510
+cross-validation error or a small test set error.
+或者测试集误差都很小
+
+282
+00:09:58,520 --> 00:09:59,580
+And whereas the curves I've drawn
+当然由于我在这里画的图
+
+283
+00:09:59,900 --> 00:10:02,230
+here are somewhat cartoonish and somewhat idealized.
+显得太卡通 也太理想化了
+
+284
+00:10:03,250 --> 00:10:04,270
+So on a real data set
+对于真实的数据
+
+285
+00:10:04,810 --> 00:10:06,000
+the curves you get may
+你得到的曲线可能
+
+286
+00:10:06,110 --> 00:10:07,070
+end up looking a little bit more
+比这看起来更凌乱
+
+287
+00:10:07,290 --> 00:10:09,180
+messy and just a little bit more noisy than this.
+会有很多的噪声
+
+288
+00:10:09,800 --> 00:10:10,900
+For some data sets you will
+对某个实际的数据集
+
+289
+00:10:11,000 --> 00:10:12,270
+more or less see these sorts
+你或多或少能看出
+
+290
+00:10:12,560 --> 00:10:14,000
+of trends, and
+像这样的一个趋势
+
+291
+00:10:14,270 --> 00:10:15,160
+by looking at the plot
+通过绘出这条曲线
+
+292
+00:10:15,720 --> 00:10:16,750
+of the hold-out cross-validation
+通过交叉验证集误差的变化趋势
+
+293
+00:10:17,640 --> 00:10:19,280
+error, you can either
+你可以用自己选择出
+
+294
+00:10:19,420 --> 00:10:21,190
+manually or automatically try to
+或者编写程序自动得出
+
+295
+00:10:21,500 --> 00:10:22,920
+select a point that minimizes
+能使交叉验证集误差
+
+296
+00:10:23,370 --> 00:10:26,410
+the cross-validation error and
+最小的那个点
+
+297
+00:10:26,700 --> 00:10:28,420
+select the value of lambda corresponding
+然后选出那个与之对应的
+
+298
+00:10:29,100 --> 00:10:30,600
+to low cross-validation error.
+参数lambda的值
+
+299
+00:10:31,380 --> 00:10:32,610
+When I'm trying to pick the
+当我在尝试为学习算法
+
+300
+00:10:32,740 --> 00:10:34,690
+regularization parameter lambda
+选择正则化参数
+
+301
+00:10:35,020 --> 00:10:37,120
+for a learning algorithm, often I
+lambda的时候
+
+302
+00:10:37,240 --> 00:10:38,340
+find that plotting a figure
+我通常都会得出
+
+303
+00:10:38,620 --> 00:10:40,290
+like this one showed here, helps
+类似这个图的结果
+
+304
+00:10:40,570 --> 00:10:42,340
+me understand better what's going
+帮助我更好地理解各种情况
+
+305
+00:10:42,600 --> 00:10:44,140
+on and helps me verify that
+同时也帮助我确认
+
+306
+00:10:44,700 --> 00:10:45,960
+I am indeed picking a good
+我选择的正则化参数值
+
+307
+00:10:46,140 --> 00:10:47,490
+value for the regularization parameter
+到底好不好
+
+308
+00:10:48,340 --> 00:10:50,140
+lambda. So hopefully that
+希望这节课的内容
+
+309
+00:10:50,340 --> 00:10:51,980
+gives you more insight into regularization
+让你更深入地理解了正则化
+
+310
+00:10:53,470 --> 00:10:54,710
+and its effects on the bias
+以及它对学习算法的
+
+311
+00:10:55,220 --> 00:10:56,290
+and variance of the learning algorithm.
+偏差和方差的影响
+
+312
+00:10:57,790 --> 00:10:59,330
+By now you've seen bias and
+到目前为止你已经从不同角度
+
+313
+00:10:59,490 --> 00:11:01,230
+variance from a lot of different perspectives.
+见识了方差和偏差问题
+
+314
+00:11:02,000 --> 00:11:03,290
+And what I'd like to do
+在下一节视频中
+
+315
+00:11:03,520 --> 00:11:04,820
+in the next video is take
+我要做的是
+
+316
+00:11:05,050 --> 00:11:05,930
+a lot of the insights
+基于我们已经浏览过的
+
+317
+00:11:06,100 --> 00:11:07,890
+that we've gone through and build
+所有这些概念
+
+318
+00:11:08,140 --> 00:11:09,030
+on them to put together
+将它们结合起来
+
+319
+00:11:09,740 --> 00:11:11,590
+a diagnostic that's called learning
+建立我们的诊断法
+
+320
+00:11:11,870 --> 00:11:12,920
+curves, which is a
+也称为学习曲线
+
+321
+00:11:12,970 --> 00:11:14,120
+tool that I often use
+这种方法通常被用来
+
+322
+00:11:14,540 --> 00:11:15,740
+to try to diagnose if a
+诊断一个学习算法
+
+323
+00:11:16,010 --> 00:11:17,450
+learning algorithm may be suffering
+到底是处于偏差问题
+
+324
+00:11:17,860 --> 00:11:19,150
+from a bias problem or a
+还是方差问题
+
+325
+00:11:19,380 --> 00:11:20,770
+variance problem or a little bit of both.
+还是两者都有。
+
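The procedure described in this lecture — train a regularized model for each candidate value of lambda, then compare the training error Jtrain and the cross-validation error Jcv (both computed without the regularization term) and pick the lambda with the lowest Jcv — might be sketched roughly as below. Everything here (the synthetic data, the poly_features and train_linear_reg helpers, and the list of lambda values) is an illustrative assumption, not part of the course materials.

import numpy as np

rng = np.random.default_rng(0)

def poly_features(x, degree):
    # Columns [1, x, x^2, ..., x^degree]; the first column is the bias term.
    return np.column_stack([x ** d for d in range(degree + 1)])

def train_linear_reg(X, y, lam):
    # Regularized normal equation; the bias term is not penalized.
    L = lam * np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + L, X.T @ y)

def squared_error(theta, X, y):
    # Average squared error with no regularization term (used for both Jtrain and Jcv).
    err = X @ theta - y
    return err @ err / (2 * len(y))

# Synthetic 1-D data split into a training set and a cross-validation set.
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + 0.2 * rng.standard_normal(60)
X = poly_features(x, degree=8)
X_train, y_train = X[:40], y[:40]
X_cv, y_cv = X[40:], y[40:]

# Small lambda -> low Jtrain but possibly high Jcv (overfitting);
# large lambda -> both errors high (underfitting); pick the lambda with the lowest Jcv.
for lam in [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]:
    theta = train_linear_reg(X_train, y_train, lam)
    print(f"lambda={lam:<5}  Jtrain={squared_error(theta, X_train, y_train):.4f}  "
          f"Jcv={squared_error(theta, X_cv, y_cv):.4f}")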
diff --git a/srt/10 - 6 - Learning Curves (12 min).srt b/srt/10 - 6 - Learning Curves (12 min).srt
new file mode 100644
index 00000000..96b28c61
--- /dev/null
+++ b/srt/10 - 6 - Learning Curves (12 min).srt
@@ -0,0 +1,1770 @@
+1
+00:00:00,090 --> 00:00:02,040
+In this video, I'd like to tell you about learning curves.
+本节课我们介绍学习曲线
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:03,310 --> 00:00:05,850
+Learning curves is often a very useful thing to plot.
+绘制学习曲线非常有用
+
+3
+00:00:06,710 --> 00:00:08,170
+Either if you want to sanity check
+也许你想检查你的学习算法
+
+4
+00:00:08,430 --> 00:00:09,590
+that your algorithm is working correctly,
+运行是否一切正常
+
+5
+00:00:10,400 --> 00:00:12,730
+or if you want to improve the performance of the algorithm.
+或者你希望改进算法的表现或效果
+
+6
+00:00:13,950 --> 00:00:15,200
+And learning curves is a
+那么学习曲线
+
+7
+00:00:15,310 --> 00:00:16,410
+tool that I actually use
+就是一种很好的工具
+
+8
+00:00:16,820 --> 00:00:17,920
+very often to try to
+我经常使用学习曲线
+
+9
+00:00:18,290 --> 00:00:20,030
+diagnose if a particular learning algorithm may be
+来判断某一个学习算法
+
+10
+00:00:20,180 --> 00:00:23,220
+suffering from a bias problem, a variance problem, or a bit of both.
+是否处于偏差 方差问题 或是二者皆有
+
+11
+00:00:27,170 --> 00:00:28,070
+Here's what a learning curve is.
+下面我们就来介绍学习曲线
+
+12
+00:00:28,830 --> 00:00:30,550
+To plot a learning curve, what
+为了绘制一条学习曲线
+
+13
+00:00:30,700 --> 00:00:31,760
+I usually do is plot
+我通常先绘制出Jtrain
+
+14
+00:00:32,210 --> 00:00:33,950
+j train which is, say,
+也就是训练集数据的
+
+15
+00:00:35,030 --> 00:00:36,050
+average squared error on my training
+平均误差平方和
+
+16
+00:00:36,440 --> 00:00:39,090
+set or Jcv which is
+或者Jcv 也即交叉验证集数据的
+
+17
+00:00:39,340 --> 00:00:41,130
+the average squared error on my cross validation set.
+平均误差平方和
+
+18
+00:00:41,590 --> 00:00:42,900
+And I'm going to plot
+我要将其绘制成一个
+
+19
+00:00:43,140 --> 00:00:44,160
+that as a function
+关于参数m的函数
+
+20
+00:00:44,500 --> 00:00:46,380
+of m, that is as a function
+也就是一个关于训练集
+
+21
+00:00:47,230 --> 00:00:51,260
+of the number of training examples I have.
+样本总数的函数
+
+22
+00:00:51,950 --> 00:00:53,420
+And so m is usually a constant like maybe I just have, you know, a 100
+所以m一般都是一个常数 比如m等于100
+
+23
+00:00:53,650 --> 00:00:55,220
+training examples but what I'm
+表示100组训练样本
+
+24
+00:00:55,330 --> 00:00:57,670
+going to do is artificially reduce
+但我要自己取一些m的值
+
+25
+00:00:57,860 --> 00:00:59,280
+my training set size. So, I
+也就是说我要自行对m的取值
+
+26
+00:00:59,500 --> 00:01:01,460
+deliberately limit myself to using only,
+做一点限制
+
+27
+00:01:01,840 --> 00:01:03,440
+say, 10 or 20 or
+比如说我取10 20或者
+
+28
+00:01:03,660 --> 00:01:06,040
+30 or 40 training examples and
+30 40组训练集
+
+29
+00:01:06,170 --> 00:01:07,610
+plot what the training error is and
+然后绘出训练集误差
+
+30
+00:01:07,740 --> 00:01:09,640
+what the cross-validation error is for these
+以及交叉验证集误差
+
+31
+00:01:10,040 --> 00:01:12,260
+smaller training set sizes. So
+好的 那么我们来看看
+
+32
+00:01:12,620 --> 00:01:14,090
+let's see what these plots may look
+这条曲线绘制出来是什么样子
+
+33
+00:01:14,270 --> 00:01:15,530
+like. Suppose I have only
+假设我只有一组训练样本
+
+34
+00:01:15,730 --> 00:01:17,210
+one training example like that
+也即m=1
+
+35
+00:01:17,390 --> 00:01:18,450
+shown in this first example
+正如第一幅图中所示
+
+36
+00:01:18,860 --> 00:01:19,970
+here and let's say I'm fitting a quadratic function. Well, I
+并且假设使用二次函数来拟合模型
+
+37
+00:01:22,470 --> 00:01:24,490
+have only one training example. I'm
+那么由于我只有一个训练样本
+
+38
+00:01:25,040 --> 00:01:26,100
+going to be able to fit it perfectly
+拟合的结果很明显会很好
+
+39
+00:01:26,650 --> 00:01:28,590
+right? You know, just fit the quadratic function. I'm
+是吧 用二次函数来拟合
+
+40
+00:01:28,760 --> 00:01:30,000
+going to have 0
+对这一个训练样本拟合
+
+41
+00:01:30,150 --> 00:01:32,240
+error on the one training example. If I
+其误差一定为0
+
+42
+00:01:32,570 --> 00:01:34,170
+have two training examples. Well the quadratic function can also fit that very well. So,
+如果有两组训练样本 二次函数也能很好地拟合
+
+43
+00:01:37,050 --> 00:01:38,550
+even if I am using regularization,
+即使是使用正则化
+
+44
+00:01:38,750 --> 00:01:40,220
+I can probably fit this quite well.
+拟合的结果也会很好
+
+45
+00:01:41,080 --> 00:01:41,970
+And if I am using no regularization,
+而如果不使用正则化的话
+
+46
+00:01:42,030 --> 00:01:45,200
+I'm going to fit this perfectly and
+那么拟合效果绝对棒极了
+
+47
+00:01:45,440 --> 00:01:46,400
+if I have three training examples
+如果我用三组训练样本的话
+
+48
+00:01:47,260 --> 00:01:48,380
+again. Yeah, I can fit a quadratic
+好吧 看起来依然能很好地
+
+49
+00:01:48,660 --> 00:01:51,320
+function perfectly so if
+用二次函数拟合
+
+50
+00:01:51,550 --> 00:01:52,590
+m equals 1 or m equals 2 or m equals 3,
+也就是说 当m等于1 m=2 或m=3时
+
+51
+00:01:54,850 --> 00:01:56,770
+my training error
+对训练集数据进行预测
+
+52
+00:01:57,350 --> 00:01:58,870
+on my training set is
+得到的训练集误差
+
+53
+00:01:59,110 --> 00:02:01,180
+going to be 0 assuming I'm
+都将等于0
+
+54
+00:02:01,220 --> 00:02:02,760
+not using regularization or it may
+这里假设我不使用正则化
+
+55
+00:02:03,150 --> 00:02:04,290
+be slightly larger than 0 if
+当然如果使用正则化
+
+56
+00:02:04,560 --> 00:02:06,400
+I'm using regularization and
+那么误差就稍大于0
+
+57
+00:02:06,500 --> 00:02:07,350
+by the way if I have
+顺便提醒一下
+
+58
+00:02:07,740 --> 00:02:08,980
+a large training set and I'm artificially
+如果我的训练集样本很大
+
+59
+00:02:09,940 --> 00:02:11,040
+restricting the size of my
+而我要人为地限制训练集
+
+60
+00:02:11,120 --> 00:02:13,080
+training set in order to plot J train.
+样本的容量
+
+61
+00:02:13,830 --> 00:02:14,770
+Here if I set
+比如说这里
+
+62
+00:02:15,110 --> 00:02:16,720
+M equals 3, say, and I
+我将m值设为3
+
+63
+00:02:17,040 --> 00:02:18,290
+train on only three examples,
+然后我仅用这三组样本进行训练
+
+64
+00:02:19,270 --> 00:02:21,030
+then, for this figure I
+然后对应到这个图中
+
+65
+00:02:21,110 --> 00:02:22,430
+am going to measure my training error
+我只看对这三组训练样本
+
+66
+00:02:22,830 --> 00:02:24,450
+only on the three examples that
+进行预测得到的训练误差
+
+67
+00:02:24,550 --> 00:02:25,580
+I've actually fit my data to,
+也是和我模型拟合的三组样本
+
+68
+00:02:27,150 --> 00:02:28,130
+and so even if I have, say,
+所以即使我有100组训练样本
+
+69
+00:02:28,290 --> 00:02:31,160
+a hundred training examples, if I want to plot what my
+而我还是想绘制
+
+70
+00:02:31,430 --> 00:02:32,620
+training error is when m equals 3, what I'm going to do
+当m等于3时的训练误差
+
+71
+00:02:34,270 --> 00:02:35,200
+is to measure the
+那么我要关注的仍然是
+
+72
+00:02:35,340 --> 00:02:36,660
+training error on the
+对这三组训练样本进行预测的误差
+
+73
+00:02:36,750 --> 00:02:39,870
+three examples that I've actually fit my hypothesis to.
+同样 这三组样本也是我们用来拟合模型的三组样本
+
+74
+00:02:41,290 --> 00:02:42,900
+And not all the other examples that I have
+所有其他的样本
+
+75
+00:02:43,010 --> 00:02:44,940
+deliberately omitted from the training
+我都在训练过程中选择性忽略了
+
+76
+00:02:45,140 --> 00:02:46,750
+process. So just to summarize what we've
+好的 总结一下
+
+77
+00:02:46,960 --> 00:02:48,460
+seen is that if the training set
+我们现在已经看到
+
+78
+00:02:48,820 --> 00:02:50,560
+size is small then the
+当训练样本容量m很小的时候
+
+79
+00:02:50,630 --> 00:02:52,630
+training error is going to be small as well.
+训练误差也会很小
+
+80
+00:02:52,960 --> 00:02:53,900
+Because you know, we have a
+因为很显然
+
+81
+00:02:53,930 --> 00:02:55,150
+small training set is
+如果我们训练集很小
+
+82
+00:02:55,350 --> 00:02:56,790
+going to be very easy to
+那么很容易就能把
+
+83
+00:02:56,900 --> 00:02:58,080
+fit your training set
+训练集拟合到很好
+
+84
+00:02:58,720 --> 00:02:59,490
+very well may be even
+甚至拟合得天衣无缝
+
+85
+00:02:59,790 --> 00:03:02,970
+perfectly now say
+现在我们来看
+
+86
+00:03:03,190 --> 00:03:04,460
+we have m equals 4 for example. Well then
+当m等于4的时候
+
+87
+00:03:04,680 --> 00:03:06,800
+a quadratic function can no
+好吧 二次函数似乎也能
+
+88
+00:03:06,920 --> 00:03:07,900
+longer fit this data set
+对数据拟合得很好
+
+89
+00:03:08,100 --> 00:03:09,680
+perfectly and if I
+那我们再看
+
+90
+00:03:09,790 --> 00:03:11,350
+have m equals 5 then you
+当m等于5的情况
+
+91
+00:03:11,460 --> 00:03:13,830
+know, maybe a quadratic function will fit it only so-so,
+这时候再用二次函数来拟合
+
+92
+00:03:14,090 --> 00:03:15,940
+so then, as my training set gets larger,
+好像效果有下降但还是差强人意
+
+93
+00:03:16,980 --> 00:03:18,460
+it becomes harder and harder to
+而当我的训练集越来越大的时候
+
+94
+00:03:18,620 --> 00:03:19,860
+ensure that I can
+你不难发现 要保证使用二次函数
+
+95
+00:03:20,060 --> 00:03:21,820
+find a quadratic function that passes through
+的拟合效果依然很好
+
+96
+00:03:21,960 --> 00:03:25,460
+all my examples perfectly. So
+就显得越来越困难了
+
+97
+00:03:25,840 --> 00:03:27,300
+in fact as the training set size
+因此 事实上随着训练集容量的增大
+
+98
+00:03:27,690 --> 00:03:28,770
+grows what you find
+我们不难发现
+
+99
+00:03:29,300 --> 00:03:30,960
+is that my average training error
+我们的平均训练误差
+
+100
+00:03:31,310 --> 00:03:33,080
+actually increases and so if you plot
+是逐渐增大的
+
+101
+00:03:33,500 --> 00:03:34,650
+this figure what you find
+因此如果你画出这条曲线
+
+102
+00:03:35,220 --> 00:03:36,860
+is that the training set
+你就会发现
+
+103
+00:03:37,130 --> 00:03:38,520
+error that is the average
+训练集误差 也就是
+
+104
+00:03:38,940 --> 00:03:40,660
+error on your hypothesis grows
+对假设进行预测的误差平均值
+
+105
+00:03:41,300 --> 00:03:44,730
+as m grows. And just to repeat, the intuition is that when
+随着m的增大而增大
+
+106
+00:03:45,020 --> 00:03:46,200
+m is small when you have very
+再重复一遍对这一问题的理解
+
+107
+00:03:46,500 --> 00:03:48,070
+few training examples. It's pretty
+当训练样本很少的时候
+
+108
+00:03:48,350 --> 00:03:49,420
+easy to fit every single
+对每一个训练样本
+
+109
+00:03:49,790 --> 00:03:51,350
+one of your training examples perfectly and
+都能很容易地拟合到很好
+
+110
+00:03:51,610 --> 00:03:52,840
+so your error is going
+所以训练误差将会很小
+
+111
+00:03:52,940 --> 00:03:54,540
+to be small whereas
+而反过来
+
+112
+00:03:54,710 --> 00:03:56,100
+when m is larger, it gets
+当m的值逐渐增大
+
+113
+00:03:56,460 --> 00:03:57,900
+harder to fit all the training
+那么想对每一个训练样本都拟合到很好
+
+114
+00:03:58,220 --> 00:03:59,900
+examples perfectly and so
+就显得愈发的困难了
+
+115
+00:04:00,430 --> 00:04:01,830
+your training set error becomes
+因此训练集误差就会越来越大
+
+116
+00:04:02,370 --> 00:04:05,840
+larger. Now, how about the cross-validation error?
+那么交叉验证集误差的情况如何呢
+
+117
+00:04:06,720 --> 00:04:08,460
+Well, the cross-validation error is
+好的 交叉验证集误差
+
+118
+00:04:08,590 --> 00:04:10,100
+my error on this cross
+是对完全陌生的交叉验证集数据
+
+119
+00:04:10,350 --> 00:04:12,660
+validation set that I haven't seen and
+进行预测得到的误差
+
+120
+00:04:12,880 --> 00:04:14,600
+so, you know, when I have
+那么我们知道
+
+121
+00:04:14,720 --> 00:04:15,900
+a very small training set, I'm
+当训练集很小的时候
+
+122
+00:04:16,080 --> 00:04:16,890
+not going to generalize well, just
+泛化程度不会很好
+
+123
+00:04:17,020 --> 00:04:19,610
+not going to do well on that.
+意思是不能很好地适应新样本
+
+124
+00:04:19,850 --> 00:04:21,220
+So, right, this hypothesis here doesn't
+因此这个假设
+
+125
+00:04:21,620 --> 00:04:22,720
+look like a good one, and
+就不是一个理想的假设
+
+126
+00:04:23,020 --> 00:04:23,970
+it's only when I get
+只有当我使用
+
+127
+00:04:24,050 --> 00:04:25,270
+a larger training set that,
+一个更大的训练集时
+
+128
+00:04:25,500 --> 00:04:26,380
+you know, I'm starting to get
+我才有可能
+
+129
+00:04:26,890 --> 00:04:28,100
+hypotheses that maybe fit
+得到一个能够更好拟合数据的
+
+130
+00:04:28,480 --> 00:04:30,810
+the data somewhat better.
+可能的假设
+
+131
+00:04:31,380 --> 00:04:32,050
+So your cross validation error and
+因此 你的验证集误差和
+
+132
+00:04:32,260 --> 00:04:35,650
+your test set error will tend
+测试集误差
+
+133
+00:04:35,890 --> 00:04:37,160
+to decrease as your training
+都会随着训练集样本容量m的增加
+
+134
+00:04:37,470 --> 00:04:39,150
+set size increases because the
+而减小 因为你使用的数据越多
+
+135
+00:04:39,250 --> 00:04:40,700
+more data you have, the better
+你越能获得更好地泛化表现
+
+136
+00:04:40,990 --> 00:04:43,410
+you do at generalizing to new examples.
+或者说对新样本的适应能力更强
+
+137
+00:04:44,010 --> 00:04:46,730
+So, just the more data you have, the better the hypothesis you fit.
+因此 数据越多 越能拟合出合适的假设
+
+138
+00:04:47,560 --> 00:04:48,560
+So if you plot j train,
+所以 如果你把Jtrain和Jcv绘制出来
+
+139
+00:04:49,420 --> 00:04:51,670
+and Jcv this is the sort of thing that you get.
+就应该得到这样的曲线
+
+140
+00:04:52,490 --> 00:04:53,550
+Now let's look at what
+现在我们来看看
+
+141
+00:04:53,770 --> 00:04:54,940
+the learning curves may look like
+当处于高偏差或者高方差的情况时
+
+142
+00:04:55,360 --> 00:04:56,550
+if we have either high
+这些学习曲线
+
+143
+00:04:56,930 --> 00:04:58,210
+bias or high variance problems.
+又会变成什么样子
+
+144
+00:04:58,920 --> 00:05:00,530
+Suppose your hypothesis has high
+假如你的假设处于高偏差问题
+
+145
+00:05:00,830 --> 00:05:02,150
+bias and to explain this
+为了更清楚地解释这个问题
+
+146
+00:05:02,370 --> 00:05:03,780
+I'm going to use a, set an
+我要用一个简单的例子来说明
+
+147
+00:05:03,940 --> 00:05:05,250
+example, of fitting a straight
+也就是用一条直线
+
+148
+00:05:05,440 --> 00:05:06,500
+line to data that, you
+来拟合数据的例子
+
+149
+00:05:06,770 --> 00:05:08,240
+know, can't really be fit well by a straight line.
+很显然一条直线不能很好地拟合数据
+
+150
+00:05:09,540 --> 00:05:12,330
+So we end up with a hypotheses that maybe looks like that.
+所以最后得到的假设很有可能是这样的
+
+151
+00:05:13,910 --> 00:05:15,450
+Now let's think what would
+现在我们来想一想
+
+152
+00:05:15,750 --> 00:05:16,840
+happen if we were to increase
+如果我们增大训练集样本容量
+
+153
+00:05:17,470 --> 00:05:18,880
+the training set size. So if
+会发生什么情况呢
+
+154
+00:05:19,160 --> 00:05:20,480
+instead of five examples like
+所以现在不像画出的这样
+
+155
+00:05:20,590 --> 00:05:22,400
+what I've drawn there, imagine that
+只有这五组样本了
+
+156
+00:05:22,570 --> 00:05:24,080
+we have a lot more training examples.
+我们有了更多的训练样本
+
+157
+00:05:25,280 --> 00:05:27,230
+Well what happens, if you fit a straight line to this.
+那么如果你用一条直线来拟合
+
+158
+00:05:27,980 --> 00:05:29,700
+What you find is that, you
+不难发现
+
+159
+00:05:30,040 --> 00:05:31,360
+end up with you know, pretty much the same straight line.
+还是会得到类似的一条直线假设
+
+160
+00:05:31,690 --> 00:05:32,940
+I mean a straight line that
+我的意思是
+
+161
+00:05:33,530 --> 00:05:35,110
+just cannot fit this
+刚才的情况用一条直线不能很好地拟合
+
+162
+00:05:35,270 --> 00:05:37,320
+data and getting a ton more data, well
+而现在把样本容量扩大了
+
+163
+00:05:37,890 --> 00:05:39,460
+the straight line isn't going to change that much.
+这条直线也基本不会变化太大
+
+164
+00:05:40,230 --> 00:05:41,400
+This is the best possible straight-line
+因为这条直线是对这组数据
+
+165
+00:05:41,840 --> 00:05:42,770
+fit to this data, but the
+最可能也是最接近的拟合
+
+166
+00:05:42,890 --> 00:05:44,160
+straight line just can't fit this
+但一条直线再怎么接近
+
+167
+00:05:44,320 --> 00:05:45,630
+data set that well. So,
+也不可能对这组数据进行很好的拟合
+
+168
+00:05:45,870 --> 00:05:47,420
+if you plot the cross-validation error,
+所以 如果你绘出交叉验证集误差
+
+169
+00:05:49,260 --> 00:05:50,170
+this is what it will look like.
+应该是这样子的
+
+170
+00:05:51,320 --> 00:05:54,470
+Over on the left, if you have a really miniscule training set size, like, you know,
+最左端表示训练集样本容量很小 比如说只有一组样本
+
+171
+00:05:55,410 --> 00:05:57,710
+maybe just one training example, then it's not going to do well.
+那么表现当然很不好
+
+172
+00:05:58,550 --> 00:05:59,470
+But by the time you have
+而随着你增大训练集样本数
+
+173
+00:05:59,660 --> 00:06:00,760
+reached a certain number of training
+当达到某一个容量值的时候
+
+174
+00:06:00,940 --> 00:06:02,350
+examples, you have almost
+你就会找到那条最有可能
+
+175
+00:06:02,810 --> 00:06:04,010
+fit the best possible straight
+拟合数据的那条直线
+
+176
+00:06:04,200 --> 00:06:05,400
+line, and even if
+并且此时即便
+
+177
+00:06:05,490 --> 00:06:06,260
+you end up with a much
+你继续增大训练集的
+
+178
+00:06:06,480 --> 00:06:07,790
+larger training set size, a
+样本容量
+
+179
+00:06:07,970 --> 00:06:09,170
+much larger value of m,
+即使你不断增大m的值
+
+180
+00:06:10,010 --> 00:06:12,040
+you know, you're basically getting the same straight line,
+你基本上还是会得到的一条差不多的直线
+
+181
+00:06:12,370 --> 00:06:14,190
+and so, the cross-validation error
+因此 交叉验证集误差
+
+182
+00:06:14,480 --> 00:06:15,420
+- let me label that -
+我把它标在这里
+
+183
+00:06:15,650 --> 00:06:17,040
+or test set error will
+或者测试集误差
+
+184
+00:06:17,140 --> 00:06:18,660
+plateau out, or flatten out
+将会很快变为水平而不再变化
+
+185
+00:06:18,990 --> 00:06:20,480
+pretty soon, once you reached
+只要训练集样本容量值达到
+
+186
+00:06:20,910 --> 00:06:22,920
+beyond a certain number
+或超过了那个特定的数值
+
+187
+00:06:23,270 --> 00:06:24,700
+of training examples, because you've
+交叉验证集误差和测试集误差就趋于不变
+
+188
+00:06:25,130 --> 00:06:27,480
+pretty much fit the best possible straight line.
+这样你会得到最能拟合数据的那条直线
+
+189
+00:06:28,390 --> 00:06:29,540
+And how about training error?
+那么训练误差又如何呢
+
+190
+00:06:30,120 --> 00:06:33,050
+Well, the training error will again be small.
+同样 训练误差一开始也是很小的
+
+191
+00:06:34,620 --> 00:06:36,280
+And what you find
+而在高偏差的情形中
+
+192
+00:06:36,760 --> 00:06:38,080
+in the high bias case is
+你会发现训练集误差
+
+193
+00:06:38,210 --> 00:06:40,770
+that the training error will end
+会逐渐增大
+
+194
+00:06:41,000 --> 00:06:42,510
+up close to the cross
+一直趋于接近
+
+195
+00:06:42,830 --> 00:06:44,700
+validation error, because you
+交叉验证集误差
+
+196
+00:06:44,810 --> 00:06:46,370
+have so few parameters and so
+这是因为你的参数很少
+
+197
+00:06:46,590 --> 00:06:48,070
+much data, at least when m is large.
+但当m很大的时候 数据太多
+
+198
+00:06:48,900 --> 00:06:49,840
+The performance on the training
+此时训练集和交叉验证集的
+
+199
+00:06:50,220 --> 00:06:52,500
+set and the cross validation set will be very similar.
+预测效果将会非常接近
+
+200
+00:06:53,800 --> 00:06:54,750
+And so, this is what your
+这就是当你的学习算法处于
+
+201
+00:06:54,870 --> 00:06:56,460
+learning curves will look like,
+高偏差情形时
+
+202
+00:06:56,770 --> 00:06:58,850
+if you have an algorithm that has high bias.
+学习曲线的大致走向
+
+203
+00:07:00,220 --> 00:07:01,470
+And finally, the problem with
+最后补充一点
+
+204
+00:07:01,630 --> 00:07:03,260
+high bias is reflected in
+高偏差的情形
+
+205
+00:07:03,450 --> 00:07:04,930
+the fact that both the
+反映出的问题是
+
+206
+00:07:05,580 --> 00:07:07,350
+cross validation error and the
+交叉验证集和训练集
+
+207
+00:07:07,420 --> 00:07:09,130
+training error are high,
+误差都很大
+
+208
+00:07:09,560 --> 00:07:10,440
+and so you end up with
+也就是说 你最终会得到一个
+
+209
+00:07:10,650 --> 00:07:12,040
+a relatively high value of
+值比较大Jcv
+
+210
+00:07:12,280 --> 00:07:14,250
+both Jcv and the j train.
+和Jtrain
+
+211
+00:07:15,370 --> 00:07:16,820
+This also implies something very
+这也得出一个很有意思的结论
+
+212
+00:07:17,120 --> 00:07:18,520
+interesting, which is that,
+那就是
+
+213
+00:07:18,800 --> 00:07:19,990
+if a learning algorithm has high
+如果一个学习算法
+
+214
+00:07:20,360 --> 00:07:22,250
+bias, as we
+有很大的偏差
+
+215
+00:07:22,390 --> 00:07:23,430
+get more and more training examples,
+那么当我们选用更多的训练样本时
+
+216
+00:07:24,060 --> 00:07:25,100
+that is, as we move to
+也就是在这幅图中
+
+217
+00:07:25,210 --> 00:07:26,600
+the right of this figure, we'll
+随着我们增大横坐标
+
+218
+00:07:26,740 --> 00:07:27,880
+notice that the cross
+我们发现交叉验证集误差的值
+
+219
+00:07:28,220 --> 00:07:29,430
+validation error isn't going
+不会表现出明显的下降
+
+220
+00:07:29,740 --> 00:07:31,020
+down much, it's basically flattened
+实际上是变为水平了
+
+221
+00:07:31,560 --> 00:07:32,820
+out, and so if
+所以如果学习算法
+
+222
+00:07:32,950 --> 00:07:35,020
+a learning algorithm is really suffering from high bias,
+正处于高偏差的情形
+
+223
+00:07:36,640 --> 00:07:38,200
+getting more training data by
+那么选用更多的训练集数据
+
+224
+00:07:38,370 --> 00:07:39,710
+itself will actually not help
+对于改善算法表现无益
+
+225
+00:07:40,190 --> 00:07:41,580
+that much, and as in our
+正如我们右边的
+
+226
+00:07:41,760 --> 00:07:43,120
+example in the figure
+这两幅图所体现的
+
+227
+00:07:43,210 --> 00:07:45,670
+on the right, here we had only five training
+这里我们只有五组训练样本
+
+228
+00:07:46,060 --> 00:07:47,970
+examples, and we fit a certain straight line.
+然后我们找到这条直线来拟合
+
+229
+00:07:48,550 --> 00:07:49,270
+And when we had a ton
+然后我们增加了更多的训练样本
+
+230
+00:07:49,540 --> 00:07:50,730
+more training data, we still
+但我们仍然得到几乎一样的
+
+231
+00:07:51,040 --> 00:07:52,710
+end up with roughly the same straight line.
+一条直线
+
+232
+00:07:53,200 --> 00:07:54,290
+And so if the learning algorithm
+因此如果学习算法
+
+233
+00:07:54,440 --> 00:07:57,090
+has high bias, giving it a lot more training data
+处于高偏差时
+
+234
+00:07:57,650 --> 00:07:59,060
+doesn't actually help you
+给我再多的训练数据也于事无补
+
+235
+00:07:59,830 --> 00:08:01,290
+get a much lower cross validation
+交叉验证集误差或测试集误差
+
+236
+00:08:01,890 --> 00:08:02,890
+error or test set error.
+也不会降低多少
+
+237
+00:08:03,730 --> 00:08:04,950
+So knowing if your learning
+所以 能够看清你的算法正处于
+
+238
+00:08:05,250 --> 00:08:06,600
+algorithm is suffering from high
+高偏差的情形
+
+239
+00:08:06,780 --> 00:08:07,620
+bias seems like a useful
+是一件很有意义的事情
+
+240
+00:08:08,100 --> 00:08:09,500
+thing to know because this can
+因为这样可以让你避免
+
+241
+00:08:09,640 --> 00:08:11,140
+prevent you from wasting a
+把时间浪费在
+
+242
+00:08:11,290 --> 00:08:12,520
+lot of time collecting more training
+收集更多的训练集数据上
+
+243
+00:08:12,920 --> 00:08:15,440
+data where it might just not end up being helpful.
+因为再多的数据也是无意义的
+
+244
+00:08:16,200 --> 00:08:17,070
+Next let us look at the
+接下来我们再来看看
+
+245
+00:08:17,140 --> 00:08:18,530
+setting of a learning algorithm
+当学习算法正处于高方差的时候
+
+246
+00:08:19,470 --> 00:08:20,340
+that may have high variance.
+学习曲线应该是什么样子的
+
+247
+00:08:21,590 --> 00:08:22,880
+Let us just look at the
+首先我们来看
+
+248
+00:08:23,550 --> 00:08:24,260
+training error. If
+训练集误差
+
+249
+00:08:25,120 --> 00:08:26,350
+you have a very small training
+如果你的训练集样本容量很小
+
+250
+00:08:26,680 --> 00:08:28,730
+set, like the five training examples shown on
+比如像图中所示情形
+
+251
+00:08:29,130 --> 00:08:30,720
+the figure on the right and
+只有五组训练样本
+
+252
+00:08:31,150 --> 00:08:32,170
+if we're fitting say a
+如果我们用很高阶次的
+
+253
+00:08:32,200 --> 00:08:33,050
+very high order polynomial,
+多项式来拟合
+
+254
+00:08:34,380 --> 00:08:36,530
+and I've written a hundredth degree polynomial which
+比如这里我用了100次的多项式函数
+
+255
+00:08:37,090 --> 00:08:38,750
+really no one uses, but just an illustration.
+当然不会有人这么用的 这里只是演示
+
+256
+00:08:39,920 --> 00:08:41,460
+And if we're using a
+并且假设我们使用
+
+257
+00:08:41,550 --> 00:08:43,160
+fairly small value of lambda,
+一个很小的lambda值
+
+258
+00:08:43,800 --> 00:08:44,920
+maybe not zero, but a fairly
+可能不等于0
+
+259
+00:08:45,070 --> 00:08:46,830
+small value of lambda, then
+但足够小的lambda
+
+260
+00:08:47,040 --> 00:08:47,980
+we'll end up, you know,
+那么很显然 我们会对这组数据
+
+261
+00:08:48,190 --> 00:08:50,590
+fitting this data very well, with
+拟合得非常非常好
+
+262
+00:08:50,860 --> 00:08:53,390
+a function that overfits this.
+因此这个假设函数对数据过拟合
+
+263
+00:08:54,380 --> 00:08:55,640
+So, if the training
+所以 如果训练集
+
+264
+00:08:55,990 --> 00:08:57,820
+set size is small, our training
+样本容量很小时
+
+265
+00:08:58,320 --> 00:08:59,530
+error, that is, j train
+训练集误差Jtrain
+
+266
+00:09:00,030 --> 00:09:01,810
+of theta will be small.
+将会很小
+
+267
+00:09:03,130 --> 00:09:04,330
+And as this training set size increases
+随着训练集样本容量的增加
+
+268
+00:09:04,940 --> 00:09:05,870
+a bit, you know, we may
+可能这个假设函数仍然会
+
+269
+00:09:06,000 --> 00:09:07,160
+still be overfitting this
+对数据或多或少
+
+270
+00:09:07,330 --> 00:09:08,810
+data a little bit but
+有一点过拟合
+
+271
+00:09:09,780 --> 00:09:11,880
+it also becomes slightly harder to
+但很明显此时要对数据很好地拟合
+
+272
+00:09:12,020 --> 00:09:12,970
+fit this data set perfectly,
+显得更加困难和吃力了
+
+273
+00:09:13,940 --> 00:09:15,140
+and so, as the training set size
+所以 随着训练集样本容量的增大
+
+274
+00:09:15,350 --> 00:09:16,810
+increases, we'll find that
+我们会发现Jtrain的值
+
+275
+00:09:16,960 --> 00:09:19,390
+j train increases, because
+会随之增大
+
+276
+00:09:19,840 --> 00:09:21,040
+it is just a little harder to fit
+因为当训练样本越多的时候
+
+277
+00:09:21,260 --> 00:09:22,720
+the training set perfectly when we have
+我们就越难跟训练集数据拟合得很好
+
+278
+00:09:22,890 --> 00:09:25,700
+more examples, but the training set error will still be pretty low.
+但总的来说训练集误差还是很小
+
+279
+00:09:26,530 --> 00:09:28,600
+Now, how about the cross validation error?
+交叉验证集误差又如何呢
+
+280
+00:09:29,220 --> 00:09:30,590
+Well, in high variance
+好的 在高方差的情形中
+
+281
+00:09:31,040 --> 00:09:32,760
+setting, a hypothesis is
+假设函数对数据过拟合
+
+282
+00:09:32,980 --> 00:09:34,190
+overfitting and so the
+因此交叉验证集误差
+
+283
+00:09:34,290 --> 00:09:35,680
+cross validation error will remain
+将会一直都很大
+
+284
+00:09:36,120 --> 00:09:37,650
+high, even as we
+即便我们选择一个
+
+285
+00:09:37,750 --> 00:09:38,930
+get you know, a moderate number
+比较合适恰当的
+
+286
+00:09:39,260 --> 00:09:40,520
+of training examples and, so
+训练集样本数
+
+287
+00:09:41,170 --> 00:09:42,950
+maybe, the cross validation
+因此交叉验证集误差
+
+288
+00:09:43,730 --> 00:09:45,520
+error may look like that.
+画出来差不多是这样的
+
+289
+00:09:45,660 --> 00:09:47,720
+And the indicative diagnostic that we
+所以算法处于高方差情形
+
+290
+00:09:47,830 --> 00:09:49,200
+have a high variance problem,
+最明显的一个特点是
+
+291
+00:09:50,210 --> 00:09:51,490
+is the fact that there's
+在训练集误差
+
+292
+00:09:51,720 --> 00:09:54,010
+this large gap between
+和交叉验证集误差之间
+
+293
+00:09:54,340 --> 00:09:56,440
+the training error and the cross validation error.
+有一段很大的差距
+
+294
+00:09:57,440 --> 00:09:58,180
+And looking at this figure.
+而这个曲线图也反映出
+
+295
+00:09:58,720 --> 00:10:00,170
+If we think about adding more
+如果我们要考虑增大训练集的样本数
+
+296
+00:10:00,440 --> 00:10:01,810
+training data, that is, taking
+也就是在这幅图中
+
+297
+00:10:02,110 --> 00:10:03,660
+this figure and extrapolating to
+向右延伸曲线
+
+298
+00:10:03,790 --> 00:10:05,220
+the right, we can kind
+我们大致可以看出
+
+299
+00:10:05,330 --> 00:10:06,830
+of tell that, you know the
+这两条学习曲线
+
+300
+00:10:07,030 --> 00:10:08,120
+two curves, the blue curve
+蓝色和红色的两条曲线
+
+301
+00:10:08,480 --> 00:10:10,480
+and the magenta curve, are converging to each other.
+正在相互靠近
+
+302
+00:10:11,420 --> 00:10:12,360
+And so, if we were to
+因此 如果我们将曲线
+
+303
+00:10:12,520 --> 00:10:13,840
+extrapolate this figure to
+向右延伸出去
+
+304
+00:10:13,980 --> 00:10:21,230
+the right, then it
+那么似乎
+
+305
+00:10:21,360 --> 00:10:23,000
+seems it likely that the
+训练集误差很可能会
+
+306
+00:10:23,170 --> 00:10:24,120
+training error will keep on
+逐渐增大
+
+307
+00:10:24,270 --> 00:10:25,740
+going up and the
+而交叉验证集误差
+
+308
+00:10:27,130 --> 00:10:29,040
+cross-validation error would keep on going down.
+则会持续下降
+
+309
+00:10:30,000 --> 00:10:32,340
+And the thing we really care about is the cross-validation error
+当然我们最关心的还是交叉验证集误差
+
+310
+00:10:33,010 --> 00:10:35,150
+or the test set error, right?
+或者测试集误差 对吧
+
+311
+00:10:35,300 --> 00:10:36,460
+So in this sort
+所以从这幅图中
+
+312
+00:10:36,730 --> 00:10:37,850
+of figure, we can tell that
+我们基本可以预测
+
+313
+00:10:38,230 --> 00:10:39,420
+if we keep on adding training
+如果继续增大训练样本的数量
+
+314
+00:10:39,820 --> 00:10:40,930
+examples and extrapolate to the
+将曲线向右延伸
+
+315
+00:10:41,050 --> 00:10:42,650
+right, well our cross validation
+交叉验证集误差将会
+
+316
+00:10:43,290 --> 00:10:44,610
+error will keep on coming down.
+逐渐下降
+
+317
+00:10:45,120 --> 00:10:46,090
+And, so, in the high
+所以 在高方差的情形中
+
+318
+00:10:46,330 --> 00:10:47,980
+variance setting, getting more
+使用更多的训练集数据
+
+319
+00:10:48,180 --> 00:10:49,550
+training data is, indeed,
+对改进算法的表现
+
+320
+00:10:50,170 --> 00:10:51,240
+likely to help.
+事实上是有效果的
+
+321
+00:10:51,520 --> 00:10:52,810
+And so again, this seems like a
+这同样也体现出
+
+322
+00:10:53,060 --> 00:10:54,180
+useful thing to know if your
+知道你的算法正处于
+
+323
+00:10:54,330 --> 00:10:55,830
+learning algorithm is suffering
+高方差的情形
+
+324
+00:10:56,150 --> 00:10:57,460
+from a high variance problem, because
+也是非常有意义的
+
+325
+00:10:57,810 --> 00:10:59,150
+that tells you, for example that it
+因为它能告诉你
+
+326
+00:10:59,220 --> 00:11:00,100
+may be worth your while
+是否有必要花时间
+
+327
+00:11:00,680 --> 00:11:02,430
+to see if you can go and get some more training data.
+来增加更多的训练集数据
+
+328
+00:11:03,700 --> 00:11:04,920
+Now, on the previous slide
+好的 在前一页和这一页幻灯片中
+
+329
+00:11:05,330 --> 00:11:06,450
+and this slide, I've drawn fairly
+我画出的学习曲线
+
+330
+00:11:06,970 --> 00:11:08,510
+clean fairly idealized curves.
+都是相当理想化的曲线
+
+331
+00:11:08,900 --> 00:11:10,050
+If you plot these curves for
+针对一个实际的学习算法
+
+332
+00:11:10,170 --> 00:11:11,970
+an actual learning algorithm, sometimes
+如果你画出学习曲线的话
+
+333
+00:11:12,500 --> 00:11:13,910
+you will actually see, you know, pretty
+你会看到基本类似的结果
+
+334
+00:11:14,560 --> 00:11:15,900
+much curves, like what I've drawn here.
+就像我在这里画的一样
+
+335
+00:11:16,600 --> 00:11:17,730
+Although, sometimes you see curves
+虽然如此
+
+336
+00:11:18,150 --> 00:11:19,160
+that are a little bit noisier and
+有时候你也会看到
+
+337
+00:11:19,230 --> 00:11:20,820
+a little bit messier than this.
+带有一点噪声或干扰的曲线
+
+338
+00:11:21,090 --> 00:11:22,440
+But plotting learning curves like
+但总的来说
+
+339
+00:11:22,620 --> 00:11:23,850
+these can often tell
+像这样画出学习曲线
+
+340
+00:11:24,120 --> 00:11:25,460
+you, can often help you
+确实能帮助你
+
+341
+00:11:25,570 --> 00:11:26,650
+figure out if your learning algorithm is
+看清你的学习算法
+
+342
+00:11:26,950 --> 00:11:29,080
+suffering from bias, or variance or even a little bit of both.
+是否处于高偏差 高方差 或二者皆有的情形
+
+343
+00:11:29,170 --> 00:11:31,030
+So when I'm
+所以当我打算
+
+344
+00:11:31,200 --> 00:11:32,700
+trying to improve the performance of
+改进一个学习算法
+
+345
+00:11:32,760 --> 00:11:34,060
+a learning algorithm, one thing
+的表现时
+
+346
+00:11:34,260 --> 00:11:35,720
+that I'll almost always do
+我通常会进行的一项工作
+
+347
+00:11:35,960 --> 00:11:37,440
+is plot these learning
+就是画出这些学习曲线
+
+348
+00:11:37,970 --> 00:11:39,460
+curves, and usually this will
+一般来讲 这项工作会让你
+
+349
+00:11:39,490 --> 00:11:41,710
+give you a better sense of whether there is a bias or variance problem.
+更轻松地看出偏差或方差的问题
+
+350
+00:11:44,280 --> 00:11:45,180
+And in the next video
+在下一节视频中
+
+351
+00:11:45,420 --> 00:11:46,440
+we'll see how this can
+我们将介绍如何判断
+
+352
+00:11:46,650 --> 00:11:48,370
+help suggest specific actions
+是否应采取具体的某个行为
+
+353
+00:11:48,450 --> 00:11:49,580
+to take, or not to take,
+来改进学习算法的表现
+
+354
+00:11:50,260 --> 00:11:53,250
+in order to try to improve the performance of your learning algorithm.
+
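A rough sketch of the learning-curve procedure from this lecture: train on only the first m training examples, measure Jtrain on those same m examples, and measure Jcv on the full cross-validation set. The data and helpers below are made-up assumptions for illustration (not course code); a straight line is fit to a quadratic target so the output shows the high-bias pattern, with Jtrain rising toward a high Jcv plateau.

import numpy as np

rng = np.random.default_rng(1)

def train_linear_reg(X, y, lam=0.0):
    # Regularized normal equation; the intercept (first column) is not penalized.
    L = lam * np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + L, X.T @ y)

def squared_error(theta, X, y):
    err = X @ theta - y
    return err @ err / (2 * len(y))

# A deliberately high-bias setting: a straight line fit to a quadratic target.
x = rng.uniform(-2.0, 2.0, 120)
y = x ** 2 + 0.3 * rng.standard_normal(120)
X = np.column_stack([np.ones_like(x), x])      # intercept plus a single feature
X_train, y_train = X[:80], y[:80]
X_cv, y_cv = X[80:], y[80:]

for m in range(2, len(y_train) + 1, 10):
    # Train only on the first m examples, as the lecture describes.
    theta = train_linear_reg(X_train[:m], y_train[:m])
    j_train = squared_error(theta, X_train[:m], y_train[:m])  # error on the m examples actually used
    j_cv = squared_error(theta, X_cv, y_cv)                    # error on the full cross-validation set
    print(f"m={m:<3}  Jtrain={j_train:.3f}  Jcv={j_cv:.3f}")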
diff --git a/srt/10 - 7 - Deciding What to Do Next Revisited (7 min).srt b/srt/10 - 7 - Deciding What to Do Next Revisited (7 min).srt
new file mode 100644
index 00000000..986288b9
--- /dev/null
+++ b/srt/10 - 7 - Deciding What to Do Next Revisited (7 min).srt
@@ -0,0 +1,996 @@
+1
+00:00:00,260 --> 00:00:01,340
+We've talked about how to evaluate
+我们已经介绍了怎样评价一个学习算法
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,960 --> 00:00:03,360
+learning algorithms, talked about model selection,
+我们讨论了模型选择问题
+
+3
+00:00:04,150 --> 00:00:06,490
+talked a lot about bias and variance.
+偏差和方差的问题
+
+4
+00:00:06,970 --> 00:00:08,110
+So how does this help us figure
+那么这些诊断法则怎样帮助我们弄清
+
+5
+00:00:08,330 --> 00:00:09,730
+out what are potentially fruitful,
+哪些方法有助于
+
+6
+00:00:10,340 --> 00:00:11,710
+potentially not fruitful things to
+改进学习算法的效果
+
+7
+00:00:11,950 --> 00:00:13,980
+try to do to improve the performance of a learning algorithm.
+哪些又是徒劳的呢
+
+8
+00:00:15,480 --> 00:00:16,660
+Let's go back to our original
+让我们再次回到最开始的例子
+
+9
+00:00:16,940 --> 00:00:18,890
+motivating example and go for the result.
+在那里寻找答案
+
+10
+00:00:21,030 --> 00:00:22,570
+So here is our earlier example
+这就是我们之前的例子
+
+11
+00:00:23,000 --> 00:00:24,120
+of maybe having fit regularized
+我们试图用正则化的线性回归拟合模型
+
+12
+00:00:24,720 --> 00:00:27,640
+linear regression and finding that it doesn't work as well as we're hoping.
+并评价该算法是否达到预期效果
+
+13
+00:00:28,300 --> 00:00:30,080
+We said that we had this menu of options.
+我们提出了如下这些选择
+
+14
+00:00:30,910 --> 00:00:32,430
+So is there some way to
+那么到底有没有某种方法
+
+15
+00:00:32,590 --> 00:00:34,530
+figure out which of these might be fruitful options?
+能够明确指出以上哪些方法有效呢
+
+16
+00:00:35,480 --> 00:00:36,490
+The first option on this list
+第一种可供选择的方法
+
+17
+00:00:36,660 --> 00:00:38,770
+was getting more training examples.
+是使用更多的训练集数据
+
+18
+00:00:39,550 --> 00:00:40,700
+What this is good for,
+这种方法对于高方差的情况
+
+19
+00:00:40,880 --> 00:00:42,890
+is this helps to fix high variance.
+是有帮助的
+
+20
+00:00:45,320 --> 00:00:46,610
+And concretely, if you instead
+也就是说
+
+21
+00:00:47,150 --> 00:00:48,550
+have a high bias problem and
+如果你的模型不处于高方差问题
+
+22
+00:00:48,680 --> 00:00:50,530
+don't have any variance problem, then
+而是高偏差的时候
+
+23
+00:00:50,830 --> 00:00:52,000
+we saw in the previous video
+那么通过前面的视频
+
+24
+00:00:52,500 --> 00:00:53,560
+that getting more training examples,
+我们已经知道 获取更多的训练集数据
+
+25
+00:00:54,640 --> 00:00:56,380
+while maybe just isn't going to help much at all.
+并不会有太明显的帮助
+
+26
+00:00:57,360 --> 00:00:58,320
+So the first option is useful
+所以 要选择第一种方法
+
+27
+00:00:58,780 --> 00:01:00,230
+only if you, say, plot
+你应该先画出
+
+28
+00:01:00,580 --> 00:01:01,620
+the learning curves and figure
+学习曲线
+
+29
+00:01:01,720 --> 00:01:02,820
+out that you have at least
+然后看出你的模型
+
+30
+00:01:02,860 --> 00:01:03,970
+a bit of a variance, meaning that
+应该至少有那么一点方差问题
+
+31
+00:01:04,170 --> 00:01:06,530
+the cross-validation error is, you know,
+也就是说你的交叉验证集误差
+
+32
+00:01:06,680 --> 00:01:08,800
+quite a bit bigger than your training set error.
+应该比训练集误差大一点
+
+33
+00:01:08,910 --> 00:01:10,400
+How about trying a smaller set of features?
+第二种方法情况又如何呢
+
+34
+00:01:10,940 --> 00:01:11,920
+Well, trying a smaller
+第二种方法是
+
+35
+00:01:12,350 --> 00:01:13,570
+set of features, that's again
+少选几种特征
+
+36
+00:01:13,970 --> 00:01:16,060
+something that fixes high variance.
+这同样是对高方差时有效
+
+37
+00:01:17,100 --> 00:01:18,080
+And in other words, if you figure
+换句话说
+
+38
+00:01:18,420 --> 00:01:19,440
+out, by looking at learning curves
+如果你通过绘制学习曲线
+
+39
+00:01:19,820 --> 00:01:20,830
+or something else that you used,
+或者别的什么方法
+
+40
+00:01:21,190 --> 00:01:22,110
+that have a high bias
+看出你的模型处于高偏差问题
+
+41
+00:01:22,370 --> 00:01:23,460
+problem; then for goodness
+那么切记
+
+42
+00:01:23,670 --> 00:01:25,000
+sakes, don't waste your time
+千万不要浪费时间
+
+43
+00:01:25,540 --> 00:01:27,250
+trying to carefully select out
+试图从已有的特征中
+
+44
+00:01:27,450 --> 00:01:29,130
+a smaller set of features to use.
+挑出一小部分来使用
+
+45
+00:01:29,330 --> 00:01:31,190
+Because if you have a high bias problem, using
+因为你已经发现高偏差的问题了
+
+46
+00:01:32,060 --> 00:01:33,220
+fewer features is not going to help.
+使用更少的特征仍然无济于事
+
+47
+00:01:33,890 --> 00:01:35,270
+Whereas in contrast, if you
+反过来 如果你发现
+
+48
+00:01:35,490 --> 00:01:36,730
+look at the learning curves or something
+从你的学习曲线
+
+49
+00:01:36,900 --> 00:01:38,020
+else you figure out that you
+或者别的某种诊断图中
+
+50
+00:01:38,360 --> 00:01:39,780
+have a high variance problem, then,
+你看出了高方差的问题
+
+51
+00:01:40,320 --> 00:01:41,730
+indeed trying to select
+那么恭喜你
+
+52
+00:01:42,160 --> 00:01:43,180
+out a smaller set of features,
+花点时间挑选出一小部分合适的特征吧
+
+53
+00:01:43,440 --> 00:01:45,380
+that might indeed be a very good use of your time.
+这是把时间用在了刀刃上
+
+54
+00:01:45,790 --> 00:01:47,120
+How about trying to get additional
+方法三 选用更多的特征又如何呢
+
+55
+00:01:47,710 --> 00:01:49,640
+features, adding features, usually,
+通常来讲
+
+56
+00:01:50,170 --> 00:01:51,380
+not always, but usually we
+不是所有时候都适用
+
+57
+00:01:51,490 --> 00:01:53,020
+think of this as a solution
+但通常来说 增加特征数
+
+58
+00:01:54,070 --> 00:01:56,920
+for fixing high bias problems.
+是解决高偏差问题的法宝
+
+59
+00:01:57,600 --> 00:01:58,700
+So if you are adding extra
+所以如果你要增加
+
+60
+00:01:58,980 --> 00:02:00,640
+features it's usually because
+更多的特征时
+
+61
+00:02:01,750 --> 00:02:03,150
+your current hypothesis is too
+一般是由于你现有的
+
+62
+00:02:03,280 --> 00:02:04,280
+simple, and so we want
+假设函数太简单
+
+63
+00:02:04,540 --> 00:02:06,520
+to try to get additional features to
+因此我们才决定增加一些
+
+64
+00:02:06,730 --> 00:02:08,540
+make our hypothesis better able
+别的特征来让假设函数
+
+65
+00:02:09,060 --> 00:02:10,800
+to fit the training set. And
+更好地拟合训练集
+
+66
+00:02:11,420 --> 00:02:13,460
+similarly, adding polynomial features;
+接下来 类似地
+
+67
+00:02:13,770 --> 00:02:14,930
+this is another way of adding
+增加更多的多项式特征
+
+68
+00:02:15,140 --> 00:02:16,420
+features and so there
+这实际上也是属于增加特征
+
+69
+00:02:16,570 --> 00:02:18,220
+is another way to try
+因此也是用于
+
+70
+00:02:18,430 --> 00:02:19,950
+to fix the high bias problem.
+修正高偏差问题
+
+71
+00:02:21,020 --> 00:02:22,820
+And, if concretely if
+具体来说
+
+72
+00:02:23,210 --> 00:02:24,350
+your learning curves show you
+如果你画出的学习曲线告诉你
+
+73
+00:02:24,570 --> 00:02:25,410
+that you still have a high
+你还是处于高方差问题
+
+74
+00:02:25,520 --> 00:02:27,190
+variance problem, then, you know, again this
+那么采取这种方法
+
+75
+00:02:27,320 --> 00:02:29,360
+is maybe a less good use of your time.
+依然是浪费时间
+
+76
+00:02:30,640 --> 00:02:32,690
+And finally, decreasing and increasing lambda.
+最后 增大和减小lambda
+
+77
+00:02:33,200 --> 00:02:34,090
+These are quick and easy to try,
+这种方法尝试起来最方便
+
+78
+00:02:34,470 --> 00:02:36,000
+I guess these are less likely to
+我想尝试这个方法
+
+79
+00:02:36,140 --> 00:02:38,190
+be a waste of, you know, many months of your life.
+不至于花费你几个月时间
+
+80
+00:02:39,070 --> 00:02:41,530
+But decreasing lambda, you
+但我们已经知道
+
+81
+00:02:41,650 --> 00:02:43,400
+already know fixes high bias.
+减小lambda可以修正高偏差
+
+82
+00:02:45,360 --> 00:02:46,340
+In case this isn't clear to
+如果我说的你还不清楚的话
+
+83
+00:02:46,500 --> 00:02:47,340
+you, you know, I do encourage
+我建议你暂停视频
+
+84
+00:02:47,810 --> 00:02:50,350
+you to pause the video and think through this that
+仔细回忆一下
+
+85
+00:02:50,990 --> 00:02:52,790
+convince yourself that decreasing lambda
+减小lambda的值
+
+86
+00:02:53,620 --> 00:02:55,030
+helps fix high bias, whereas increasing
+有助于修正高偏差
+
+87
+00:02:55,590 --> 00:02:57,480
+lambda fixes high variance.
+而增大lambda的值解决高方差
+
+88
+00:02:59,870 --> 00:03:00,930
+And if you aren't sure why
+如果你确实不明白
+
+89
+00:03:01,270 --> 00:03:02,470
+this is the case, do
+为什么是这样的话
+
+90
+00:03:02,650 --> 00:03:04,130
+pause the video and make
+那就暂停一下好好想想
+
+91
+00:03:04,150 --> 00:03:05,820
+sure you can convince yourself that this is the case.
+直到真的弄清楚这个道理
+
+92
+00:03:06,580 --> 00:03:07,320
+Or take a look at the curves
+或者看看
+
+93
+00:03:07,800 --> 00:03:09,040
+that we were plotting at the
+上一节视频最后
+
+94
+00:03:09,190 --> 00:03:10,590
+end of the previous video and
+我们绘制的学习曲线
+
+95
+00:03:10,720 --> 00:03:11,650
+try to make sure you understand
+试着理解清楚
+
+96
+00:03:12,170 --> 00:03:13,670
+why these are the case.
+为什么是我说的那样
+
+97
+00:03:15,080 --> 00:03:16,120
+Finally, let us take everything
+最后 我们回顾一下
+
+98
+00:03:16,440 --> 00:03:17,840
+we have learned and relate it back
+这几节课介绍的这些内容
+
+99
+00:03:18,400 --> 00:03:19,980
+to neural networks and so,
+并且看看它们和神经网络的联系
+
+100
+00:03:20,130 --> 00:03:21,190
+here is some practical
+我想介绍一些
+
+101
+00:03:21,720 --> 00:03:22,720
+advice for how I usually
+很实用的经验或建议
+
+102
+00:03:23,520 --> 00:03:25,060
+choose the architecture or the
+这些来自于我平时为神经网络模型
+
+103
+00:03:25,530 --> 00:03:28,660
+connectivity pattern of the neural networks I use.
+选择结构或者连接形式的一些技巧
+
+104
+00:03:30,070 --> 00:03:31,190
+So, if you are fitting
+当你在进行神经网络拟合的时候
+
+105
+00:03:31,410 --> 00:03:33,160
+a neural network, one option would
+你可以选择一种
+
+106
+00:03:33,400 --> 00:03:34,680
+be to fit, say, a pretty
+比如说
+
+107
+00:03:34,840 --> 00:03:36,540
+small neural network with you know, relatively
+一个相对比较简单的神经网络模型
+
+108
+00:03:37,530 --> 00:03:38,670
+few hidden units, maybe just
+相对来讲 隐藏单元比较少
+
+109
+00:03:38,930 --> 00:03:40,430
+one hidden unit. If you're fitting
+或者甚至只有一个隐藏单元
+
+110
+00:03:40,890 --> 00:03:42,670
+a neural network, one option would
+如果你要进行神经网络的拟合
+
+111
+00:03:42,800 --> 00:03:44,440
+be to fit a relatively small
+其中一个选择是
+
+112
+00:03:44,920 --> 00:03:46,500
+neural network with, say,
+选用一个相对简单的网络结构
+
+113
+00:03:48,030 --> 00:03:49,630
+relatively few, maybe only one
+比如说只有一个
+
+114
+00:03:49,980 --> 00:03:51,760
+hidden layer and maybe
+隐藏层
+
+115
+00:03:52,070 --> 00:03:53,370
+only a relatively few number
+或者可能相对来讲
+
+116
+00:03:53,750 --> 00:03:55,160
+of hidden units.
+比较少的隐藏单元
+
+117
+00:03:55,570 --> 00:03:56,580
+So, a network like this might have relatively
+因此像这样的一个简单的神经网络
+
+118
+00:03:57,050 --> 00:03:59,170
+few parameters and be more prone to underfitting.
+参数就不会很多 并且很容易出现欠拟合
+
+119
+00:04:00,450 --> 00:04:01,850
+The main advantage of these small
+这种比较小型的神经网络
+
+120
+00:04:02,260 --> 00:04:04,760
+neural networks is that the computation will be cheaper.
+其最大优势在于计算量较小
+
+121
+00:04:05,820 --> 00:04:06,910
+An alternative would be to
+与之相对的另一种情况
+
+122
+00:04:07,010 --> 00:04:08,470
+fit a, maybe relatively large
+是相对较大型的神经网络结构
+
+123
+00:04:08,900 --> 00:04:10,790
+neural network with either more
+隐藏层单元会比较多
+
+124
+00:04:10,970 --> 00:04:12,370
+hidden units--there's a lot
+比如每一层中的隐藏单元数很多
+
+125
+00:04:12,560 --> 00:04:14,940
+of hidden units in one layer there--or with more hidden layers.
+或者有很多个隐藏层
+
+126
+00:04:16,200 --> 00:04:17,800
+And so these neural networks tend
+因此这种比较复杂的神经网络
+
+127
+00:04:18,010 --> 00:04:20,870
+to have more parameters and therefore be more prone to overfitting.
+参数一般较多 也更容易出现过拟合
+
+128
+00:04:22,410 --> 00:04:24,010
+One disadvantage, often not a
+这种结构的一大劣势
+
+129
+00:04:24,050 --> 00:04:25,160
+major one but something to
+也许不是主要的 但还是需要考虑
+
+130
+00:04:25,250 --> 00:04:26,440
+think about, is that if you have
+那就是当网络中的
+
+131
+00:04:27,000 --> 00:04:28,450
+a large number of neurons
+神经元数量很多的时候
+
+132
+00:04:28,960 --> 00:04:30,040
+in your network, then it can
+这种结构会显得
+
+133
+00:04:30,230 --> 00:04:31,920
+be more computationally expensive.
+计算量较为庞大
+
+134
+00:04:33,070 --> 00:04:35,790
+Although within reason, this is often hopefully not a huge problem.
+虽然有这个情况 但通常来讲不成问题
+
+135
+00:04:36,840 --> 00:04:38,420
+The main potential problem of
+这种大型网络结构最主要的问题
+
+136
+00:04:38,540 --> 00:04:39,710
+these much larger neural networks is that it could be more prone to overfitting
+还是它更容易出现过拟合现象
+
+137
+00:04:39,980 --> 00:04:44,120
+and it turns out if you're applying neural
+事实上 如果你经常应用神经网络
+
+138
+00:04:44,700 --> 00:04:46,900
+networks, very often using
+特别是大型神经网络的话
+
+139
+00:04:47,240 --> 00:04:48,900
+a large neural network, often it's actually the larger, the better,
+你就会发现越大型的网络性能越好
+
+140
+00:04:50,610 --> 00:04:51,700
+but if it's overfitting, you can
+但如果发生了过拟合
+
+141
+00:04:51,890 --> 00:04:53,800
+then use regularization to address
+你可以使用正则化的方法
+
+142
+00:04:54,230 --> 00:04:56,510
+overfitting, usually using
+来修正过拟合
+
+143
+00:04:56,910 --> 00:04:58,480
+a larger neural network by using
+一般来说 使用一个大型的神经网络
+
+144
+00:04:58,720 --> 00:04:59,980
+regularization to address is
+并使用正则化来修正过拟合问题
+
+145
+00:05:00,310 --> 00:05:01,910
+overfitting that's often more
+通常比使用一个小型的神经网络
+
+146
+00:05:02,130 --> 00:05:04,160
+effective than using a smaller neural network.
+效果更好
+
+147
+00:05:05,100 --> 00:05:06,940
+And the main possible disadvantage is
+但主要可能出现的一大问题
+
+148
+00:05:07,130 --> 00:05:09,420
+that it can be more computationally expensive.
+就是计算量相对较大
+
+149
+00:05:10,470 --> 00:05:11,940
+And finally, one of the other decisions is, say,
+最后 你还需要选择的
+
+150
+00:05:12,280 --> 00:05:14,340
+the number of hidden layers you want to have, right?
+是隐藏层的层数
+
+151
+00:05:14,480 --> 00:05:16,400
+So, do you want
+你是应该用一个
+
+152
+00:05:17,030 --> 00:05:18,130
+one hidden layer or do
+隐藏层呢
+
+153
+00:05:18,380 --> 00:05:19,700
+you want three hidden layers, as
+还是应该用三个呢 就像我们这里画的
+
+154
+00:05:20,040 --> 00:05:21,790
+we've shown here, or do you want two hidden layers?
+或者还是用两个隐藏层呢
+
+155
+00:05:23,250 --> 00:05:24,850
+And usually, as I
+通常来说
+
+156
+00:05:24,980 --> 00:05:25,720
+think I said in the previous
+正如我在前面的视频中讲过的
+
+157
+00:05:26,190 --> 00:05:27,420
+video, using a single
+默认的情况是
+
+158
+00:05:27,640 --> 00:05:29,570
+hidden layer is a reasonable default, but
+使用一个隐藏层是比较合理的选择
+
+159
+00:05:29,780 --> 00:05:30,800
+if you want to choose the
+但是如果你想要选择
+
+160
+00:05:30,890 --> 00:05:32,400
+number of hidden layers, one
+一个最合适的隐藏层层数
+
+161
+00:05:32,580 --> 00:05:33,610
+other thing you can try is
+你也可以试试
+
+162
+00:05:34,270 --> 00:05:35,800
+find yourself a training, cross-validation,
+把数据分割为训练集 验证集
+
+163
+00:05:36,660 --> 00:05:38,320
+and test set split and try
+和测试集 然后试试使用
+
+164
+00:05:38,730 --> 00:05:40,070
+training neural networks with one
+一个隐藏层的神经网络来训练模型
+
+165
+00:05:40,260 --> 00:05:41,210
+hidden layer or two hidden
+然后试试两个 三个隐藏层
+
+166
+00:05:41,490 --> 00:05:42,810
+layers or three hidden layers and
+以此类推
+
+167
+00:05:43,230 --> 00:05:44,300
+see which of those neural
+然后看看哪个神经网络
+
+168
+00:05:44,460 --> 00:05:47,460
+networks performs best on the cross-validation sets.
+在交叉验证集上表现得最理想
+
+169
+00:05:48,180 --> 00:05:49,190
+You take your three neural networks
+也就是说 你得到了三个神经网络模型
+
+170
+00:05:49,660 --> 00:05:50,510
+with one, two and three hidden
+分别有一个 两个 三个隐藏层
+
+171
+00:05:50,780 --> 00:05:52,410
+layers, and compute the
+然后你对每一个模型
+
+172
+00:05:52,570 --> 00:05:53,870
+cross-validation error Jcv on
+都用交叉验证集数据进行测试
+
+173
+00:05:54,140 --> 00:05:55,120
+all of
+算出三种情况下的
+
+174
+00:05:55,240 --> 00:05:56,630
+them and use that to
+交叉验证集误差Jcv
+
+175
+00:05:56,960 --> 00:05:58,350
+select which of these
+然后选出你认为最好的
+
+176
+00:05:58,690 --> 00:06:00,290
+you think is the best neural network.
+神经网络结构
+
+177
+00:06:02,580 --> 00:06:04,020
+So, that's it for
+好的 以上就是我们介绍的
+
+178
+00:06:04,230 --> 00:06:05,490
+bias and variance and ways
+偏差和方差问题
+
+179
+00:06:05,780 --> 00:06:08,170
+like learning curves, to try to diagnose these problems,
+以及如学习曲线这样的诊断法
+
+180
+00:06:08,560 --> 00:06:09,860
+as well as what
+在改进学习算法的表现时
+
+181
+00:06:09,930 --> 00:06:11,020
+this implies about what
+你可以充分运用
+
+182
+00:06:11,250 --> 00:06:12,480
+might be fruitful or not
+以上这些内容来判断
+
+183
+00:06:12,630 --> 00:06:13,500
+fruitful things to try
+哪些途径是有帮助的
+
+184
+00:06:13,910 --> 00:06:15,720
+to improve the performance of a learning algorithm.
+哪些方法是无意义的
+
+185
+00:06:16,960 --> 00:06:18,000
+If you understood the contents
+如果你理解了以上几节视频中
+
+186
+00:06:18,990 --> 00:06:20,700
+of the last few videos and if
+介绍的内容
+
+187
+00:06:20,790 --> 00:06:22,020
+you apply them, you'll actually
+并且懂得如何运用
+
+188
+00:06:22,630 --> 00:06:24,300
+be much more effective already at
+那么你已经很厉害了
+
+189
+00:06:24,430 --> 00:06:25,890
+getting learning algorithms to work on problems
+你也能像硅谷的
+
+190
+00:06:26,610 --> 00:06:27,970
+than even a large fraction,
+大部分机器学习专家一样
+
+191
+00:06:28,560 --> 00:06:29,810
+maybe the majority of practitioners
+他们每天的工作就是
+
+192
+00:06:30,540 --> 00:06:31,860
+of machine learning here in
+有效地使用这些学习算法
+
+193
+00:06:32,060 --> 00:06:34,760
+Silicon Valley today doing these things as their full-time jobs.
+来解决众多具体的问题
+
+194
+00:06:35,820 --> 00:06:37,560
+So I hope that these
+我希望这几节中
+
+195
+00:06:37,990 --> 00:06:39,110
+pieces of advice
+提到的一些技巧
+
+196
+00:06:39,560 --> 00:06:41,420
+on bias, variance, and learning curve diagnostics
+关于方差 偏差 以及学习曲线为代表的诊断法
+
+197
+00:06:42,730 --> 00:06:44,110
+will help you to much more effectively
+能够真正帮助你更有效率地
+
+198
+00:06:44,790 --> 00:06:47,270
+and powerfully apply learning algorithms and
+应用机器学习
+
+199
+00:06:48,000 --> 00:06:49,300
+get them to work very well.
+让它们高效地工作
+
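The architecture-selection idea at the end of this lecture — train networks with one, two, and three hidden layers and keep whichever gives the lowest cross-validation error — could be sketched as follows. scikit-learn is used here purely for illustration (it is not part of this course), and the dataset, hidden layer sizes, and iteration counts are arbitrary assumptions.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Illustrative data set and a train / cross-validation split.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "1 hidden layer":  (25,),
    "2 hidden layers": (25, 25),
    "3 hidden layers": (25, 25, 25),
}

for name, layers in candidates.items():
    net = MLPClassifier(hidden_layer_sizes=layers, max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    cv_error = 1.0 - net.score(X_cv, y_cv)   # pick the architecture with the lowest CV error
    print(f"{name}: cross-validation error = {cv_error:.3f}")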
diff --git a/srt/11 - 1 - Prioritizing What to Work On (10 min).srt b/srt/11 - 1 - Prioritizing What to Work On (10 min).srt
new file mode 100644
index 00000000..699dd80e
--- /dev/null
+++ b/srt/11 - 1 - Prioritizing What to Work On (10 min).srt
@@ -0,0 +1,1436 @@
+1
+00:00:01,180 --> 00:00:02,410
+In the next few videos I'd
+在接下来的视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,560 --> 00:00:04,780
+like to talk about machine learning system design.
+我将谈到机器学习系统的设计
+
+3
+00:00:05,780 --> 00:00:06,950
+These videos will touch on
+这些视频将谈及
+
+4
+00:00:07,190 --> 00:00:08,370
+the main issues that you may
+在设计复杂的机器学习系统时
+
+5
+00:00:08,540 --> 00:00:10,140
+face when designing a
+你将遇到的
+
+6
+00:00:10,220 --> 00:00:11,450
+complex machine learning system,
+主要问题
+
+7
+00:00:12,470 --> 00:00:13,310
+and will actually try to give
+同时我们会试着
+
+8
+00:00:13,490 --> 00:00:14,680
+advice on how to
+给出一些关于
+
+9
+00:00:14,780 --> 00:00:17,580
+strategize putting together a complex machine learning system.
+如何巧妙构建一个复杂的机器学习系统的建议
+
+10
+00:00:18,970 --> 00:00:20,190
+In case this next set
+接下来的视频
+
+11
+00:00:20,340 --> 00:00:21,390
+of videos seems a little
+可能看起来
+
+12
+00:00:21,530 --> 00:00:23,140
+disjointed that's because these
+有点不连贯
+
+13
+00:00:23,360 --> 00:00:24,340
+videos will touch on a
+因为这些视频会涉及
+
+14
+00:00:24,450 --> 00:00:25,800
+range of the different issues that
+一些你在设计机器学习系统时
+
+15
+00:00:26,150 --> 00:00:28,220
+you may come across when designing complex learning systems.
+可能会遇到的不同问题
+
+16
+00:00:29,600 --> 00:00:31,080
+And even though the
+虽然
+
+17
+00:00:31,160 --> 00:00:32,270
+next set of videos may seem
+下面的课程的
+
+18
+00:00:32,560 --> 00:00:34,740
+somewhat less mathematical, I think
+的数学性可能不是那么强
+
+19
+00:00:35,050 --> 00:00:36,180
+that this material may turn
+但是我认为我们将要讲到的这些东西
+
+20
+00:00:36,500 --> 00:00:38,280
+out to be very useful, and
+是非常有用的
+
+21
+00:00:38,400 --> 00:00:39,650
+potentially huge time savers
+可能在构建大型的机器学习系统时
+
+22
+00:00:40,120 --> 00:00:41,610
+when you're building big machine learning systems.
+节省大量的时间
+
+23
+00:00:42,890 --> 00:00:44,140
+Concretely, I'd like to
+具体的说
+
+24
+00:00:44,260 --> 00:00:45,710
+begin with the issue of
+我首先要讲的是
+
+25
+00:00:46,330 --> 00:00:47,500
+prioritizing how to spend
+当我们在进行机器学习时
+
+26
+00:00:47,790 --> 00:00:48,680
+your time on what to work
+着重要考虑什么问题
+
+27
+00:00:48,980 --> 00:00:50,330
+on, and I'll begin
+首先我要举一个
+
+28
+00:00:50,740 --> 00:00:52,220
+with an example on spam classification.
+垃圾邮件分类的例子
+
+29
+00:00:55,580 --> 00:00:57,280
+Let's say you want to build a spam classifier.
+假如你想建立一个垃圾邮件分类器
+
+30
+00:00:58,540 --> 00:00:59,740
+Here are a couple of examples
+看这些垃圾邮件与
+
+31
+00:01:00,180 --> 00:01:02,340
+of obvious spam and non-spam emails.
+非垃圾邮件的例子
+
+32
+00:01:03,400 --> 00:01:05,350
+The one on the left is trying to sell things.
+左边这封邮件想向你推销东西
+
+33
+00:01:06,270 --> 00:01:07,640
+And notice how spammers
+注意 这封垃圾邮件
+
+34
+00:01:08,470 --> 00:01:10,080
+will deliberately misspell words,
+有意的拼错一些单词 就像
+
+35
+00:01:10,540 --> 00:01:13,470
+like Vincent with a 1 there, and mortgages.
+"Med1cine" 中有一个1 "m0rtgage"里有个0
+
+36
+00:01:14,850 --> 00:01:16,350
+And on the right as maybe
+右边的邮件
+
+37
+00:01:16,560 --> 00:01:17,760
+an obvious example of non-spam
+显然不是一个垃圾邮件
+
+38
+00:01:18,480 --> 00:01:20,680
+email, actually email from my younger brother.
+实际上这是我弟弟写给我的
+
+39
+00:01:21,710 --> 00:01:22,940
+Let's say we have a labeled
+假设我们已经有一些
+
+40
+00:01:23,350 --> 00:01:24,560
+training set of some
+加过标签的训练集
+
+41
+00:01:24,860 --> 00:01:26,130
+number of spam emails and
+包括标注的垃圾邮件
+
+42
+00:01:26,240 --> 00:01:28,200
+some non-spam emails denoted
+表示为y=1
+
+43
+00:01:28,240 --> 00:01:30,780
+with labels y equals 1 or 0,
+和非垃圾邮件 表示为y=0
+
+44
+00:01:31,290 --> 00:01:32,600
+how do we build a
+我们如何
+
+45
+00:01:33,110 --> 00:01:34,900
+classifier using supervised learning
+以监督学习的方法来构造一个分类器
+
+46
+00:01:35,230 --> 00:01:37,130
+to distinguish between spam and non-spam?
+来区分垃圾邮件和非垃圾邮件呢?
+
+47
+00:01:38,130 --> 00:01:39,670
+In order to apply supervised learning,
+为了应用监督学习
+
+48
+00:01:40,340 --> 00:01:41,430
+the first decision we must
+我们首先
+
+49
+00:01:41,660 --> 00:01:43,190
+make is how do
+必须确定的是
+
+50
+00:01:43,270 --> 00:01:44,860
+we want to represent x, that
+如何用邮件的特征
+
+51
+00:01:45,260 --> 00:01:46,590
+is the features of the email.
+构造向量x
+
+52
+00:01:47,430 --> 00:01:48,900
+Given the features x and
+给出训练集中的
+
+53
+00:01:49,160 --> 00:01:50,290
+the labels y in our
+特征x和标签y
+
+54
+00:01:50,410 --> 00:01:51,510
+training set, we can then
+我们就能够训练出某种分类器
+
+55
+00:01:51,720 --> 00:01:54,660
+train a classifier, for example using logistic regression.
+比如用逻辑回归的方法
+
+56
+00:01:56,150 --> 00:01:57,120
+Here's one way to choose
+这里有一种选择
+
+57
+00:01:57,550 --> 00:01:59,630
+a set of features for our emails.
+邮件的一些特征变量的方法
+
+58
+00:02:00,850 --> 00:02:01,930
+We could come up with,
+比如说我们可能会想出
+
+59
+00:02:02,280 --> 00:02:03,630
+say, a list of maybe
+一系列单词
+
+60
+00:02:03,870 --> 00:02:05,170
+a hundred words that we think
+或者成百上千的单词
+
+61
+00:02:05,440 --> 00:02:06,850
+are indicative of whether e-mail
+我们可以认为这些单词
+
+62
+00:02:07,190 --> 00:02:09,230
+is spam or non-spam, for
+能够用来区分垃圾邮件或非垃圾邮件
+
+63
+00:02:09,370 --> 00:02:10,410
+example, if a piece of
+比如说 如果有封邮件
+
+64
+00:02:10,580 --> 00:02:11,640
+e-mail contains the word 'deal'
+包含单词"deal(交易)"
+
+65
+00:02:12,340 --> 00:02:13,350
+maybe it's more likely to be
+那么它就很有可能是一封垃圾邮件
+
+66
+00:02:13,460 --> 00:02:14,410
+spam if it contains
+同时
+
+67
+00:02:14,850 --> 00:02:16,280
+the word 'buy' maybe more
+包含单词"buy(买)"的邮件
+
+68
+00:02:16,450 --> 00:02:17,670
+likely to be spam, a
+也很有可能是一封垃圾邮件
+
+69
+00:02:17,990 --> 00:02:19,340
+word like 'discount' is more
+包含"discount(折扣)"的邮件
+
+70
+00:02:19,580 --> 00:02:20,900
+likely to be spam, whereas if
+也很有可能是垃圾邮件
+
+71
+00:02:21,080 --> 00:02:22,340
+a piece of email contains my name,
+如果一封邮件中
+
+72
+00:02:23,920 --> 00:02:25,350
+Andrew, maybe that means
+包含了我的名字"Andrew"
+
+73
+00:02:25,630 --> 00:02:26,870
+the person actually knows who
+这有可能是一个知道我的人写的
+
+74
+00:02:26,910 --> 00:02:27,740
+I am and that might mean it's
+这说明这封邮件
+
+75
+00:02:27,900 --> 00:02:30,090
+less likely to be spam.
+不太可能是垃圾邮件
+
+76
+00:02:31,470 --> 00:02:32,580
+And maybe for some reason I think
+因为某些原因 我认为
+
+77
+00:02:32,840 --> 00:02:33,990
+the word "now" may be
+"now(现在)"这个单词表明了
+
+78
+00:02:34,260 --> 00:02:35,680
+indicative of non-spam because
+这封邮件可能并不是垃圾邮件
+
+79
+00:02:35,980 --> 00:02:37,150
+I get a lot of urgent
+因为我经常收到一些很紧急的邮件
+
+80
+00:02:37,540 --> 00:02:39,370
+emails, and so on,
+当然还有别的单词
+
+81
+00:02:39,520 --> 00:02:41,220
+and maybe we choose a hundred words or so.
+我们可以选出这样成百上千的单词
+
+82
+00:02:42,380 --> 00:02:43,510
+Given a piece of email,
+给出一封这样的邮件
+
+83
+00:02:43,970 --> 00:02:44,930
+we can then take this piece
+我们可以将这封邮件
+
+84
+00:02:45,180 --> 00:02:46,220
+of email and encode
+用一个特征向量
+
+85
+00:02:46,640 --> 00:02:47,970
+it into a feature
+来表示
+
+86
+00:02:48,290 --> 00:02:49,930
+vector as follows.
+方法如下
+
+87
+00:02:50,810 --> 00:02:51,450
+I'm going to take my list of a
+现在我列出一些
+
+88
+00:02:51,720 --> 00:02:54,560
+hundred words and sort
+之前选好的单词
+
+89
+00:02:54,960 --> 00:02:56,620
+them in alphabetical order say.
+然后按字典序排序
+
+90
+00:02:57,210 --> 00:02:57,980
+It doesn't have to be sorted.
+其实并不是一定要排序的啦
+
+91
+00:02:58,450 --> 00:02:59,910
+But, you know, here's a, here's
+你看
+
+92
+00:03:00,110 --> 00:03:02,080
+my list of words, discount
+这些是之前的单词 像“discount”
+
+93
+00:03:02,710 --> 00:03:03,950
+and so on, until eventually
+等等
+
+94
+00:03:04,160 --> 00:03:05,430
+I'll get down to now, and so
+还有单词"now" 等等
+
+95
+00:03:06,080 --> 00:03:07,230
+on and given a piece
+看看这个例子
+
+96
+00:03:07,350 --> 00:03:08,540
+of e-mail like that shown on the
+右边的这封邮件
+
+97
+00:03:08,610 --> 00:03:09,640
+right, I'm going to
+我准备
+
+98
+00:03:09,770 --> 00:03:11,400
+check and see whether or
+检查一下这些词汇
+
+99
+00:03:11,450 --> 00:03:12,560
+not each of these words
+看它们是否
+
+100
+00:03:13,030 --> 00:03:14,560
+appears in the e-mail and then
+出现在这封邮件中
+
+101
+00:03:14,810 --> 00:03:16,400
+I'm going to define a feature
+我用一个
+
+102
+00:03:16,580 --> 00:03:19,130
+vector x where in
+特征向量x
+
+103
+00:03:19,260 --> 00:03:20,260
+this piece of an email on
+表示右边的这封邮件
+
+104
+00:03:20,340 --> 00:03:21,520
+the right, my name doesn't
+我的名字没有出现
+
+105
+00:03:21,930 --> 00:03:23,210
+appear so I'm gonna put a zero there.
+因此这里是0
+
+106
+00:03:24,070 --> 00:03:25,410
+The word "by" does appear,
+单词"buy(购买)"出现了
+
+107
+00:03:26,790 --> 00:03:27,690
+so I'm gonna put a one there
+所以这里是1
+
+108
+00:03:28,090 --> 00:03:29,450
+and I'm just gonna put one's or zeroes.
+注意在向量里面只有1或0 表示有没有出现
+
+109
+00:03:30,170 --> 00:03:31,550
+I'm gonna put a
+所以尽管"buy"出现了两次
+
+110
+00:03:31,730 --> 00:03:33,950
+one even though the word "buy" occurs twice.
+这里仍然只是1
+
+111
+00:03:34,600 --> 00:03:36,490
+I'm not gonna recount how many times the word occurs.
+注意我不会去统计每个词出现的次数
+
+112
+00:03:37,590 --> 00:03:40,280
+The word "due" appears, I put a one there.
+单词"deal"也出现了 所以这里也是1
+
+113
+00:03:40,900 --> 00:03:42,450
+The word "discount" doesn't appear, at
+单词"discount"并没有出现
+
+114
+00:03:42,620 --> 00:03:43,680
+least not in this this little
+至少在这封邮件里是这样
+
+115
+00:03:44,520 --> 00:03:46,140
+short email, and so on.
+以此类推
+
+116
+00:03:46,810 --> 00:03:48,740
+The word "now" does appear and so on.
+单词"now"出现了
+
+117
+00:03:48,870 --> 00:03:50,250
+So I put ones and zeroes
+所以我在特征向量中
+
+118
+00:03:50,560 --> 00:03:52,560
+in this feature vector depending on
+依据对应的单词是否出现
+
+119
+00:03:52,720 --> 00:03:54,230
+whether or not a particular word appears.
+填上0和1
+
+120
+00:03:55,060 --> 00:03:56,740
+And in this example my
+在这个例子中
+
+121
+00:03:56,870 --> 00:03:58,850
+feature vector would have
+因为我选择了100个单词
+
+122
+00:03:59,110 --> 00:04:00,920
+dimension one hundred,
+用于表示是否可能为垃圾邮件
+
+123
+00:04:02,310 --> 00:04:03,960
+if I have a hundred,
+所以
+
+124
+00:04:04,310 --> 00:04:05,460
+if I chose a hundred
+这个特征向量x
+
+125
+00:04:05,650 --> 00:04:06,850
+words to use for
+的维度是100
+
+126
+00:04:07,010 --> 00:04:08,980
+this representation and each
+并且
+
+127
+00:04:09,240 --> 00:04:13,060
+of my features Xj will
+如果这个特定的单词
+
+128
+00:04:13,300 --> 00:04:15,150
+basically be 1 if
+即单词 j 出现在
+
+129
+00:04:16,360 --> 00:04:17,410
+a particular word, which
+这封邮件中
+
+130
+00:04:17,490 --> 00:04:18,930
+we'll call word j, appears
+那么每一个特征变量
+
+131
+00:04:19,420 --> 00:04:20,940
+in the email and Xj
+Xj 的值为1
+
+132
+00:04:22,400 --> 00:04:23,910
+would be zero otherwise.
+反之 Xj为0
+
+133
+00:04:25,700 --> 00:04:25,700
+Okay.
+好的
+
+134
+00:04:25,900 --> 00:04:27,440
+So that gives me
+这样我们就可以使用特征向量
+
+135
+00:04:27,600 --> 00:04:30,220
+a feature representation of a piece of email.
+来表示这封邮件
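
To make the encoding described above concrete, here is a minimal sketch (not from the course materials; the tiny word list and email text are made-up placeholders). Each feature is 1 if the corresponding word appears at least once in the email and 0 otherwise:

```python
# Minimal sketch of the 0/1 feature encoding described above; the vocabulary
# and email below are illustrative placeholders, not the lecture's data.
import re

vocab = sorted(["andrew", "buy", "deal", "discount", "now"])  # toy word list

def email_to_features(email_text, vocab):
    words = set(re.findall(r"[a-z0-9]+", email_text.lower()))
    # Presence only: a word that appears twice still contributes a single 1.
    return [1 if w in words else 0 for w in vocab]

email = "Deal of the week! Buy now b4 it is too late."
print(email_to_features(email, vocab))  # -> [0, 1, 1, 0, 1]
```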
+
+136
+00:04:30,920 --> 00:04:32,060
+By the way, even though I've
+顺便说一句
+
+137
+00:04:32,200 --> 00:04:34,260
+described this process as manually
+虽然我所描述的这个过程是我自己
+
+138
+00:04:34,920 --> 00:04:36,790
+picking a hundred words, in
+选取的100个单词
+
+139
+00:04:36,950 --> 00:04:38,510
+practice what's most commonly
+但是在实际工作中 最普遍的做法是
+
+140
+00:04:38,940 --> 00:04:40,140
+done is to look through
+遍历整个训练集
+
+141
+00:04:40,400 --> 00:04:42,710
+a training set, and in
+然后
+
+142
+00:04:42,800 --> 00:04:43,980
+the training set pick the
+在训练集中
+
+143
+00:04:44,050 --> 00:04:45,690
+most frequently occurring n words
+选出出现次数最多的n个单词
+
+144
+00:04:46,080 --> 00:04:47,290
+where n is usually between ten
+n一般介于10,000和50,000之间
+
+145
+00:04:47,450 --> 00:04:49,310
+thousand and fifty thousand, and use
+然后把这些单词
+
+146
+00:04:49,550 --> 00:04:50,810
+those as your features.
+作为你要用的特征
+
+147
+00:04:51,630 --> 00:04:52,910
+So rather than manually picking a
+因此不同于手动选取
+
+148
+00:04:53,090 --> 00:04:54,220
+hundred words, here you look
+我们只用遍历
+
+149
+00:04:54,390 --> 00:04:56,030
+through the training examples and
+训练样本
+
+150
+00:04:56,130 --> 00:04:57,570
+pick the most frequently occurring words
+然后选出出现频率最高的词语
+
+151
+00:04:57,930 --> 00:04:58,860
+like ten thousand to fifty thousand
+差不多是10,000到50,000个单词
+
+152
+00:04:59,260 --> 00:05:00,680
+words, and those form the
+这些单词会构成特征
+
+153
+00:05:00,820 --> 00:05:01,550
+features that you are going
+这样你就可以用它们
+
+154
+00:05:01,640 --> 00:05:04,320
+to use to represent your email for spam classification.
+来做垃圾邮件分类
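
A sketch of the frequency-based alternative just described, again illustrative only (the tiny training list stands in for a real labeled corpus, and n would normally be on the order of 10,000 to 50,000):

```python
# Build the feature word list from the n most frequent words in the
# training emails, instead of hand-picking them.
import re
from collections import Counter

def build_vocab(training_emails, n=10000):
    counts = Counter()
    for text in training_emails:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    # Keep the n most frequent words as features.
    return [word for word, _ in counts.most_common(n)]

train = ["Huge discount, buy now!", "Meeting moved to 3pm", "Buy one, get one free"]
print(build_vocab(train, n=5))
```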
+
+155
+00:05:05,450 --> 00:05:06,850
+Now, if you're building a
+如果你正在构造一个垃圾邮件分类器
+
+156
+00:05:06,910 --> 00:05:09,020
+spam classifier one question
+你应该会面对这样一个问题
+
+157
+00:05:09,570 --> 00:05:11,260
+that you may face is, what's
+那就是
+
+158
+00:05:11,500 --> 00:05:12,580
+the best use of your time
+如何在有限的时间和精力下
+
+159
+00:05:13,230 --> 00:05:14,820
+in order to make your
+改进你的方法
+
+160
+00:05:14,980 --> 00:05:17,510
+spam classifier have higher accuracy and lower error.
+从而使得你的垃圾邮件分类器具有较高的准确度
+
+161
+00:05:18,910 --> 00:05:21,350
+One natural inclination is to go and collect lots of data.
+从直觉上讲 是要收集大量的数据
+
+162
+00:05:21,780 --> 00:05:21,780
+Right?
+对吧?
+
+163
+00:05:22,030 --> 00:05:23,120
+And in fact there's this tendency
+事实上确实好多人这么做
+
+164
+00:05:23,700 --> 00:05:24,670
+to think that, well the
+很多人认为
+
+165
+00:05:24,740 --> 00:05:26,590
+more data we have the better the algorithm will do.
+收集越多的数据 算法就会表现的越好
+
+166
+00:05:27,560 --> 00:05:28,850
+And in fact, in the email
+事实上
+
+167
+00:05:29,100 --> 00:05:30,500
+spam domain, there are actually
+就垃圾邮件分类而言
+
+168
+00:05:31,310 --> 00:05:32,840
+pretty serious projects called Honey
+有一个叫做"Honey Pot"的项目
+
+169
+00:05:33,180 --> 00:05:35,310
+Pot Projects, which create fake
+它可以建立一个
+
+170
+00:05:35,710 --> 00:05:37,090
+email addresses and try to
+假的邮箱地址
+
+171
+00:05:37,200 --> 00:05:38,910
+get these fake email addresses into
+故意将这些地址
+
+172
+00:05:39,140 --> 00:05:40,710
+the hands of spammers and use
+泄露给发垃圾邮件的人
+
+173
+00:05:40,910 --> 00:05:41,870
+that to try to collect tons
+这样就能收到大量的垃圾邮件
+
+174
+00:05:42,140 --> 00:05:43,440
+of spam email, and therefore
+你看 这样的话
+
+175
+00:05:44,120 --> 00:05:46,120
+you know, get a lot of spam data to train learning algorithms.
+我们就能得到非常多的垃圾邮件来训练学习算法
+
+176
+00:05:47,060 --> 00:05:48,760
+But we've already seen in the
+但是
+
+177
+00:05:49,150 --> 00:05:50,630
+previous sets of videos
+在前面的课程中我们知道
+
+178
+00:05:50,650 --> 00:05:53,340
+that getting lots of data will often help, but not all the time.
+大量的数据可能会有帮助 也可能没有
+
+179
+00:05:54,600 --> 00:05:55,810
+But for most machine learning problems,
+对于大部分的机器学习问题
+
+180
+00:05:56,430 --> 00:05:57,280
+there are a lot of other things
+还有很多办法
+
+181
+00:05:57,620 --> 00:05:59,780
+you could usually imagine doing to improve performance.
+用来提升机器学习的效果
+
+182
+00:06:00,970 --> 00:06:02,130
+For spam, one thing you
+比如对于垃圾邮件而言
+
+183
+00:06:02,230 --> 00:06:03,420
+might think of is to develop
+也许你会想到
+
+184
+00:06:03,960 --> 00:06:05,620
+more sophisticated features on the
+用更复杂的特征变量
+
+185
+00:06:05,820 --> 00:06:08,070
+email, maybe based on the email routing information.
+像是邮件的路径信息
+
+186
+00:06:09,850 --> 00:06:11,920
+And this would be information contained in the email header.
+这种信息通常会出现在邮件的标题中
+
+187
+00:06:13,130 --> 00:06:14,820
+So, when spammers send email,
+因此 垃圾邮件发送方在发送垃圾邮件时
+
+188
+00:06:15,250 --> 00:06:16,420
+very often they will try
+他们总会试图
+
+189
+00:06:16,690 --> 00:06:18,110
+to obscure the origins of
+让这个邮件的来源变得模糊一些
+
+190
+00:06:18,340 --> 00:06:20,260
+the email, and maybe
+或者是
+
+191
+00:06:20,470 --> 00:06:21,880
+use fake email headers.
+用假的邮件标题
+
+192
+00:06:22,900 --> 00:06:24,060
+Or send email through very
+或者通过不常见的服务器
+
+193
+00:06:24,410 --> 00:06:26,360
+unusual sets of computer servers.
+来发送邮件
+
+194
+00:06:27,060 --> 00:06:29,880
+Through very unusual routes, in order to get the spam to you.
+用不常见的路由 他们就能给你发送垃圾邮件
+
+195
+00:06:30,490 --> 00:06:33,690
+And some of this information will be reflected in the email header.
+而且这些信息也有可能包含在邮件标题部分
+
+196
+00:06:35,000 --> 00:06:36,600
+And so one can imagine,
+因此可以想到
+
+197
+00:06:38,070 --> 00:06:39,300
+looking at the email headers and
+我们可以通过邮件的标题部分
+
+198
+00:06:39,410 --> 00:06:41,060
+trying to develop more sophisticated features
+来构造更加复杂的特征
+
+199
+00:06:41,510 --> 00:06:42,760
+to capture this sort of
+来获得一系列的邮件路由信息
+
+200
+00:06:43,010 --> 00:06:45,770
+email routing information to identify if something is spam.
+进而判定这是否是一封垃圾邮件
+
+201
+00:06:46,650 --> 00:06:47,890
+Something else you might consider doing
+你还可能会想到别的方法
+
+202
+00:06:48,380 --> 00:06:49,300
+is to look at the
+比如
+
+203
+00:06:49,640 --> 00:06:50,860
+email message body, that is
+从邮件的正文出发
+
+204
+00:06:51,100 --> 00:06:54,350
+the email text, and try to develop more sophisticated features.
+寻找一些复杂点的特征
+
+205
+00:06:55,320 --> 00:06:56,310
+For example, should the word
+例如
+
+206
+00:06:56,500 --> 00:06:57,560
+'discount' and the word
+单词"discount"
+
+207
+00:06:57,690 --> 00:06:59,340
+'discounts' be treated as
+是否和单词"discounts"是一样的
+
+208
+00:06:59,550 --> 00:07:01,810
+the same words or should
+又比如
+
+209
+00:07:02,240 --> 00:07:04,120
+we treat the words 'deal' and 'dealer' as the same word?
+单词"deal(交易)"和"dealer(交易商)"是否也应视为等同
+
+210
+00:07:04,380 --> 00:07:05,610
+Maybe even though one is
+甚至 像这个例子中
+
+211
+00:07:06,130 --> 00:07:08,020
+lower case and one is capitalized in this example.
+有的单词小写有的大写
+
+212
+00:07:08,740 --> 00:07:10,530
+Or do we want more complex features about punctuation because maybe spam
+或者我们是否应该用标点符号来构造复杂的特征变量
+
+213
+00:07:12,740 --> 00:07:14,110
+is using exclamation marks a lot more.
+因为垃圾邮件可能会更多的使用感叹号
+
+214
+00:07:14,450 --> 00:07:14,730
+I don't know.
+这些都不一定
+
+215
+00:07:15,580 --> 00:07:16,850
+And along the same lines, maybe
+同样的
+
+216
+00:07:17,170 --> 00:07:18,560
+we also want to develop more
+我们也可能构造
+
+217
+00:07:18,750 --> 00:07:20,380
+sophisticated algorithms to detect
+更加复杂的算法来检测
+
+218
+00:07:21,120 --> 00:07:22,700
+and maybe to correct to deliberate misspellings,
+或者纠正那些故意的拼写错误
+
+219
+00:07:23,360 --> 00:07:24,700
+like mortgage, medicine, watches.
+例如 "m0rtgage" "med1cine" "w4tches"
+
+220
+00:07:25,700 --> 00:07:26,890
+Because spammers actually do this,
+因为垃圾邮件发送方确实这么做了
+
+221
+00:07:27,150 --> 00:07:28,400
+because if you have watches
+因为如果你将4放到"w4tches"中
+
+222
+00:07:29,420 --> 00:07:31,060
+with a 4 in there then well,
+那么
+
+223
+00:07:31,450 --> 00:07:32,720
+with the simple technique that we
+用我们之前提到的
+
+224
+00:07:32,840 --> 00:07:34,760
+talked about just now, the spam
+简单的方法
+
+225
+00:07:35,090 --> 00:07:36,280
+classifier might not equate
+垃圾邮件分类器不会把"w4tches"
+
+226
+00:07:36,800 --> 00:07:38,170
+this as the same thing as the
+和"watches"
+
+227
+00:07:38,230 --> 00:07:40,260
+word "watches," and so it
+看成一样的
+
+228
+00:07:40,390 --> 00:07:41,430
+may have a harder time realizing
+这样我们就很难区分这些
+
+229
+00:07:42,000 --> 00:07:43,930
+that something is spam with these deliberate misspellings.
+故意拼错的垃圾邮件
+
+230
+00:07:44,830 --> 00:07:45,940
+And this is why spammers do it.
+发垃圾邮件的也很机智 他们这么做就逃避了一些过滤
+
+231
+00:07:48,230 --> 00:07:49,280
+While working on a machine learning
+当我们使用机器学习时
+
+232
+00:07:49,630 --> 00:07:51,370
+problem, very often you
+总是可以“头脑风暴”一下
+
+233
+00:07:51,480 --> 00:07:54,690
+can brainstorm lists of different things to try, like these.
+想出一堆方法来试试 就像这样
+
+234
+00:07:55,170 --> 00:07:56,560
+By the way, I've actually
+顺带一提
+
+235
+00:07:56,790 --> 00:07:58,480
+worked on the spam
+我有一段时间
+
+236
+00:07:58,900 --> 00:08:00,000
+problem myself for a while.
+研究过垃圾邮件分类的问题
+
+237
+00:08:00,650 --> 00:08:01,610
+And I actually spent quite some time on it.
+实际上我花了很多时间来研究这个
+
+238
+00:08:01,770 --> 00:08:03,040
+And even though I kind
+尽管我能够理解
+
+239
+00:08:03,360 --> 00:08:04,350
+of understand the spam problem,
+垃圾邮件分类的问题
+
+240
+00:08:04,820 --> 00:08:05,820
+I actually know a bit about it,
+我确实懂一些这方面的东西
+
+241
+00:08:06,470 --> 00:08:07,380
+I would actually have a very
+但是
+
+242
+00:08:07,600 --> 00:08:09,160
+hard time telling you of
+我还是很难告诉你
+
+243
+00:08:09,290 --> 00:08:10,790
+these four options which is
+这四种方法中
+
+244
+00:08:10,980 --> 00:08:12,190
+the best use of your time
+你最该去使用哪一种
+
+245
+00:08:12,670 --> 00:08:14,180
+so what happens, frankly what
+事实上 坦白地说
+
+246
+00:08:14,320 --> 00:08:15,790
+happens far too often is
+最常见的情况是
+
+247
+00:08:16,040 --> 00:08:17,240
+that a research group or
+一个研究小组
+
+248
+00:08:17,350 --> 00:08:20,330
+product group will randomly fixate on one of these options.
+可能会随机确定其中的一个方法
+
+249
+00:08:21,290 --> 00:08:22,870
+And sometimes that turns
+但是有时候
+
+250
+00:08:23,250 --> 00:08:24,350
+out not to be the most
+这种方法
+
+251
+00:08:24,580 --> 00:08:25,610
+fruitful way to spend your
+并不是最有成效的
+
+252
+00:08:25,740 --> 00:08:27,700
+time depending, you know, on which
+你知道
+
+253
+00:08:27,900 --> 00:08:30,400
+of these options someone ends up randomly fixating on.
+你只是随机选择了其中的一种方法
+
+254
+00:08:31,350 --> 00:08:32,670
+By the way, in fact, if
+实际上
+
+255
+00:08:32,800 --> 00:08:33,780
+you even get to the stage
+当你需要通过
+
+256
+00:08:34,150 --> 00:08:35,710
+where you brainstorm a list
+头脑风暴来想出
+
+257
+00:08:35,900 --> 00:08:37,100
+of different options to try, you're
+不同方法来尝试去提高精度的时候
+
+258
+00:08:37,250 --> 00:08:38,740
+probably already ahead of the curve.
+你可能已经超越了很多人了
+
+259
+00:08:39,390 --> 00:08:41,190
+Sadly, what most people do is
+令人难过的是 大部分人
+
+260
+00:08:41,420 --> 00:08:42,160
+instead of trying to list
+他们并不尝试着
+
+261
+00:08:42,230 --> 00:08:43,010
+out the options of things
+列出可能的方法
+
+262
+00:08:43,240 --> 00:08:44,510
+you might try, what far too
+他们做的
+
+263
+00:08:44,810 --> 00:08:46,100
+many people do is wake up
+只是
+
+264
+00:08:46,210 --> 00:08:47,380
+one morning and, for some
+某天早上醒来
+
+265
+00:08:47,580 --> 00:08:48,850
+reason, just, you know, have a weird
+因为某些原因
+
+266
+00:08:49,110 --> 00:08:50,440
+gut feeling that, "Oh let's
+有了一个突发奇想
+
+267
+00:08:51,290 --> 00:08:52,670
+have a huge honeypot project
+"让我们来试试 用Honey Pot项目
+
+268
+00:08:53,190 --> 00:08:54,570
+to go and collect tons more data"
+收集大量的数据吧"
+
+269
+00:08:55,320 --> 00:08:56,860
+and for whatever strange reason just
+不管出于什么奇怪的原因
+
+270
+00:08:57,570 --> 00:08:58,540
+sort of wake up one morning and randomly
+早上的灵机一动
+
+271
+00:08:59,050 --> 00:09:00,330
+fixate on one thing and just
+还是随机选一个
+
+272
+00:09:00,540 --> 00:09:02,340
+work on that for six months.
+然后干上大半年
+
+273
+00:09:03,520 --> 00:09:04,170
+But I think we can do better.
+但是我觉得我们有更好的方法
+
+274
+00:09:04,760 --> 00:09:06,130
+And in particular what I'd
+是的
+
+275
+00:09:06,270 --> 00:09:07,130
+like to do in the next
+我们将在随后的课程中
+
+276
+00:09:07,310 --> 00:09:08,410
+video is tell you about
+讲到这个
+
+277
+00:09:08,680 --> 00:09:09,890
+the concept of error analysis
+那就是误差分析
+
+278
+00:09:11,160 --> 00:09:12,530
+and talk about the way
+我会告诉你
+
+279
+00:09:13,270 --> 00:09:15,150
+where you can try
+怎样用一个
+
+280
+00:09:15,360 --> 00:09:16,830
+to have a more systematic way
+更加系统性的方法
+
+281
+00:09:17,360 --> 00:09:18,640
+to choose amongst the options
+从一堆不同的方法中
+
+282
+00:09:18,960 --> 00:09:19,950
+of the many different things you
+选取合适的那一个
+
+283
+00:09:20,010 --> 00:09:21,730
+might work, and therefore be
+因此
+
+284
+00:09:21,860 --> 00:09:23,430
+more likely to select what
+你更有可能选择
+
+285
+00:09:23,640 --> 00:09:24,820
+is actually a good way
+一个真正的好方法
+
+286
+00:09:25,070 --> 00:09:26,070
+to spend your time, you know
+能让你花上几天几周 甚至是几个月
+
+287
+00:09:26,200 --> 00:09:28,920
+for the next few weeks, or next few days or the next few months.
+去进行深入的研究
+
diff --git a/srt/11 - 2 - Error Analysis (13 min).srt b/srt/11 - 2 - Error Analysis (13 min).srt
new file mode 100644
index 00000000..f0d2241c
--- /dev/null
+++ b/srt/11 - 2 - Error Analysis (13 min).srt
@@ -0,0 +1,1970 @@
+1
+00:00:00,210 --> 00:00:01,300
+In the last video, I talked
+在上一节课中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,600 --> 00:00:03,390
+about how when faced with
+我讲到了应当怎样面对
+
+3
+00:00:03,520 --> 00:00:04,780
+a machine learning problem, there are
+机器学习问题
+
+4
+00:00:04,980 --> 00:00:07,260
+often lots of different ideas on how to improve the algorithm.
+有很多提高算法表现的方法
+
+5
+00:00:08,460 --> 00:00:09,510
+In this video let's
+在本次课程中
+
+6
+00:00:09,650 --> 00:00:11,060
+talk about the concepts of error
+我们将会讲到
+
+7
+00:00:11,330 --> 00:00:12,980
+analysis which will help
+误差分析(error analysis)的概念
+
+8
+00:00:13,070 --> 00:00:13,980
+me give you a way to more
+这会帮助你
+
+9
+00:00:14,300 --> 00:00:15,830
+systematically make some of these decisions.
+更系统地做出决定
+
+10
+00:00:18,070 --> 00:00:19,420
+If you're starting work on a
+如果你准备
+
+11
+00:00:19,540 --> 00:00:21,210
+machine learning product or building
+研究机器学习的东西
+
+12
+00:00:21,400 --> 00:00:23,340
+a machine learning application, it is
+或者构造机器学习应用程序
+
+13
+00:00:23,480 --> 00:00:24,880
+often considered very good practice
+最好的实践方法
+
+14
+00:00:25,840 --> 00:00:27,000
+to start, not by building
+不是建立一个
+
+15
+00:00:27,520 --> 00:00:29,070
+a very complicated system with
+非常复杂的系统
+
+16
+00:00:29,220 --> 00:00:30,490
+lots of complex features and so
+拥有多么复杂的变量
+
+17
+00:00:30,930 --> 00:00:32,450
+on, but to instead start
+而是
+
+18
+00:00:33,060 --> 00:00:34,120
+by building a very simple
+构建一个简单的算法
+
+19
+00:00:34,510 --> 00:00:35,760
+algorithm that you can implement quickly.
+这样你可以很快地实现它
+
+20
+00:00:37,480 --> 00:00:38,610
+And when I start on
+每当我研究
+
+21
+00:00:38,740 --> 00:00:39,770
+a learning problem, what I usually
+机器学习的问题时
+
+22
+00:00:40,150 --> 00:00:41,350
+do is spend at most one
+我最多只会花一天的时间
+
+23
+00:00:41,570 --> 00:00:43,160
+day, literally at most 24
+就是字面意义上的24小时
+
+24
+00:00:43,460 --> 00:00:46,030
+hours to try to get something really quick and dirty.
+来试图很快的把结果搞出来 即便效果不好
+
+25
+00:00:47,040 --> 00:00:48,550
+Frankly, not at all a sophisticated system.
+坦白的说 就是根本没有用复杂的系统
+
+26
+00:00:49,370 --> 00:00:50,310
+But get something really quick and
+但是只是很快的得到的结果
+
+27
+00:00:50,400 --> 00:00:52,080
+dirty running and implement
+即便运行得不完美
+
+28
+00:00:52,590 --> 00:00:53,710
+it and then test it on
+但是也把它运行一遍
+
+29
+00:00:53,880 --> 00:00:55,870
+my cross validation data. Once
+最后通过交叉验证来检验数据
+
+30
+00:00:56,050 --> 00:00:57,140
+you've done that, you can
+一旦做完
+
+31
+00:00:57,480 --> 00:00:58,690
+then plot learning curves.
+你可以画出学习曲线
+
+32
+00:00:59,960 --> 00:01:02,670
+This is what we talked about in the previous set of videos.
+这个我们在前面的课程中已经讲过了
+
+33
+00:01:03,230 --> 00:01:05,160
+But plot learning curves of the
+通过画出学习曲线
+
+34
+00:01:05,370 --> 00:01:07,120
+training and test errors to
+以及检验误差
+
+35
+00:01:07,310 --> 00:01:08,280
+try to figure out if your
+来找出
+
+36
+00:01:08,400 --> 00:01:09,630
+learning algorithm may be suffering
+你的算法是否有
+
+37
+00:01:10,120 --> 00:01:11,240
+from high bias or high
+高偏差和高方差的问题
+
+38
+00:01:11,440 --> 00:01:13,180
+variance or something else and
+或者别的问题
+
+39
+00:01:13,440 --> 00:01:14,380
+use that to try to
+在这样分析之后
+
+40
+00:01:14,490 --> 00:01:15,610
+decide if having more data
+再来决定用更多的数据训练
+
+41
+00:01:16,080 --> 00:01:17,990
+and more features and so on are likely to help.
+或者加入更多的特征变量是否有用
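
One possible way to carry out this "quick implementation, then learning curves" step, sketched with scikit-learn on synthetic data (the course itself works in Octave, so the library, dataset, and parameters here are assumptions for illustration only):

```python
# Quick-and-dirty model plus learning curves to eyeball bias vs. variance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
sizes, train_scores, cv_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

train_err = 1 - train_scores.mean(axis=1)   # error = 1 - accuracy
cv_err = 1 - cv_scores.mean(axis=1)
for m, tr, cv in zip(sizes, train_err, cv_err):
    # Both errors high and close together suggests high bias; a large,
    # slowly closing gap suggests high variance (more data may help).
    print(f"m={m:5d}  train error={tr:.3f}  cv error={cv:.3f}")
```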
+
+42
+00:01:18,670 --> 00:01:19,830
+And the reason that this
+这么做的原因是
+
+43
+00:01:20,000 --> 00:01:20,980
+is a good approach is often
+这在你刚接触机器学习问题时
+
+44
+00:01:21,940 --> 00:01:22,980
+when you're just starting out on
+是一个很好的方法
+
+45
+00:01:23,100 --> 00:01:24,460
+a learning problem, there's really no
+你并不能
+
+46
+00:01:24,680 --> 00:01:25,820
+way to tell in advance
+提前知道
+
+47
+00:01:26,480 --> 00:01:27,360
+whether you need more complex
+你是否需要复杂的特征变量
+
+48
+00:01:27,790 --> 00:01:29,200
+features or whether you need
+或者你是否需要
+
+49
+00:01:29,250 --> 00:01:30,950
+more data or something else.
+更多的数据 还是别的什么
+
+50
+00:01:31,280 --> 00:01:32,270
+And it's just very hard to tell
+提前知道你应该做什么
+
+51
+00:01:32,510 --> 00:01:33,840
+in advance, that is in
+是非常难的
+
+52
+00:01:33,970 --> 00:01:36,040
+the absence of evidence, in
+因为你缺少证据
+
+53
+00:01:36,160 --> 00:01:37,840
+the absence of seeing a
+缺少学习曲线
+
+54
+00:01:37,970 --> 00:01:39,130
+learning curve, it's just incredibly
+因此 你很难知道
+
+55
+00:01:39,750 --> 00:01:42,860
+difficult to figure out where you should be spending your time.
+你应该把时间花在什么地方来提高算法的表现
+
+56
+00:01:43,760 --> 00:01:45,360
+And it's often by implementing even
+但是当你实践一个
+
+57
+00:01:45,730 --> 00:01:46,670
+a very, very quick and dirty
+非常简单即便不完美的
+
+58
+00:01:46,980 --> 00:01:48,100
+implementation and by plotting
+方法时
+
+59
+00:01:48,540 --> 00:01:51,070
+learning curves that that helps you make these decisions.
+你可以通过画出学习曲线来做出进一步的选择
+
+60
+00:01:52,580 --> 00:01:53,340
+So if you like, you can think
+你可以
+
+61
+00:01:53,560 --> 00:01:54,490
+of this as a way of
+用这种方式
+
+62
+00:01:54,620 --> 00:01:56,270
+avoiding what's sometimes called
+来避免一种
+
+63
+00:01:56,570 --> 00:01:58,950
+premature optimization in computer programming.
+电脑编程里的过早优化问题
+
+64
+00:02:00,000 --> 00:02:01,070
+And this is idea that just
+这种理念是
+
+65
+00:02:01,200 --> 00:02:03,130
+says that we should let
+我们必须
+
+66
+00:02:03,460 --> 00:02:04,920
+evidence guide our decisions
+用证据来领导我们的决策
+
+67
+00:02:05,650 --> 00:02:06,540
+on where to spend our time
+怎样分配自己的时间来优化算法
+
+68
+00:02:07,160 --> 00:02:08,150
+rather than use gut feeling,
+而不是仅仅凭直觉
+
+69
+00:02:09,070 --> 00:02:09,680
+which is often wrong.
+凭直觉得出的东西一般总是错误的
+
+70
+00:02:10,930 --> 00:02:12,120
+In addition to plotting learning
+除了画出学习曲线之外
+
+71
+00:02:12,390 --> 00:02:13,540
+curves, one other thing
+一件非常有用的事是
+
+72
+00:02:13,810 --> 00:02:16,440
+that's often very useful to do is what's called error analysis.
+误差分析
+
+73
+00:02:18,120 --> 00:02:19,080
+And what I mean by that is
+我的意思是说
+
+74
+00:02:19,280 --> 00:02:20,520
+that when building, say
+当我们在构造
+
+75
+00:02:20,770 --> 00:02:22,190
+a spam classifier, I will
+比如构造垃圾邮件分类器时
+
+76
+00:02:22,470 --> 00:02:24,500
+often look at my
+我会看一看
+
+77
+00:02:24,730 --> 00:02:26,690
+cross validation set and manually
+我的交叉验证数据集
+
+78
+00:02:27,360 --> 00:02:29,110
+look at the emails that my
+然后亲自看一看
+
+79
+00:02:29,310 --> 00:02:30,910
+algorithm is making errors on.
+哪些邮件被算法错误地分类
+
+80
+00:02:31,180 --> 00:02:32,250
+So, look at the spam emails
+因此 通过这些
+
+81
+00:02:32,630 --> 00:02:34,440
+and non-spam emails that the
+被算法错误分类的垃圾邮件
+
+82
+00:02:34,640 --> 00:02:36,920
+algorithm is misclassifying, and see
+与非垃圾邮件
+
+83
+00:02:37,430 --> 00:02:38,590
+if you can spot any systematic
+你可以发现某些系统性的规律
+
+84
+00:02:39,210 --> 00:02:41,300
+patterns in what type of examples it is misclassifying.
+什么类型的邮件总是被错误分类
+
+85
+00:02:42,980 --> 00:02:44,560
+And often by doing that, this
+经常地 这样做之后
+
+86
+00:02:44,810 --> 00:02:45,960
+is the process that would inspire
+这个过程能启发你
+
+87
+00:02:47,170 --> 00:02:48,800
+you to design new features.
+构造新的特征变量
+
+88
+00:02:49,430 --> 00:02:50,420
+Or they'll tell you whether the
+或者告诉你
+
+89
+00:02:50,920 --> 00:02:52,150
+current things or current
+现在
+
+90
+00:02:52,400 --> 00:02:53,290
+shortcomings of the system
+这个系统的短处
+
+91
+00:02:54,270 --> 00:02:55,550
+and give you the inspiration you
+然后启发你
+
+92
+00:02:55,660 --> 00:02:57,680
+need to come up with improvements to it.
+如何去提高它
+
+93
+00:02:58,260 --> 00:03:00,070
+Concretely, here's a specific example.
+具体地说 这里有一个例子
+
+94
+00:03:01,350 --> 00:03:02,360
+Let's say you've built a spam
+假设你正在构造一个
+
+95
+00:03:02,780 --> 00:03:05,740
+classifier and you
+垃圾邮件分类器
+
+96
+00:03:05,840 --> 00:03:07,720
+have 500 examples in
+你拥有500个实例
+
+97
+00:03:07,940 --> 00:03:09,650
+your cross-validation set.
+在交叉验证集中
+
+98
+00:03:10,410 --> 00:03:11,760
+And let's say in this example, that the
+假设在这个例子中
+
+99
+00:03:12,010 --> 00:03:13,060
+algorithm has a very high error
+该算法有非常高的误差率
+
+100
+00:03:13,340 --> 00:03:14,640
+rate, and it misclassifies a
+它错误分类了
+
+101
+00:03:14,910 --> 00:03:16,500
+hundred of these cross-validation examples.
+一百个交叉验证实例
+
+102
+00:03:18,770 --> 00:03:19,850
+So what I do is manually
+所以我要做的是
+
+103
+00:03:20,450 --> 00:03:22,370
+examine these 100 errors, and
+人工检查这100个错误
+
+104
+00:03:22,530 --> 00:03:24,450
+manually categorize them, based
+然后手工为它们分类
+
+105
+00:03:24,700 --> 00:03:25,810
+on things like what type
+基于例如
+
+106
+00:03:25,980 --> 00:03:27,110
+of email it is and
+这些是什么类型的邮件
+
+107
+00:03:27,270 --> 00:03:28,630
+what cues or what features you
+哪些变量
+
+108
+00:03:28,710 --> 00:03:31,130
+think might have helped the algorithm classify them incorrectly.
+能帮助这个算法来正确分类它们
+
+109
+00:03:32,450 --> 00:03:33,880
+So, specifically, by what
+明确地说
+
+110
+00:03:34,080 --> 00:03:35,050
+type of email it is,
+通过鉴定这是哪种类型的邮件
+
+111
+00:03:35,560 --> 00:03:36,870
+you know, if I look through these
+通过检查
+
+112
+00:03:37,140 --> 00:03:38,180
+hundred errors I may find
+这一百封错误分类的邮件
+
+113
+00:03:38,520 --> 00:03:39,660
+that maybe the most
+我可能会发现
+
+114
+00:03:39,970 --> 00:03:41,350
+common types of spam
+最容易被误分类的邮件
+
+115
+00:03:41,840 --> 00:03:43,450
+emails in misclassifies are maybe
+可能是
+
+116
+00:03:44,010 --> 00:03:45,610
+emails on pharmacy, so basically
+有关药物的邮件
+
+117
+00:03:45,610 --> 00:03:48,300
+these are emails trying to
+基本上这些邮件都是卖药的
+
+118
+00:03:48,610 --> 00:03:50,000
+sell drugs, maybe emails that are
+或者
+
+119
+00:03:50,180 --> 00:03:51,740
+trying to sell replicas -
+卖仿品的
+
+120
+00:03:51,760 --> 00:03:54,330
+those are those fake watches fake you know, random things.
+比如卖假表
+
+121
+00:03:56,160 --> 00:03:59,410
+Maybe have some emails trying to steal passwords.
+或者一些骗子邮件
+
+122
+00:04:00,240 --> 00:04:01,400
+These are also called phishing emails.
+又叫做钓鱼邮件
+
+123
+00:04:02,180 --> 00:04:04,690
+But that's another big category of emails and maybe other categories.
+等等
+
+124
+00:04:06,160 --> 00:04:07,800
+So, in terms
+所以
+
+125
+00:04:08,120 --> 00:04:09,230
+of classify what type of email
+在检查哪些邮件被错误分类的时候
+
+126
+00:04:09,530 --> 00:04:10,420
+it is, I would actually go through
+我会看一看每封邮件
+
+127
+00:04:10,890 --> 00:04:11,990
+and count up, you know, of
+数一数 比如
+
+128
+00:04:12,200 --> 00:04:14,220
+my 100 emails, maybe I
+在这100封错误归类的邮件中
+
+129
+00:04:14,400 --> 00:04:15,510
+find that twelve of the
+我发现有12封
+
+130
+00:04:15,620 --> 00:04:17,600
+mislabeled emails are pharma emails.
+错误归类的邮件是和卖药有关的邮件
+
+131
+00:04:18,100 --> 00:04:19,460
+And maybe four of them
+4封
+
+132
+00:04:19,700 --> 00:04:20,840
+are emails trying to sell
+是推销仿品的
+
+133
+00:04:20,980 --> 00:04:22,680
+replicas, they sell fake watches or something.
+推销假表或者别的东西
+
+134
+00:04:23,720 --> 00:04:25,060
+And maybe I find that 53
+然后我发现
+
+135
+00:04:25,650 --> 00:04:26,970
+of them are these,
+有53封邮件
+
+136
+00:04:27,720 --> 00:04:29,480
+what's called phishing emails, basically emails
+是钓鱼邮件
+
+137
+00:04:29,730 --> 00:04:30,900
+trying to persuade you to
+诱骗你
+
+138
+00:04:31,020 --> 00:04:32,760
+give them your password, and 31 emails are other types of emails.
+告诉他们你的密码 剩下的31封别的类型的邮件
+
+139
+00:04:35,330 --> 00:04:37,210
+And it's by counting up the
+通过算出
+
+140
+00:04:37,280 --> 00:04:38,280
+number of emails in these
+每个类别中
+
+141
+00:04:38,430 --> 00:04:39,540
+different categories that you might
+不同的邮件数
+
+142
+00:04:39,790 --> 00:04:41,570
+discover, for example, that the
+你可能会发现 比如
+
+143
+00:04:41,870 --> 00:04:43,100
+algorithm is doing really particularly
+该算法在
+
+144
+00:04:44,170 --> 00:04:45,640
+poorly on emails trying to
+区分钓鱼邮件的时候
+
+145
+00:04:45,780 --> 00:04:47,240
+steal passwords, and that
+总是表现得很差
+
+146
+00:04:47,400 --> 00:04:49,230
+may suggest that it might
+这说明
+
+147
+00:04:49,380 --> 00:04:50,490
+be worth your effort to look
+你应该花更多的时间
+
+148
+00:04:50,690 --> 00:04:51,650
+more carefully at that type
+来研究这种类型的邮件
+
+149
+00:04:51,900 --> 00:04:53,350
+of email, and see if
+然后
+
+150
+00:04:53,450 --> 00:04:54,450
+you can come up with better features
+看一看你是否能通过构造更好的特征变量
+
+151
+00:04:55,070 --> 00:04:56,280
+to categorize them correctly.
+来正确区分这种类型的邮件
+
+152
+00:04:57,550 --> 00:04:58,930
+And also, what I might
+同时
+
+153
+00:04:59,000 --> 00:05:00,130
+do is look at what cues,
+我要做的是
+
+154
+00:05:00,550 --> 00:05:02,120
+or what features, additional features
+看一看哪些特征变量
+
+155
+00:05:02,620 --> 00:05:04,920
+might have helped the algorithm classify the emails.
+可能会帮助算法正确地分类邮件
+
+156
+00:05:06,090 --> 00:05:06,970
+So let's say that some of
+我们假设
+
+157
+00:05:07,060 --> 00:05:09,700
+our hypotheses about things or
+能帮助我们提高
+
+158
+00:05:09,840 --> 00:05:10,780
+features that might help us
+邮件分类表现
+
+159
+00:05:10,920 --> 00:05:13,240
+classify emails better are trying
+的方法是
+
+160
+00:05:13,490 --> 00:05:15,600
+to detect deliberate misspellings versus
+检查有意的拼写错误
+
+161
+00:05:16,220 --> 00:05:18,610
+unusual email routing versus unusual, you know,
+不寻常的邮件路由来源
+
+162
+00:05:19,950 --> 00:05:21,450
+spamming punctuation, such as
+以及垃圾邮件特有的标点符号方式
+
+163
+00:05:21,790 --> 00:05:23,230
+people use a lot of exclamation marks.
+比如很多感叹号
+
+164
+00:05:23,700 --> 00:05:24,470
+And once again, I would manually
+与之前一样
+
+165
+00:05:24,860 --> 00:05:25,670
+go through and let's say
+我会手动地浏览这些邮件
+
+166
+00:05:25,760 --> 00:05:27,490
+I find five cases of
+假设有5封这种类型的邮件
+
+167
+00:05:27,620 --> 00:05:29,400
+this, and 16 of
+16封这种类型的
+
+168
+00:05:29,500 --> 00:05:30,560
+this, and 32 of this and
+32封这种类型的
+
+169
+00:05:31,180 --> 00:05:33,620
+a bunch of other types of emails as well.
+以及一些别的类型的
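
The bookkeeping behind this kind of manual error analysis can be as simple as the sketch below; the category and cue labels are hypothetical and would be assigned by hand while reading each misclassified email:

```python
# Tally the misclassified cross-validation emails by type and by cue.
from collections import Counter

# One (email_type, cues_present) pair per misclassified example,
# filled in by hand while reading the emails.
misclassified = [
    ("pharma",   ["deliberate misspelling"]),
    ("phishing", ["unusual routing", "exclamation marks"]),
    ("phishing", ["exclamation marks"]),
    ("replica",  []),
]

by_type = Counter(t for t, _ in misclassified)
by_cue = Counter(c for _, cues in misclassified for c in cues)
print(by_type)  # which categories the classifier struggles with most
print(by_cue)   # which extra features might be worth building
```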
+
+170
+00:05:34,770 --> 00:05:36,180
+And if this is what
+如果
+
+171
+00:05:36,350 --> 00:05:37,470
+you get on your cross validation
+这就是你从交叉验证中得到的结果
+
+172
+00:05:38,070 --> 00:05:39,170
+set then it really tells
+那么
+
+173
+00:05:39,300 --> 00:05:41,060
+you that, you know, maybe deliberate spelling
+这可能说明
+
+174
+00:05:41,660 --> 00:05:42,730
+is a sufficiently rare phenomenon
+有意地拼写错误出现频率较少
+
+175
+00:05:43,500 --> 00:05:44,480
+that maybe is not really worth
+这可能并不值得
+
+176
+00:05:44,840 --> 00:05:47,120
+all your time trying to write
+你花费时间
+
+177
+00:05:47,710 --> 00:05:48,780
+algorithms to detect that.
+去编写算法来检测这种类型的邮件
+
+178
+00:05:49,480 --> 00:05:50,480
+But if you find a lot
+但是如果你发现
+
+179
+00:05:50,780 --> 00:05:52,070
+of spammers are using, you
+很多的垃圾邮件
+
+180
+00:05:52,140 --> 00:05:54,150
+know, unusual punctuation then
+都有不一般的标点符号规律
+
+181
+00:05:54,290 --> 00:05:55,250
+maybe that's a strong sign
+那么这是一个很强的特征
+
+182
+00:05:55,670 --> 00:05:56,730
+that it might actually be
+说明你应该
+
+183
+00:05:57,000 --> 00:05:58,510
+worth your while to spend
+花费你的时间
+
+184
+00:05:58,780 --> 00:06:00,280
+the time to develop more sophisticated
+去构造基于标点符号的
+
+185
+00:06:00,910 --> 00:06:02,190
+features based on the punctuation.
+更加复杂的特征变量
+
+186
+00:06:03,330 --> 00:06:04,870
+So, this sort of error
+因此
+
+187
+00:06:05,040 --> 00:06:06,390
+analysis which is really
+这种类型的误差分析
+
+188
+00:06:06,690 --> 00:06:08,430
+the process of manually examining
+是一种手动检测的过程
+
+189
+00:06:09,190 --> 00:06:10,540
+the mistakes that the algorithm
+检测算法可能会犯的错误
+
+190
+00:06:10,780 --> 00:06:12,220
+makes, can often help
+这经常能够帮助你
+
+191
+00:06:12,560 --> 00:06:14,620
+guide you to the most fruitful avenues to pursue.
+找到更为有效的手段
+
+192
+00:06:16,000 --> 00:06:17,410
+And this also explains why I
+这也解释了为什么
+
+193
+00:06:17,590 --> 00:06:19,260
+often recommend implementing a quick
+我总是推荐先实践一种
+
+194
+00:06:19,550 --> 00:06:21,250
+and dirty implementation of an algorithm.
+快速即便不完美的算法
+
+195
+00:06:22,040 --> 00:06:22,940
+What we really want to do
+我们真正想要的是
+
+196
+00:06:23,260 --> 00:06:24,290
+is figure out what are
+找出什么类型的邮件
+
+197
+00:06:24,310 --> 00:06:26,770
+the most difficult examples for an algorithm to classify.
+是这种算法最难分类出来的
+
+198
+00:06:27,860 --> 00:06:29,920
+And very often for different
+对于不同的算法
+
+199
+00:06:30,460 --> 00:06:31,730
+algorithms, for different learning algorithms,
+不同的机器学习算法
+
+200
+00:06:32,010 --> 00:06:33,500
+they'll often find, you
+它们
+
+201
+00:06:33,560 --> 00:06:35,920
+know, similar categories of examples difficult.
+所遇到的问题一般总是相同的
+
+202
+00:06:37,010 --> 00:06:37,970
+And by having a quick and
+通过实践一些快速
+
+203
+00:06:38,060 --> 00:06:39,840
+dirty implementation, that's often a
+即便不完美的算法
+
+204
+00:06:39,910 --> 00:06:40,850
+quick way to let you
+你能够更快地
+
+205
+00:06:41,430 --> 00:06:43,070
+identify some errors and quickly
+找到错误的所在
+
+206
+00:06:43,620 --> 00:06:44,690
+identify what are the
+并且快速找出算法难以处理的例子
+
+207
+00:06:44,790 --> 00:06:47,760
+hard examples so that you can focus your efforts on those.
+这样你就能集中精力在这些真正的问题上
+
+208
+00:06:49,230 --> 00:06:51,220
+Lastly, when developing learning algorithms,
+最后 在构造机器学习算法时
+
+209
+00:06:52,260 --> 00:06:53,880
+one other useful tip is
+另一个有用的小窍门是
+
+210
+00:06:54,190 --> 00:06:55,230
+to make sure that you have
+保证你自己
+
+211
+00:06:55,590 --> 00:06:56,450
+a way, that you have a
+保证你能有一种
+
+212
+00:06:56,810 --> 00:06:59,710
+numerical evaluation of your learning algorithm.
+数值计算的方式来评估你的机器学习算法
+
+213
+00:07:02,130 --> 00:07:03,220
+Now what I mean by that is that
+我这么说的意思是
+
+214
+00:07:03,460 --> 00:07:04,670
+if you're developing a learning algorithm,
+如果你在构造一个学习算法
+
+215
+00:07:05,230 --> 00:07:07,180
+it is often incredibly helpful
+如果你能有一种
+
+216
+00:07:08,060 --> 00:07:09,170
+if you have a way of
+评估你算法的方法
+
+217
+00:07:09,460 --> 00:07:10,830
+evaluating your learning algorithm
+这是非常有用的
+
+218
+00:07:11,290 --> 00:07:13,100
+that just gives you back a single real number.
+一种用数字说话的评估方法
+
+219
+00:07:13,650 --> 00:07:14,880
+Maybe accuracy, maybe error.
+你的算法可能精确 可能有错
+
+220
+00:07:15,620 --> 00:07:18,390
+But the single real number that tells you how well your learning algorithm is doing.
+但是它能准确的告诉你你的算法到底表现有多好
+
+221
+00:07:20,280 --> 00:07:21,330
+I'll talk more about this specific
+在接下来的课程中
+
+222
+00:07:21,770 --> 00:07:24,650
+concepts in later videos, but here's a specific example.
+我会更详细的讲述这个概念 但是先看看这个例子
+
+223
+00:07:25,790 --> 00:07:26,600
+Let's say we are trying to
+假设我们试图
+
+224
+00:07:26,690 --> 00:07:27,990
+decide whether or not we
+决定是否应该
+
+225
+00:07:28,060 --> 00:07:29,140
+should treat words like discount,
+把像"discount""discounts""discounter""discountring"
+
+226
+00:07:29,590 --> 00:07:32,060
+discounts, discounter, discounting, as the same word.
+这样的单词都视为等同
+
+227
+00:07:32,370 --> 00:07:33,390
+So maybe one way to
+一种方法
+
+228
+00:07:33,520 --> 00:07:34,770
+do that is to just
+是检查这些单词的
+
+229
+00:07:35,400 --> 00:07:38,780
+look at the first few characters in a word.
+开头几个字母
+
+230
+00:07:38,960 --> 00:07:40,240
+Like, you know, if you just look at
+比如
+
+231
+00:07:40,300 --> 00:07:41,690
+the first few characters of
+当你在检查这些单词开头几个字母的时候
+
+232
+00:07:41,780 --> 00:07:44,640
+a word, then you figure
+你发现
+
+233
+00:07:44,920 --> 00:07:45,970
+out that maybe all of these
+这几个单词
+
+234
+00:07:46,130 --> 00:07:47,990
+words are roughly - have similar meanings.
+大概可能有着相同的意思
+
+235
+00:07:50,460 --> 00:07:52,090
+In natural language processing, the
+在自然语言处理中
+
+236
+00:07:52,250 --> 00:07:53,270
+way that this is done is
+这种方法
+
+237
+00:07:53,510 --> 00:07:55,960
+actually using a type of software called stemming software.
+是通过一种叫做词干提取的软件实现的
+
+238
+00:07:56,940 --> 00:07:58,080
+If you ever want to do
+如果你想自己来试试
+
+239
+00:07:58,160 --> 00:07:59,880
+this yourself, search on a
+你可以
+
+240
+00:07:59,950 --> 00:08:01,240
+web search engine for the
+在网上搜索一下
+
+241
+00:08:01,500 --> 00:08:02,660
+Porter Stemmer and that
+"Porter Stemmer(波特词干提取法)"
+
+242
+00:08:02,960 --> 00:08:04,320
+would be, you know, one reasonable piece of
+这是在词干提取方面
+
+243
+00:08:04,620 --> 00:08:05,830
+software for doing this sort
+一个比较不错的软件
+
+244
+00:08:06,110 --> 00:08:07,020
+of stemming, which will let
+这个软件会
+
+245
+00:08:07,130 --> 00:08:08,140
+you treat all of these discount,
+将单词"discount""discounts"以及等等
+
+246
+00:08:08,800 --> 00:08:10,540
+discounts, and so on as the same word.
+都视为同一个单词
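
To get a quick feel for what such stemming does, here is NLTK's Porter stemmer (one readily available implementation; the lecture only says to search for "Porter Stemmer", so the choice of NLTK is an assumption):

```python
# Words that share a stem collapse to the same feature -- including,
# sometimes, words that should have stayed distinct.
from nltk.stem import PorterStemmer  # pip install nltk

stem = PorterStemmer().stem
print([stem(w) for w in ["discount", "discounts", "discounted", "discounting"]])
# all four reduce to the same stem
print(stem("universe"), stem("university"))
# both reduce to the same stem too -- the kind of collision that can hurt
```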
+
+247
+00:08:13,950 --> 00:08:15,930
+But using a stemming software
+但是这种词干提取软件
+
+248
+00:08:16,630 --> 00:08:17,710
+that basically looks at the
+只会检查
+
+249
+00:08:17,830 --> 00:08:19,290
+first few alphabets of the
+单词的头几个字母
+
+250
+00:08:19,450 --> 00:08:21,630
+word more or less, it can help but it can hurt.
+这有用 但是也可能会造成一些问题
+
+251
+00:08:22,240 --> 00:08:23,490
+And it can hurt because, for
+因为
+
+252
+00:08:23,900 --> 00:08:25,360
+example, this software may
+举个例子
+
+253
+00:08:25,930 --> 00:08:27,850
+mistake the words universe and
+因为这个软件会把单词"universe(宇宙)"
+
+254
+00:08:27,990 --> 00:08:29,980
+university as being the
+和"university(大学)"
+
+255
+00:08:30,070 --> 00:08:31,220
+same thing because, you know,
+也视为同一个单词
+
+256
+00:08:31,450 --> 00:08:33,220
+these two words start off
+因为
+
+257
+00:08:33,480 --> 00:08:35,480
+with very similar characters, with the same alphabets.
+这两个单词开头的字母是一样的
+
+258
+00:08:37,300 --> 00:08:39,050
+So if you're trying
+因此
+
+259
+00:08:39,280 --> 00:08:40,290
+to decide whether or not
+当你在决定
+
+260
+00:08:40,630 --> 00:08:42,490
+to use stemming software for
+是否应该使用词干提取软件用来分类
+
+261
+00:08:42,670 --> 00:08:45,960
+a spam classifier, it is not always easy to tell.
+这总是很难说清楚
+
+262
+00:08:46,350 --> 00:08:47,810
+And in particular, error analysis
+特别地
+
+263
+00:08:48,510 --> 00:08:49,590
+may not actually be helpful
+误差分析
+
+264
+00:08:51,030 --> 00:08:52,860
+for deciding if this
+也并不能帮助你决定
+
+265
+00:08:53,060 --> 00:08:54,410
+sort of stemming idea is a good idea.
+词干提取是不是一个好的方法
+
+266
+00:08:55,570 --> 00:08:56,740
+Instead, the best way
+与之相对地 最好的方法
+
+267
+00:08:57,020 --> 00:08:58,320
+to figure out if using stemming
+来发现词干提取软件
+
+268
+00:08:58,690 --> 00:08:59,970
+software is good to help
+对你的分类器
+
+269
+00:09:00,190 --> 00:09:01,570
+your classifier is if you
+到底有没有用
+
+270
+00:09:01,740 --> 00:09:02,980
+have a way to very quickly
+是迅速地着手试一试
+
+271
+00:09:03,370 --> 00:09:05,170
+just try it and see if it works.
+来看看它表现到底怎么样
+
+272
+00:09:08,560 --> 00:09:09,530
+And in order to do this,
+为了这么做
+
+273
+00:09:10,260 --> 00:09:11,350
+having a way to numerically
+通过数值来评估你的算法
+
+274
+00:09:12,250 --> 00:09:14,570
+evaluate your algorithm, is going to be very helpful.
+是非常有用的
+
+275
+00:09:15,940 --> 00:09:17,670
+Concretely, maybe the most
+具体地说
+
+276
+00:09:18,110 --> 00:09:19,190
+natural thing to do is
+自然而然地
+
+277
+00:09:19,350 --> 00:09:20,250
+to look at the cross validation
+你应该通过交叉验证
+
+278
+00:09:20,900 --> 00:09:23,510
+error of the algorithm's performance with and without stemming.
+来验证不用词干提取与用词干提取的算法的错误率
+
+279
+00:09:24,590 --> 00:09:25,560
+So, if you run your
+因此
+
+280
+00:09:25,800 --> 00:09:27,190
+algorithm without stemming and you
+如果你不在你的算法中使用词干提取
+
+281
+00:09:27,330 --> 00:09:28,430
+end up with, let's say,
+然后你得到 比如
+
+282
+00:09:29,080 --> 00:09:31,260
+five percent classification error, and
+5%的分类错误率
+
+283
+00:09:31,360 --> 00:09:32,410
+you re-run it and you
+然后你再使用词干提取来运行你的算法
+
+284
+00:09:32,540 --> 00:09:33,780
+end up with, let's say, three
+你得到 比如
+
+285
+00:09:34,110 --> 00:09:36,170
+percent classification error, then this
+3%的分类错误
+
+286
+00:09:36,440 --> 00:09:37,920
+decrease in error very quickly
+那么这很大的减少了错误发生
+
+287
+00:09:38,640 --> 00:09:39,980
+allows you to decide that,
+于是你决定
+
+288
+00:09:40,310 --> 00:09:42,250
+you know, it looks like using stemming is a good idea.
+词干提取是一个好的办法
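
A sketch of that single-number comparison: train twice and compare one cross-validation figure. The scikit-learn pipeline below is an assumption for illustration (not the course's code), and stem_text stands in for whatever stemming preprocessor you plug in:

```python
# Compare cross-validation error with and without an option such as stemming.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def cv_error(emails, labels, preprocess=None):
    # Binary bag-of-words features, optionally preprocessed (e.g. stemmed).
    vec = CountVectorizer(binary=True, preprocessor=preprocess)
    model = make_pipeline(vec, LogisticRegression(max_iter=1000))
    return 1 - cross_val_score(model, emails, labels, cv=5).mean()

# err_plain   = cv_error(emails, labels)                        # e.g. ~5%
# err_stemmed = cv_error(emails, labels, preprocess=stem_text)  # e.g. ~3%
# Keep whichever option gives the lower cross-validation error.
```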
+
+289
+00:09:43,080 --> 00:09:44,650
+For this particular problem, there's
+就这个特定的问题而言
+
+290
+00:09:44,940 --> 00:09:46,560
+a very natural single real
+这里有一个数量的评估数字
+
+291
+00:09:46,830 --> 00:09:50,210
+number evaluation metric, namely, the cross validation error.
+即交叉验证错误率
+
+292
+00:09:50,930 --> 00:09:52,700
+We'll see later, examples where coming
+我们以后会发现
+
+293
+00:09:53,080 --> 00:09:54,360
+up with this, sort of, single
+这个例子中的评估数字
+
+294
+00:09:54,790 --> 00:09:58,220
+real number evaluation metric may need a little bit more work.
+还需要一些处理
+
+295
+00:09:58,790 --> 00:09:59,840
+But as we'll see in
+但是
+
+296
+00:09:59,930 --> 00:10:01,620
+the later video, doing so would
+我们可以在今后的课程中看到
+
+297
+00:10:01,750 --> 00:10:02,860
+also then let you
+这么做还是会让你
+
+298
+00:10:02,990 --> 00:10:04,290
+make these decisions much more quickly
+能更快地做出决定
+
+299
+00:10:04,760 --> 00:10:06,380
+of, say, whether or not to use stemming.
+比如 是否使用词干提取
+
+300
+00:10:08,700 --> 00:10:09,950
+And just this one more quick example.
+再说一个例子
+
+301
+00:10:10,680 --> 00:10:11,670
+Let's say that you're also trying
+假设
+
+302
+00:10:12,040 --> 00:10:13,450
+to decide whether or not
+你在想是否应该
+
+303
+00:10:13,650 --> 00:10:15,710
+to distinguish between upper versus lower case.
+区分单词的大小写
+
+304
+00:10:15,990 --> 00:10:16,910
+So, you know, is the word
+比如
+
+305
+00:10:17,060 --> 00:10:18,850
+mom with uppercase M
+单词"mom" 大写的"M"
+
+306
+00:10:19,060 --> 00:10:20,390
+versus lower case m,
+和小写的"m"
+
+307
+00:10:20,700 --> 00:10:21,720
+I mean, should that be treated as
+它们应该被视作
+
+308
+00:10:21,780 --> 00:10:23,810
+the same word or as different words?
+同一个单词还是不同的单词
+
+309
+00:10:23,970 --> 00:10:26,890
+Should these be treated as the same feature or as different features?
+它们应该被视作相同的特征变量还是不同的
+
+310
+00:10:27,010 --> 00:10:28,060
+And so once again,
+再说一次
+
+311
+00:10:28,350 --> 00:10:29,150
+because we have a way
+因为我们有一种
+
+312
+00:10:29,300 --> 00:10:30,790
+to evaluate our algorithm, if
+能够评估我们算法的方法
+
+313
+00:10:31,060 --> 00:10:32,350
+you try this out here, if
+如果你在这里试一试
+
+314
+00:10:32,650 --> 00:10:34,910
+I stop distinguishing upper
+如果我不区分
+
+315
+00:10:35,140 --> 00:10:36,490
+and lower case, maybe I
+大小写
+
+316
+00:10:36,600 --> 00:10:38,580
+end up with 3.2%
+最后得到3.2%的错误率
+
+317
+00:10:38,700 --> 00:10:39,820
+error and I find that
+然后我发现
+
+318
+00:10:40,020 --> 00:10:41,750
+therefore this does worse
+这个表现的较差些
+
+319
+00:10:42,260 --> 00:10:43,360
+than, you know, if I use only
+如果
+
+320
+00:10:43,640 --> 00:10:45,110
+stemming, and so this lets
+如果我只用了词干提取
+
+321
+00:10:45,370 --> 00:10:47,420
+me very quickly decide to go
+这之后我再思考
+
+322
+00:10:48,270 --> 00:10:49,720
+ahead and to distinguish or to
+是否要区分
+
+323
+00:10:49,820 --> 00:10:51,540
+not distinguish between upper and lower case.
+大小写
+
+324
+00:10:52,140 --> 00:10:53,390
+So when you' re developing
+因此当你在
+
+325
+00:10:53,690 --> 00:10:55,260
+a learning algorithm, very often
+构造学习算法的时候
+
+326
+00:10:55,650 --> 00:10:56,840
+you'll be trying out lots of
+你总是会去尝试
+
+327
+00:10:57,050 --> 00:10:59,930
+new ideas and lots of new versions of your learning algorithm.
+很多新的想法 实现出很多版本的学习算法
+
+328
+00:11:00,960 --> 00:11:02,050
+If every time you try
+如果每一次
+
+329
+00:11:02,350 --> 00:11:03,740
+out a new idea if you
+你实践新想法的时候
+
+330
+00:11:03,840 --> 00:11:05,610
+end up manually examining a
+你都手动地检测
+
+331
+00:11:05,750 --> 00:11:06,730
+bunch of examples, you begin to
+这些例子
+
+332
+00:11:06,860 --> 00:11:08,530
+see better or worse, you
+去看看是表现差还是表现好
+
+333
+00:11:08,640 --> 00:11:09,410
+know, that's going to make it
+那么这很难让你
+
+334
+00:11:09,580 --> 00:11:10,610
+really hard to make decisions
+做出决定
+
+335
+00:11:10,980 --> 00:11:12,410
+on do you use stemming or not.
+到底是否使用词干提取
+
+336
+00:11:12,580 --> 00:11:13,640
+Do you distinguish upper or lowercase or not?
+是否区分大小写
+
+337
+00:11:15,180 --> 00:11:16,590
+But by having a single real
+但是通过一个
+
+338
+00:11:16,770 --> 00:11:18,520
+number evaluation metric, you can
+量化的数值评估
+
+339
+00:11:18,680 --> 00:11:21,150
+then just look and see oh, did the error go up or go down?
+你可以看看这个数字 误差是变大还是变小了
+
+340
+00:11:22,420 --> 00:11:23,620
+And you can use that much
+你可以通过它
+
+341
+00:11:23,940 --> 00:11:25,760
+more rapidly, try out
+更快地实践
+
+342
+00:11:25,840 --> 00:11:27,820
+new ideas and almost right
+你的新想法
+
+343
+00:11:27,990 --> 00:11:29,550
+away tell if your new
+它基本上非常直观地告诉你
+
+344
+00:11:29,690 --> 00:11:31,480
+idea has improved or worsened
+你的想法是提高了算法表现
+
+345
+00:11:32,440 --> 00:11:33,230
+the performance of the learning algorithm
+还是让它变得更坏
+
+346
+00:11:33,930 --> 00:11:35,440
+and this will let
+这会大大提高
+
+347
+00:11:35,560 --> 00:11:38,340
+you often make much faster progress.
+你实践算法时的速度
+
+348
+00:11:38,530 --> 00:11:39,720
+So the recommended, strongly recommended
+所以我强烈推荐
+
+349
+00:11:40,220 --> 00:11:41,790
+way to do error analysis is
+在交叉验证集上来实施误差分析
+
+350
+00:11:42,370 --> 00:11:44,760
+on the cross validation set rather than the test set.
+而不是在测试集上
+
+351
+00:11:45,490 --> 00:11:46,970
+But, you know, there are
+但是
+
+352
+00:11:47,240 --> 00:11:48,260
+people that will do this on
+还是有一些人
+
+353
+00:11:48,370 --> 00:11:49,480
+the test set even though that's
+会在测试集上来做误差分析
+
+354
+00:11:49,730 --> 00:11:51,530
+definitely a less mathematically appropriate
+即使这从数学上讲
+
+355
+00:11:52,190 --> 00:11:54,560
+set of your data; what I recommend
+是不合适的
+
+356
+00:11:54,730 --> 00:11:55,660
+you do instead is to
+所以我还是推荐你
+
+357
+00:11:55,780 --> 00:11:57,240
+do error analysis on your
+在交叉验证向量上
+
+358
+00:11:57,450 --> 00:11:58,760
+cross validation set.
+来做误差分析
+
+359
+00:11:59,140 --> 00:12:01,160
+So, to wrap up this video, when starting
+总结一下
+
+360
+00:12:01,830 --> 00:12:03,340
+on the new machine learning problem, what
+当你在研究一个新的机器学习问题时
+
+361
+00:12:03,610 --> 00:12:05,370
+I almost always recommend is
+我总是推荐你
+
+362
+00:12:05,610 --> 00:12:06,930
+to implement a quick and
+实现一个较为简单快速
+
+363
+00:12:07,030 --> 00:12:08,710
+dirty implementation of your learning algorithm.
+即便不是那么完美的算法
+
+364
+00:12:09,780 --> 00:12:11,760
+And I've almost never seen
+我几乎从未见过
+
+365
+00:12:12,120 --> 00:12:15,370
+anyone spend too little time on this quick and dirty implementation.
+人们这样做
+
+366
+00:12:18,640 --> 00:12:20,210
+I pretty much only ever see
+大家经常干的事情是
+
+367
+00:12:20,480 --> 00:12:22,050
+people spend much too much
+花费大量的时间
+
+368
+00:12:22,370 --> 00:12:23,720
+time building their first, you know,
+在构造算法上
+
+369
+00:12:24,580 --> 00:12:25,800
+supposedly quick and dirty implementations.
+构造他们以为的简单的方法
+
+370
+00:12:26,590 --> 00:12:28,100
+So really, don't worry about
+因此
+
+371
+00:12:29,070 --> 00:12:31,210
+it being too quick, or don't worry about it being too dirty.
+不要担心你的算法太简单 或者太不完美
+
+372
+00:12:32,120 --> 00:12:33,580
+But really implement something as
+而是尽可能快地
+
+373
+00:12:33,690 --> 00:12:35,220
+quickly as you can, and once
+实现你的算法
+
+374
+00:12:35,450 --> 00:12:37,550
+you have the initial implementation this
+当你有了初始的实现之后
+
+375
+00:12:37,820 --> 00:12:38,860
+is then a powerful tool for
+它会变成一个非常有力的工具
+
+376
+00:12:39,230 --> 00:12:40,420
+deciding where to spend your
+来帮助你决定
+
+377
+00:12:40,610 --> 00:12:42,170
+time next, because first we
+下一步的做法
+
+378
+00:12:42,390 --> 00:12:43,390
+can look at the errors it makes,
+因为我们可以先看看算法造成的错误
+
+379
+00:12:43,630 --> 00:12:44,720
+and do this sort of error analysis
+通过误差分析
+
+380
+00:12:45,280 --> 00:12:46,360
+to see what mistakes it makes
+来看看他犯了什么错
+
+381
+00:12:47,010 --> 00:12:48,420
+and use that to inspire further development.
+然后来决定优化的方式
+
+382
+00:12:49,030 --> 00:12:50,880
+And second, assuming your
+另一件事是
+
+383
+00:12:51,000 --> 00:12:53,360
+quick and dirty implementation incorporated a
+假设你有了一个快速而不完美的算法实现
+
+384
+00:12:53,620 --> 00:12:55,700
+single real number evaluation metric, this
+又有一个数值的评估数据
+
+385
+00:12:55,940 --> 00:12:57,660
+can then be a vehicle for
+这会帮助你
+
+386
+00:12:57,730 --> 00:12:58,980
+you to try out different ideas
+尝试新的想法
+
+387
+00:12:59,810 --> 00:13:00,810
+and quickly see if the
+快速地发现
+
+388
+00:13:01,030 --> 00:13:02,170
+different ideas you're trying out
+你尝试的这些想法
+
+389
+00:13:02,440 --> 00:13:03,830
+are improving the performance of
+是否能够提高算法的表现
+
+390
+00:13:03,920 --> 00:13:05,420
+your algorithm and therefore let
+从而
+
+391
+00:13:05,570 --> 00:13:06,470
+you maybe much more quickly
+你会更快地
+
+392
+00:13:06,860 --> 00:13:08,440
+make decisions about what things
+做出决定
+
+393
+00:13:08,760 --> 00:13:09,900
+to fold, and what things to
+在算法中放弃什么 吸收什么
+
+394
+00:13:10,240 --> 00:13:11,520
+incorporate into your learning algorithm.
+
diff --git a/srt/11 - 3 - Error Metrics for Skewed Classes (12 min).srt b/srt/11 - 3 - Error Metrics for Skewed Classes (12 min).srt
new file mode 100644
index 00000000..a06c2476
--- /dev/null
+++ b/srt/11 - 3 - Error Metrics for Skewed Classes (12 min).srt
@@ -0,0 +1,1616 @@
+1
+00:00:00,290 --> 00:00:01,690
+In the previous video, I talked
+在前面的课程中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,060 --> 00:00:03,900
+about error analysis and the
+我提到了误差分析
+
+3
+00:00:04,350 --> 00:00:06,070
+importance of having error
+以及设定误差度量值的重要性
+
+4
+00:00:06,330 --> 00:00:07,480
+metrics, that is of having
+那就是
+
+5
+00:00:08,210 --> 00:00:10,200
+a single real number evaluation metric
+设定某个实数来评估你的学习算法
+
+6
+00:00:11,020 --> 00:00:13,290
+for your learning algorithm to tell how well it's doing.
+并衡量它的表现
+
+7
+00:00:14,310 --> 00:00:15,670
+In the context of evaluation
+有了算法的评估
+
+8
+00:00:16,700 --> 00:00:18,320
+and of error metrics, there is
+和误差度量值
+
+9
+00:00:18,430 --> 00:00:20,290
+one important case, where it's
+有一件重要的事情要注意
+
+10
+00:00:20,480 --> 00:00:22,180
+particularly tricky to come
+就是使用一个合适的误差度量值
+
+11
+00:00:22,510 --> 00:00:24,430
+up with an appropriate error metric,
+这有时会对于你的学习算法
+
+12
+00:00:24,930 --> 00:00:26,990
+or evaluation metric, for your learning algorithm.
+造成非常微妙的影响
+
+13
+00:00:28,040 --> 00:00:29,140
+That case is the case
+这件重要的事情就是
+
+14
+00:00:29,610 --> 00:00:31,310
+of what's called skewed classes.
+偏斜类(skewed classes)的问题
+
+15
+00:00:32,610 --> 00:00:33,480
+Let me tell you what that means.
+让我告诉你这是什么意思
+
+16
+00:00:36,170 --> 00:00:37,550
+Consider the problem of cancer
+想一想之前的癌症分类问题
+
+17
+00:00:38,180 --> 00:00:40,040
+classification, where we have
+我们拥有
+
+18
+00:00:40,300 --> 00:00:41,960
+features of medical patients and
+内科病人的特征变量
+
+19
+00:00:42,070 --> 00:00:44,050
+we want to decide whether or not they have cancer.
+我们希望知道他们是否患有癌症
+
+20
+00:00:44,630 --> 00:00:45,790
+So this is like the malignant
+因此这就像恶性
+
+21
+00:00:46,350 --> 00:00:48,290
+versus benign tumor classification
+与良性肿瘤的分类问题
+
+22
+00:00:48,930 --> 00:00:50,070
+example that we had earlier.
+我们之前讲过这个
+
+23
+00:00:51,140 --> 00:00:52,360
+So let's say y equals 1 if the
+我们假设 y=1 表示患者患有癌症
+
+24
+00:00:52,550 --> 00:00:53,780
+patient has cancer and y equals 0
+假设 y=0
+
+25
+00:00:54,280 --> 00:00:56,530
+if they do not.
+表示他们没有得癌症
+
+26
+00:00:56,810 --> 00:00:57,460
+We have trained a logistic regression
+我们训练逻辑回归模型
+
+27
+00:00:57,940 --> 00:00:59,780
+classifier and let's say
+假设我们用测试集
+
+28
+00:01:00,000 --> 00:01:01,520
+we test our classifier on
+检验了这个分类模型
+
+29
+00:01:01,660 --> 00:01:04,470
+a test set and find that we get 1 percent error.
+并且发现它只有1%的错误
+
+30
+00:01:04,790 --> 00:01:05,720
+So, we're making 99% correct diagnosis.
+因此我们99%会做出正确诊断
+
+31
+00:01:06,530 --> 00:01:09,610
+Seems like a really impressive result, right.
+看起来是非常不错的结果
+
+32
+00:01:09,910 --> 00:01:10,920
+We're correct 99 percent of the time.
+我们99%的情况都是正确的
+
+33
+00:01:12,560 --> 00:01:13,630
+But now, let's say we find
+但是 假如我们发现
+
+34
+00:01:13,940 --> 00:01:15,660
+out that only 0.5 percent
+在测试集中
+
+35
+00:01:16,510 --> 00:01:17,950
+of patients in our
+只有0.5%的患者
+
+36
+00:01:18,160 --> 00:01:19,590
+training test sets actually have cancer.
+真正得了癌症
+
+37
+00:01:20,400 --> 00:01:21,900
+So only half a
+因此
+
+38
+00:01:21,950 --> 00:01:23,460
+percent of the patients that
+在我们的筛选程序里
+
+39
+00:01:23,580 --> 00:01:25,500
+come through our screening process have cancer.
+只有0.5%的患者患了癌症
+
+40
+00:01:26,560 --> 00:01:27,970
+In this case, the 1%
+因此在这个例子中
+
+41
+00:01:28,270 --> 00:01:30,010
+error no longer looks so impressive.
+1%的错误率就不再显得那么好了
+
+42
+00:01:31,130 --> 00:01:32,510
+And in particular, here's a piece
+举个具体的例子
+
+43
+00:01:32,670 --> 00:01:33,730
+of code, here's actually a piece
+这里有一行代码
+
+44
+00:01:33,850 --> 00:01:35,730
+of non learning code that takes
+不是机器学习代码
+
+45
+00:01:36,080 --> 00:01:38,260
+this input of features x and it ignores it.
+它忽略了输入值X
+
+46
+00:01:38,480 --> 00:01:39,820
+It just sets y equals 0
+它让y总是等于0
+
+47
+00:01:39,950 --> 00:01:41,640
+and always predicts, you
+因此它总是预测
+
+48
+00:01:41,720 --> 00:01:43,920
+know, nobody has cancer and this
+没有人得癌症
+
+49
+00:01:44,170 --> 00:01:45,720
+algorithm would actually get
+那么这个算法实际上只有
+
+50
+00:01:46,000 --> 00:01:47,840
+0.5 percent error.
+0.5%的错误率
+
+51
+00:01:48,830 --> 00:01:50,280
+So this is even better than
+因此这甚至比
+
+52
+00:01:50,400 --> 00:01:51,140
+the 1% error that we were getting just now
+我们之前得到的1%的错误率更好
+
+53
+00:01:51,240 --> 00:01:52,960
+and this is a non
+这是一个
+
+54
+00:01:53,160 --> 00:01:54,600
+learning algorithm that, you know, is just
+非机器学习算法
+
+55
+00:01:54,800 --> 00:01:56,950
+predicting y equals 0 all the time.
+因为它只是预测y总是等于0
+
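+A rough sketch of this arithmetic in Python (not from the lecture itself), using made-up numbers that match the 0.5% figure above:
+
+import numpy as np
+
+# Hypothetical skewed test set: 1000 patients, of whom only 5 (0.5%) have cancer.
+y_true = np.zeros(1000, dtype=int)
+y_true[:5] = 1
+
+# "Non-learning" classifier: ignore the features and always predict y = 0.
+y_pred = np.zeros_like(y_true)
+
+error = np.mean(y_pred != y_true)   # fraction of misclassified examples
+print(error)                        # 0.005, i.e. 0.5% error (99.5% accuracy)
+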
+56
+00:01:57,990 --> 00:01:59,430
+So this setting of when
+这种情况发生在
+
+57
+00:02:00,060 --> 00:02:01,980
+the ratio of positive to
+正例和负例的比率
+
+58
+00:02:02,180 --> 00:02:04,130
+negative examples is very close
+非常接近于
+
+59
+00:02:04,810 --> 00:02:06,480
+to one of two extremes, where,
+一个极端
+
+60
+00:02:07,040 --> 00:02:08,620
+in this case, the number of
+在这个例子中
+
+61
+00:02:08,710 --> 00:02:10,050
+positive examples is much,
+正样本的数量
+
+62
+00:02:10,350 --> 00:02:11,310
+much smaller than the number
+与负样本的数量相比
+
+63
+00:02:11,620 --> 00:02:13,180
+of negative examples because y
+非常非常少
+
+64
+00:02:13,480 --> 00:02:15,500
+equals one so rarely, this
+因为y=1非常少
+
+65
+00:02:15,730 --> 00:02:16,850
+is what we call the
+我们把这种情况叫做
+
+66
+00:02:17,000 --> 00:02:18,600
+case of skewed classes.
+偏斜类
+
+67
+00:02:20,790 --> 00:02:21,710
+We just have a lot more
+一个类中的样本数
+
+68
+00:02:22,000 --> 00:02:23,140
+of examples from one class
+与另一个类的数据相比
+
+69
+00:02:23,570 --> 00:02:25,040
+than from the other class.
+多很多
+
+70
+00:02:25,220 --> 00:02:26,560
+And by just predicting y equals
+通过总是预测y=0
+
+71
+00:02:26,920 --> 00:02:28,270
+0 all the time, or maybe
+或者
+
+72
+00:02:28,650 --> 00:02:29,650
+our predicting y equals 1
+总是预测y=1
+
+73
+00:02:29,790 --> 00:02:32,080
+all the time, an algorithm can do pretty well.
+算法可能表现非常好
+
+74
+00:02:32,980 --> 00:02:34,050
+So the problem with using
+因此使用分类误差
+
+75
+00:02:34,670 --> 00:02:36,210
+classification error or classification
+或者分类精确度
+
+76
+00:02:36,590 --> 00:02:39,240
+accuracy as our evaluation metric is the following.
+来作为评估度量可能会产生如下问题
+
+77
+00:02:40,430 --> 00:02:41,360
+Let's say you have one learning
+假如说你有一个算法
+
+78
+00:02:41,700 --> 00:02:43,570
+algorithm that's getting 99.2% accuracy.
+它的精确度是99.2%
+
+79
+00:02:46,530 --> 00:02:47,200
+So, that's a 0.8% error.
+因此它只有0.8%的误差
+
+80
+00:02:47,330 --> 00:02:50,850
+Let's say you
+假设
+
+81
+00:02:51,000 --> 00:02:52,000
+make a change to your algorithm
+你对你的算法做出了一点改动
+
+82
+00:02:52,810 --> 00:02:53,890
+and you now are getting
+现在你得到了
+
+83
+00:02:54,280 --> 00:02:56,080
+99.5% accuracy.
+99.5%的精确度
+
+84
+00:02:59,280 --> 00:03:02,110
+That is 0.5% error.
+只有0.5%的误差
+
+85
+00:03:04,230 --> 00:03:06,460
+So, is this an improvement to the algorithm or not?
+这到底是不是算法的一个提升呢
+
+86
+00:03:06,770 --> 00:03:07,930
+One of the nice things
+用某个实数来
+
+87
+00:03:08,300 --> 00:03:09,990
+about having a single real
+作为评估度量值
+
+88
+00:03:10,120 --> 00:03:11,480
+number evaluation metric is this
+的一个好处就是
+
+89
+00:03:11,650 --> 00:03:13,080
+helps us to quickly decide if
+它可以帮助我们迅速决定
+
+90
+00:03:13,240 --> 00:03:15,530
+we just need a good change to the algorithm or not.
+我们是否需要对算法做出一些改进
+
+91
+00:03:16,370 --> 00:03:20,160
+By going from 99.2% accuracy to 99.5% accuracy.
+将精确度从99.2%提高到99.5%
+
+92
+00:03:21,430 --> 00:03:22,490
+You know, did we just do something
+但是我们的改进到底是有用的
+
+93
+00:03:22,780 --> 00:03:23,640
+useful or did we
+还是说
+
+94
+00:03:23,770 --> 00:03:25,150
+just replace our code with
+我们只是把代码替换成了
+
+95
+00:03:25,320 --> 00:03:26,690
+something that just predicts y equals
+例如总是预测y=0
+
+96
+00:03:27,000 --> 00:03:28,830
+zero more often?
+这样的东西
+
+97
+00:03:29,300 --> 00:03:30,430
+So, if you have very skewed classes
+因此如果你有一个偏斜类
+
+98
+00:03:31,340 --> 00:03:33,280
+it becomes much harder to use
+用分类精确度
+
+99
+00:03:33,640 --> 00:03:36,000
+just classification accuracy, because you
+并不能很好地衡量算法
+
+100
+00:03:36,120 --> 00:03:37,730
+can get very high classification accuracies
+因为你可能会获得一个很高的精确度
+
+101
+00:03:38,420 --> 00:03:40,950
+or very low errors, and
+非常低的错误率
+
+102
+00:03:41,110 --> 00:03:42,880
+it's not always clear if
+但是我们并不知道
+
+103
+00:03:43,070 --> 00:03:44,190
+doing so is really improving
+我们是否真的提升了
+
+104
+00:03:44,770 --> 00:03:45,780
+the quality of your classifier
+分类模型的质量
+
+105
+00:03:46,400 --> 00:03:48,320
+because predicting y equals 0 all the
+因为总是预测y=0
+
+106
+00:03:48,380 --> 00:03:50,710
+time doesn't seem like
+并不是一个
+
+107
+00:03:51,570 --> 00:03:52,570
+a particularly good classifier.
+好的分类模型
+
+108
+00:03:53,900 --> 00:03:55,500
+But just predicting y equals 0 more
+但是总是预测y=0
+
+109
+00:03:55,720 --> 00:03:57,300
+often can bring your error
+会将你的误差降低至
+
+110
+00:03:57,830 --> 00:03:59,460
+down to, you know, maybe as
+比如
+
+111
+00:03:59,650 --> 00:04:01,120
+low as 0.5%.
+降低至0.5%
+
+112
+00:04:01,490 --> 00:04:02,590
+When we're faced with such
+当我们遇到
+
+113
+00:04:02,770 --> 00:04:04,990
+a skewed classes therefore we
+这样一个偏斜类时
+
+114
+00:04:05,250 --> 00:04:06,350
+would want to come up
+我们希望有一个
+
+115
+00:04:06,470 --> 00:04:07,920
+with a different error metric
+不同的误差度量值
+
+116
+00:04:08,320 --> 00:04:09,500
+or a different evaluation metric.
+或者不同的评估度量值
+
+117
+00:04:10,290 --> 00:04:12,360
+One such evaluation metric is
+其中一种评估度量值
+
+118
+00:04:12,870 --> 00:04:14,240
+what's called precision recall.
+叫做查准率(precision)和召回率(recall)
+
+119
+00:04:15,440 --> 00:04:16,410
+Let me explain what that is.
+让我来解释一下
+
+120
+00:04:17,520 --> 00:04:19,890
+Let's say we are evaluating a classifier on the test set.
+假设我们正在用测试集来评估一个分类模型
+
+121
+00:04:20,750 --> 00:04:21,800
+For the examples in the
+对于
+
+122
+00:04:21,890 --> 00:04:23,890
+test set the actual
+测试集中的样本
+
+123
+00:04:25,450 --> 00:04:26,880
+class of that example
+每个测试集中的样本
+
+124
+00:04:27,320 --> 00:04:28,440
+in the test set is going to
+都会等于
+
+125
+00:04:28,550 --> 00:04:29,810
+be either one or zero, right,
+0或者1
+
+126
+00:04:30,440 --> 00:04:32,520
+if this is a binary classification problem.
+假设这是一个二分问题
+
+127
+00:04:33,870 --> 00:04:34,960
+And what our learning algorithm
+我们的学习算法
+
+128
+00:04:35,360 --> 00:04:37,070
+will do is it will, you know,
+要做的是
+
+129
+00:04:37,930 --> 00:04:39,270
+predict some value for the
+做出值的预测
+
+130
+00:04:39,450 --> 00:04:41,160
+class and our learning
+并且学习算法
+
+131
+00:04:41,560 --> 00:04:43,300
+algorithm will predict the value
+会为每一个
+
+132
+00:04:43,760 --> 00:04:44,830
+for each example in my
+测试集中的实例
+
+133
+00:04:44,910 --> 00:04:46,520
+test set and the predicted value
+做出预测
+
+134
+00:04:46,920 --> 00:04:48,560
+will also be either one or zero.
+预测值也是等于0或1
+
+135
+00:04:50,050 --> 00:04:52,060
+So let me draw a two
+让我画一个
+
+136
+00:04:52,270 --> 00:04:53,340
+by two table as follows,
+2x2的表格
+
+137
+00:04:53,910 --> 00:04:55,870
+and we'll fill in each of these entries
+基于所有这些值
+
+138
+00:04:56,320 --> 00:04:57,800
+depending on what was the
+基于
+
+139
+00:04:57,960 --> 00:04:59,350
+actual class and what was the predicted class.
+实际的类与预测的类
+
+140
+00:05:00,220 --> 00:05:01,270
+If we have an
+如果
+
+141
+00:05:01,570 --> 00:05:02,890
+example where the actual class is
+有一个样本它实际所属的类是1
+
+142
+00:05:02,970 --> 00:05:03,950
+one and the predicted class
+预测的类也是1
+
+143
+00:05:04,240 --> 00:05:06,140
+is one then that's called
+那么
+
+144
+00:05:07,620 --> 00:05:08,640
+an example that's a true
+我们把这个样本叫做真阳性(true positive)
+
+145
+00:05:08,940 --> 00:05:10,300
+positive, meaning our algorithm
+意思是说我们的学习算法
+
+146
+00:05:10,730 --> 00:05:11,700
+predicted that it's positive
+预测这个值为阳性
+
+147
+00:05:12,400 --> 00:05:15,780
+and in reality the example is positive.
+实际上这个样本也确实是阳性
+
+148
+00:05:16,240 --> 00:05:17,300
+If our learning algorithm predicted that
+如果我们的学习算法
+
+149
+00:05:17,490 --> 00:05:19,010
+something is negative, class zero,
+预测某个值是阴性 等于0
+
+150
+00:05:19,570 --> 00:05:20,620
+and the actual class is also
+实际的类也确实属于0
+
+151
+00:05:20,970 --> 00:05:23,650
+class zero then that's what's called a true negative.
+那么我们把这个叫做真阴性(true negative)
+
+152
+00:05:24,070 --> 00:05:26,370
+We predicted zero and it actually is zero.
+我们预测为0的值实际上也等于0
+
+153
+00:05:27,880 --> 00:05:28,740
+To find the other two boxes,
+还剩另外的两个单元格
+
+154
+00:05:29,470 --> 00:05:31,120
+if our learning algorithm predicts that
+如果我们的学习算法
+
+155
+00:05:31,360 --> 00:05:33,210
+the class is one but the
+预测某个值等于1
+
+156
+00:05:34,340 --> 00:05:36,370
+actual class is zero, then
+但是实际上它等于0
+
+157
+00:05:36,670 --> 00:05:37,910
+that's called a false positive.
+这个叫做假阳性(false positive)
+
+158
+00:05:39,350 --> 00:05:40,630
+So that means our algorithm predicted
+比如我们的算法
+
+159
+00:05:40,830 --> 00:05:41,970
+that the patient has cancer when in
+预测某些病人患有癌症
+
+160
+00:05:42,190 --> 00:05:43,520
+reality the patient does not.
+但是事实上他们并没有得癌症
+
+161
+00:05:44,730 --> 00:05:47,340
+And finally, the last box is a zero, one.
+最后 这个单元格是 1和0
+
+162
+00:05:48,200 --> 00:05:50,330
+That's called a false negative
+这个叫做假阴性(false negative)
+
+163
+00:05:51,180 --> 00:05:52,690
+because our algorithm predicted
+因为我们的算法预测值为0
+
+164
+00:05:53,450 --> 00:05:56,170
+zero, but the actual class was one.
+但是实际值是1
+
+165
+00:05:57,230 --> 00:05:59,020
+And so, we
+这样
+
+166
+00:05:59,150 --> 00:06:00,830
+have this little sort of two by
+我们有了一个2x2的表格
+
+167
+00:06:00,990 --> 00:06:02,720
+two table based on
+基于
+
+168
+00:06:03,250 --> 00:06:05,500
+what was the actual class and what was the predicted class.
+实际类与预测类
+
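+A minimal sketch of the four entries of that two-by-two table, assuming binary label arrays (the function name here is just illustrative, not from the lecture):
+
+import numpy as np
+
+def confusion_counts(y_true, y_pred):
+    # y = 1 is the positive (rare) class, following the lecture's convention.
+    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives: predicted 1, actually 1
+    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives: predicted 1, actually 0
+    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives: predicted 0, actually 1
+    tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives: predicted 0, actually 0
+    return tp, fp, fn, tn
+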
+169
+00:06:07,080 --> 00:06:08,380
+So here's a different way
+这样我们有了一个
+
+170
+00:06:08,690 --> 00:06:10,310
+of evaluating the performance of
+另一种方式来
+
+171
+00:06:10,420 --> 00:06:11,940
+our algorithm. We're
+评估算法的表现
+
+172
+00:06:12,550 --> 00:06:12,870
+going to compute two numbers.
+我们要计算两个数字
+
+173
+00:06:13,310 --> 00:06:14,780
+The first is called precision -
+第一个叫做查准率
+
+174
+00:06:14,940 --> 00:06:16,100
+and what that says is,
+这个意思是
+
+175
+00:06:17,170 --> 00:06:18,330
+of all the patients where we've
+对于所有我们预测
+
+176
+00:06:18,580 --> 00:06:19,580
+predicted that they have cancer,
+他们患有癌症的病人
+
+177
+00:06:20,640 --> 00:06:23,140
+what fraction of them actually have cancer?
+有多大比率的病人是真正患有癌症的
+
+178
+00:06:24,560 --> 00:06:25,310
+So let me write this down,
+让我把这个写下来
+
+179
+00:06:26,020 --> 00:06:27,300
+the precision of a classifier
+一个分类模型的查准率
+
+180
+00:06:27,680 --> 00:06:29,070
+is the number of true
+等于
+
+181
+00:06:29,310 --> 00:06:31,880
+positives divided by
+真阳性除以
+
+182
+00:06:32,940 --> 00:06:35,190
+the number that we predicted
+所有我们预测为阳性
+
+183
+00:06:37,140 --> 00:06:37,370
+as positive, right?
+的数量
+
+184
+00:06:39,150 --> 00:06:40,660
+So of all the patients that
+对于那些病人
+
+185
+00:06:41,090 --> 00:06:43,590
+we went to those patients and we told them, "We think you have cancer."
+我们告诉他们 "你们患有癌症"
+
+186
+00:06:43,890 --> 00:06:45,730
+Of all those patients, what
+对于这些病人而言
+
+187
+00:06:45,890 --> 00:06:47,410
+fraction of them actually have cancer?
+有多大比率是真正患有癌症的
+
+188
+00:06:47,500 --> 00:06:48,920
+So that's called precision.
+这个就叫做查准率
+
+189
+00:06:49,800 --> 00:06:50,680
+And another way to write this
+另一个写法是
+
+190
+00:06:50,950 --> 00:06:54,920
+would be true positives and
+分子是真阳性
+
+191
+00:06:55,010 --> 00:06:56,430
+then in the denominator is the
+分母是
+
+192
+00:06:56,670 --> 00:06:59,050
+number of predicted positives, and
+所有预测阳性的数量
+
+193
+00:06:59,210 --> 00:07:00,160
+so that would be the
+那么这个等于
+
+194
+00:07:00,240 --> 00:07:01,730
+sum of the, you know, entries
+表格第一行的值
+
+195
+00:07:02,410 --> 00:07:04,510
+in this first row of the table.
+的和
+
+196
+00:07:04,720 --> 00:07:07,760
+So it would be true positives divided by true positives.
+也就是真阳性除以真阳性...
+
+197
+00:07:08,670 --> 00:07:10,470
+I'm going to abbreviate positive
+这里我把阳性简写为
+
+198
+00:07:11,220 --> 00:07:12,980
+as POS and then
+POS
+
+199
+00:07:13,130 --> 00:07:15,470
+plus false positives, again
+加上假阳性
+
+200
+00:07:15,890 --> 00:07:18,550
+abbreviating positive using POS.
+这里我还是把阳性简写为POS
+
+201
+00:07:20,030 --> 00:07:21,850
+So that's called precision, and as you
+这个就叫做查准率
+
+202
+00:07:21,920 --> 00:07:23,490
+can tell high precision would be good.
+查准率越高就越好
+
+203
+00:07:23,660 --> 00:07:24,680
+That means that all the patients
+这是说 对于那些病人
+
+204
+00:07:25,070 --> 00:07:27,100
+that we went to and we said, "You know, we're very sorry.
+我们告诉他们 "非常抱歉
+
+205
+00:07:27,510 --> 00:07:28,960
+We think you have cancer," high precision
+我们认为你得了癌症"
+
+206
+00:07:29,440 --> 00:07:31,750
+means that of that group
+高查准率说明
+
+207
+00:07:31,980 --> 00:07:33,160
+of patients most of them
+对于这类病人
+
+208
+00:07:33,390 --> 00:07:34,460
+we had actually made accurate
+我们对预测他们得了癌症
+
+209
+00:07:34,820 --> 00:07:36,630
+predictions on them and they do have cancer.
+有很高的准确率
+
+210
+00:07:38,840 --> 00:07:39,880
+The second number we're going to compute
+另一个数字我们要计算的
+
+211
+00:07:40,440 --> 00:07:41,730
+is called recall, and what
+叫做召回率
+
+212
+00:07:42,060 --> 00:07:44,230
+recall says is, of all
+召回率是
+
+213
+00:07:44,480 --> 00:07:46,100
+the patients in, let's say,
+如果所有的病人
+
+214
+00:07:46,190 --> 00:07:47,510
+in the test set or the
+假设测试集中的病人
+
+215
+00:07:47,620 --> 00:07:48,830
+cross-validation set, of
+或者交叉验证集中的
+
+216
+00:07:48,960 --> 00:07:49,980
+all the patients in the data
+如果所有这些在数据集中的病人
+
+217
+00:07:50,150 --> 00:07:51,550
+set that actually have cancer,
+确实得了癌症
+
+218
+00:07:52,670 --> 00:07:54,240
+what fraction of them do
+有多大比率
+
+219
+00:07:54,400 --> 00:07:56,250
+we correctly detect as having cancer.
+我们正确预测他们得了癌症
+
+220
+00:07:56,950 --> 00:07:57,870
+So if all the patients have
+如果所有的病人
+
+221
+00:07:58,090 --> 00:07:59,170
+cancer, how many of them
+都患了癌症
+
+222
+00:07:59,400 --> 00:08:01,130
+did we actually go to them
+有多少人我们能够
+
+223
+00:08:01,320 --> 00:08:03,850
+and you know, correctly told them that we think they need treatment.
+正确告诉他们 你需要治疗
+
+224
+00:08:05,860 --> 00:08:07,010
+So, writing this down,
+把这个写下来
+
+225
+00:08:07,360 --> 00:08:08,970
+recall is defined as the
+召回率被定义为
+
+226
+00:08:09,040 --> 00:08:12,020
+number of positives, the number
+真阳性
+
+227
+00:08:12,470 --> 00:08:14,760
+of true positives,
+的数量
+
+228
+00:08:15,260 --> 00:08:16,320
+meaning the number of people
+意思是我们正确预测
+
+229
+00:08:16,520 --> 00:08:17,890
+that have cancer and that
+患有癌症的人
+
+230
+00:08:18,030 --> 00:08:19,280
+we correctly predicted have cancer
+的数量
+
+231
+00:08:20,310 --> 00:08:21,440
+and we take that and divide
+我们用这个来
+
+232
+00:08:21,790 --> 00:08:23,510
+that by, divide that by
+除以
+
+233
+00:08:23,740 --> 00:08:29,300
+the number of actual positives,
+实际阳性
+
+234
+00:08:31,200 --> 00:08:32,070
+so this is the right
+这个值是
+
+235
+00:08:32,510 --> 00:08:35,190
+number of actual positives of all the people that do have cancer.
+所有患有癌症的人的数量
+
+236
+00:08:35,850 --> 00:08:37,000
+What fraction do we correctly
+有多大比率
+
+237
+00:08:37,430 --> 00:08:38,950
+flag and, you know, send to treatment?
+我们能正确发现癌症 并给予治疗
+
+238
+00:08:40,560 --> 00:08:41,780
+So, to rewrite this in
+把这个以另一种形式
+
+239
+00:08:41,930 --> 00:08:44,060
+a different form, the denominator would
+写下来
+
+240
+00:08:44,210 --> 00:08:45,160
+be the number of actual
+分母是
+
+241
+00:08:45,430 --> 00:08:46,990
+positives as you know, is the sum
+实际阳性的数量
+
+242
+00:08:47,220 --> 00:08:49,480
+of the entries in this first column over here.
+表格第一列值的和
+
+243
+00:08:50,600 --> 00:08:51,660
+And so writing things out differently,
+将这个以不同的形式写下来
+
+244
+00:08:52,160 --> 00:08:53,470
+this is therefore, the number of
+那就是
+
+245
+00:08:53,650 --> 00:08:57,120
+true positives, divided by
+真阳性除以
+
+246
+00:08:59,010 --> 00:09:01,340
+the number of true positives
+真阳性
+
+247
+00:09:02,790 --> 00:09:05,430
+plus the number of
+加上
+
+248
+00:09:06,750 --> 00:09:07,690
+false negatives.
+假阴性
+
+249
+00:09:09,570 --> 00:09:12,180
+And so once again, having a high recall would be a good thing.
+同样地 召回率越高越好
+
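+Putting the two definitions together in a small sketch (it reuses the counts from the two-by-two table above; the guard against dividing by zero is an implementation detail, not something from the lecture):
+
+def precision_recall(tp, fp, fn):
+    # precision: of everyone we predicted positive, what fraction really is positive
+    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
+    # recall: of everyone who really is positive, what fraction did we flag
+    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
+    return precision, recall
+
+# The "always predict y = 0" classifier has no true positives, so its recall is 0.
+print(precision_recall(tp=0, fp=0, fn=5))   # (0.0, 0.0)
+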
+250
+00:09:14,180 --> 00:09:15,810
+So by computing precision and
+通过计算查准率
+
+251
+00:09:15,930 --> 00:09:17,100
+recall this will usually
+和召回率
+
+252
+00:09:17,340 --> 00:09:18,740
+give us a better sense of
+我们能更好的知道
+
+253
+00:09:19,140 --> 00:09:20,560
+how well our classifier is doing.
+分类模型到底好不好
+
+254
+00:09:21,620 --> 00:09:22,960
+And in particular if we have
+具体地说
+
+255
+00:09:23,330 --> 00:09:24,740
+a learning algorithm that predicts
+如果我们有一个算法
+
+256
+00:09:25,520 --> 00:09:27,020
+y equals zero all
+总是预测y=0
+
+257
+00:09:27,190 --> 00:09:28,290
+the time, if it predicts no
+它总是预测
+
+258
+00:09:28,460 --> 00:09:30,080
+one has cancer, then this
+没有人患癌症
+
+259
+00:09:30,250 --> 00:09:31,880
+classifier will have a
+那么这个分类模型
+
+260
+00:09:32,070 --> 00:09:33,820
+recall equal to zero,
+召回率等于0
+
+261
+00:09:34,370 --> 00:09:35,300
+because there won't be any
+因为它不会有
+
+262
+00:09:35,570 --> 00:09:36,940
+true positives and so that's
+真阳性
+
+263
+00:09:37,190 --> 00:09:37,930
+a quick way for us to
+因此我们能会快发现
+
+264
+00:09:38,010 --> 00:09:40,290
+recognize that, you know, a
+这个分类模型
+
+265
+00:09:40,360 --> 00:09:41,570
+classifier that predicts y equals 0 all the time,
+总是预测y=0
+
+266
+00:09:42,050 --> 00:09:43,350
+just isn't a very good classifier.
+它不是一个好的模型
+
+267
+00:09:44,000 --> 00:09:46,660
+And more generally,
+总的来说
+
+268
+00:09:47,450 --> 00:09:48,830
+even for settings where we
+即使我们有一个
+
+269
+00:09:48,950 --> 00:09:50,800
+have very skewed classes, it's
+非常偏斜的类
+
+270
+00:09:51,050 --> 00:09:53,350
+not possible for an
+算法也不能够
+
+271
+00:09:53,440 --> 00:09:54,900
+algorithm to sort of "cheat"
+"欺骗"我们
+
+272
+00:09:55,450 --> 00:09:56,400
+and somehow get a very
+仅仅通过预测
+
+273
+00:09:56,750 --> 00:09:57,930
+high precision and a
+y总是等于0
+
+274
+00:09:58,010 --> 00:09:59,360
+very high recall by doing
+或者y总是等于1
+
+275
+00:09:59,620 --> 00:10:00,800
+some simple thing like predicting
+它没有办法得到
+
+276
+00:10:01,050 --> 00:10:02,680
+y equals 0 all the time or
+高的查准率
+
+277
+00:10:02,720 --> 00:10:04,720
+predicting y equals 1 all the time.
+和高的召回率
+
+278
+00:10:04,960 --> 00:10:06,540
+And so we're much
+因此我们
+
+279
+00:10:06,680 --> 00:10:08,230
+more sure that a classifier
+能够更肯定
+
+280
+00:10:08,840 --> 00:10:09,780
+with a high precision or high recall
+拥有高查准率或者高召回率的模型
+
+281
+00:10:10,610 --> 00:10:11,550
+actually is a good classifier,
+是一个好的分类模型
+
+282
+00:10:12,470 --> 00:10:13,940
+and this gives us a
+这给予了我们一个
+
+283
+00:10:14,040 --> 00:10:15,660
+more useful evaluation metric that
+更好的评估值
+
+284
+00:10:15,910 --> 00:10:16,960
+is a more direct way to
+给予我们一种更直接的方法
+
+285
+00:10:17,230 --> 00:10:20,360
+actually understand whether, you know, our algorithm may be doing well.
+来评估模型的好与坏
+
+286
+00:10:21,680 --> 00:10:23,000
+So one final note in
+最后一件需要记住的事
+
+287
+00:10:23,200 --> 00:10:24,960
+the definition of precision and
+在查准率和召回率的定义中
+
+288
+00:10:25,150 --> 00:10:26,190
+recall, that we would define
+我们定义
+
+289
+00:10:26,720 --> 00:10:28,720
+precision and recall, usually we
+查准率和召回率
+
+290
+00:10:29,100 --> 00:10:31,970
+use the convention that y is equal to 1, in
+我们总是习惯性地用y=1
+
+291
+00:10:32,090 --> 00:10:33,700
+the presence of the more rare class.
+如果这个类出现得非常少
+
+292
+00:10:34,160 --> 00:10:35,410
+So if we are trying to detect
+因此如果我们试图检测
+
+293
+00:10:35,880 --> 00:10:37,300
+rare conditions such as cancer,
+某种很稀少的情况 比如癌症
+
+294
+00:10:37,720 --> 00:10:38,610
+hopefully that's a rare condition,
+我希望它是个很稀少的情况
+
+295
+00:10:39,340 --> 00:10:40,950
+precision and recall are
+查准率和召回率
+
+296
+00:10:41,000 --> 00:10:42,440
+defined setting y equals
+会被定义为
+
+297
+00:10:42,790 --> 00:10:43,930
+1, rather than y
+y=1
+
+298
+00:10:44,190 --> 00:10:45,690
+equals 0, to be sort of
+而不是y=0
+
+299
+00:10:45,820 --> 00:10:47,100
+that the presence of that rare
+作为某种我们希望检测的
+
+300
+00:10:47,250 --> 00:10:50,220
+class that we're trying to detect.
+出现较少的类
+
+301
+00:10:50,450 --> 00:10:51,960
+And by using precision and recall,
+通过使用查准率和召回率
+
+302
+00:10:52,890 --> 00:10:54,250
+we find, what happens is
+我们发现
+
+303
+00:10:54,390 --> 00:10:55,400
+that even if we have
+即使我们拥有
+
+304
+00:10:55,610 --> 00:10:57,400
+very skewed classes, it's not
+非常偏斜的类
+
+305
+00:10:57,590 --> 00:10:59,080
+possible for an algorithm to
+算法不能够
+
+306
+00:10:59,600 --> 00:11:01,060
+you know, "cheat" and predict
+通过总是预测y=1
+
+307
+00:11:01,380 --> 00:11:02,400
+y equals 1 all the time,
+来"欺骗"我们
+
+308
+00:11:02,760 --> 00:11:03,870
+or predict y equals 0 all
+或者总是预测y=0
+
+309
+00:11:03,980 --> 00:11:05,750
+the time, and get high precision and recall.
+因为它不能够获得高查准率和召回率
+
+310
+00:11:06,640 --> 00:11:07,830
+And in particular, if a classifier
+具体地说 如果一个分类模型
+
+311
+00:11:08,480 --> 00:11:09,700
+is getting high precision and high
+拥有高查准率和召回率
+
+312
+00:11:09,880 --> 00:11:11,160
+recall, then we are
+那么
+
+313
+00:11:11,270 --> 00:11:13,040
+actually confident that the algorithm
+我们可以确信地说
+
+314
+00:11:13,590 --> 00:11:15,120
+has to be doing well, even
+这个算法表现很好
+
+315
+00:11:15,400 --> 00:11:16,620
+if we have very skewed classes.
+即便我们拥有很偏斜的类
+
+316
+00:11:18,030 --> 00:11:20,360
+So for the problem of skewed classes precision
+因此对于偏斜类的问题
+
+317
+00:11:20,950 --> 00:11:22,560
+recall gives us more
+查准率和召回率
+
+318
+00:11:22,780 --> 00:11:24,670
+direct insight into how
+给予了我们更好的方法
+
+319
+00:11:24,910 --> 00:11:26,010
+the learning algorithm is doing
+来检测学习算法表现如何
+
+320
+00:11:26,660 --> 00:11:27,980
+and this is often a much
+这是一种
+
+321
+00:11:28,070 --> 00:11:29,360
+better way to evaluate our learning algorithms,
+更好地评估学习算法的标准
+
+322
+00:11:30,270 --> 00:11:32,200
+than looking at classification error
+当出现偏斜类时
+
+323
+00:11:32,510 --> 00:11:35,200
+or classification accuracy, when the classes are very skewed.
+比仅仅只用分类误差或者分类精度好
+
diff --git a/srt/11 - 4 - Trading Off Precision and Recall (14 min).srt b/srt/11 - 4 - Trading Off Precision and Recall (14 min).srt
new file mode 100644
index 00000000..b332720d
--- /dev/null
+++ b/srt/11 - 4 - Trading Off Precision and Recall (14 min).srt
@@ -0,0 +1,2091 @@
+1
+00:00:00,410 --> 00:00:01,520
+In the last video, we talked
+在之前的课程中 我们谈到
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,820 --> 00:00:04,130
+about precision and recall as
+查准率和召回率
+
+3
+00:00:04,280 --> 00:00:06,180
+an evaluation metric for classification
+作为遇到偏斜类问题
+
+4
+00:00:06,840 --> 00:00:08,220
+problems with skewed classes.
+的评估度量值
+
+5
+00:00:09,530 --> 00:00:11,020
+For many applications, we'll want
+在很多应用中
+
+6
+00:00:11,180 --> 00:00:13,350
+to somehow control the trade
+我们希望能够保证
+
+7
+00:00:13,630 --> 00:00:15,640
+off between precision and recall.
+查准率和召回率的相对平衡
+
+8
+00:00:16,500 --> 00:00:17,310
+Let me tell you how
+在这节课中
+
+9
+00:00:17,470 --> 00:00:19,020
+to do that and also show
+我将告诉你应该怎么做
+
+10
+00:00:19,390 --> 00:00:20,520
+you some, even more effective
+同时也向你展示一些
+
+11
+00:00:21,050 --> 00:00:22,810
+ways to use precision and
+查准率和召回率
+
+12
+00:00:22,980 --> 00:00:24,290
+recall as an evaluation
+作为算法评估度量值的
+
+13
+00:00:24,720 --> 00:00:27,380
+metric for learning algorithms.
+更有效的方式
+
+14
+00:00:28,620 --> 00:00:30,180
+As a reminder, here are the
+回忆一下
+
+15
+00:00:30,250 --> 00:00:32,150
+definitions of precision and
+这是查准率和召回率的定义
+
+16
+00:00:32,380 --> 00:00:34,100
+recall from the previous video.
+我们在上一节中讲到的
+
+17
+00:00:35,920 --> 00:00:37,650
+Let's continue our cancer classification
+让我们继续用
+
+18
+00:00:38,680 --> 00:00:39,980
+example, where y equals
+癌症分类的例子
+
+19
+00:00:40,370 --> 00:00:41,790
+one if the patient has cancer
+如果病人患癌症 则y=1
+
+20
+00:00:42,270 --> 00:00:43,310
+and y equals zero otherwise.
+反之则y=0
+
+21
+00:00:44,800 --> 00:00:46,060
+And let's say we've trained in
+假设我们用
+
+22
+00:00:46,360 --> 00:00:48,580
+logistic regression classifier, which outputs
+逻辑回归模型训练了数据
+
+23
+00:00:49,070 --> 00:00:50,690
+probabilities between zero and one.
+输出概率在0-1之间的值
+
+24
+00:00:51,740 --> 00:00:52,830
+So, as usual, we're
+因此
+
+25
+00:00:53,010 --> 00:00:54,690
+going to predict one, y equals
+我们预测y=1
+
+26
+00:00:55,080 --> 00:00:56,290
+one if h of x
+如果h(x)
+
+27
+00:00:56,560 --> 00:00:57,980
+is greater than or equal to
+大于或等于0.5
+
+28
+00:00:58,090 --> 00:00:59,720
+0.5 and predict zero if
+预测值为0
+
+29
+00:01:00,140 --> 00:01:01,570
+the hypothesis outputs a value
+如果方程输出值
+
+30
+00:01:01,820 --> 00:01:03,720
+less than 0.5 and this
+小于0.5
+
+31
+00:01:04,040 --> 00:01:05,400
+classifier may give us
+这个回归模型
+
+32
+00:01:05,710 --> 00:01:08,430
+some value for precision and some value for recall.
+能够计算查准率和召回率
+
+33
+00:01:10,420 --> 00:01:11,860
+But now, suppose we want
+但是现在
+
+34
+00:01:12,140 --> 00:01:13,440
+to predict that a patient
+假如我们希望
+
+35
+00:01:13,730 --> 00:01:15,510
+has cancer only if we're
+在我们非常确信地情况下
+
+36
+00:01:15,750 --> 00:01:17,190
+very confident that they really do.
+才预测一个病人得了癌症
+
+37
+00:01:18,010 --> 00:01:18,900
+Because you know if you go
+因为你知道
+
+38
+00:01:19,140 --> 00:01:20,180
+to a patient and you tell
+如果你告诉一个病人
+
+39
+00:01:20,480 --> 00:01:21,570
+them that they have cancer, it's
+告诉他们你得了癌症
+
+40
+00:01:21,710 --> 00:01:22,450
+going to give them a huge
+他们会非常震惊
+
+41
+00:01:22,680 --> 00:01:23,860
+shock because this is seriously
+因为这是一个
+
+42
+00:01:24,220 --> 00:01:25,610
+bad news and they may
+非常坏的消息
+
+43
+00:01:25,700 --> 00:01:27,080
+end up going through a pretty
+而且他们会经历一段
+
+44
+00:01:27,660 --> 00:01:29,570
+painful treatment process and so on.
+非常痛苦的治疗过程
+
+45
+00:01:29,780 --> 00:01:30,770
+And so maybe we want to
+因此我们希望
+
+46
+00:01:30,980 --> 00:01:31,880
+tell someone that we think
+只有在我们非常确信的情况下
+
+47
+00:01:32,090 --> 00:01:34,240
+they have cancer only if they're very confident.
+才告诉这个人他得了癌症
+
+48
+00:01:36,230 --> 00:01:37,210
+One way to do this would
+这样做的一种方法
+
+49
+00:01:37,320 --> 00:01:38,940
+be to modify the algorithm, so
+是修改算法
+
+50
+00:01:39,120 --> 00:01:40,290
+that instead of setting the
+我们不再将临界值
+
+51
+00:01:40,710 --> 00:01:42,270
+threshold at 0.5, we
+设为0.5
+
+52
+00:01:42,820 --> 00:01:44,360
+might instead say that we'll
+也许 我们只在
+
+53
+00:01:44,510 --> 00:01:45,370
+predict that y is equal
+h(x)的值大于或等于0.7
+
+54
+00:01:46,330 --> 00:01:48,630
+to 1, only if H of
+的情况下
+
+55
+00:01:48,700 --> 00:01:50,200
+x is greater than or equal to 0.7.
+才预测y=1
+
+56
+00:01:50,490 --> 00:01:51,620
+So this, I think
+因此
+
+57
+00:01:52,360 --> 00:01:53,400
+will tell someone if they
+我们会告诉一个人
+
+58
+00:01:53,510 --> 00:01:54,530
+have cancer only if we think
+他得了癌症
+
+59
+00:01:54,810 --> 00:01:56,280
+there's a greater than, greater
+在我们认为
+
+60
+00:01:56,730 --> 00:01:59,060
+than or equal to 70% chance that they have cancer.
+他有大于等于70%得癌症的概率情况下
+
+61
+00:02:00,830 --> 00:02:02,000
+And if you do this then
+如果你这么做
+
+62
+00:02:02,850 --> 00:02:03,740
+you're predicting someone has
+那么你只在
+
+63
+00:02:03,840 --> 00:02:04,990
+cancer only when you're
+非常确信地情况下
+
+64
+00:02:05,100 --> 00:02:07,230
+more confident, and so
+才预测癌症
+
+65
+00:02:07,520 --> 00:02:08,830
+you end up with a classifier
+那么你的回归模型
+
+66
+00:02:09,920 --> 00:02:13,410
+that has higher precision, because
+会有较高的查准率
+
+67
+00:02:14,140 --> 00:02:15,300
+all the patients that you're
+因为所有你准备
+
+68
+00:02:15,450 --> 00:02:16,630
+going to and say, you know,
+告诉他们
+
+69
+00:02:16,860 --> 00:02:18,220
+we think you have cancer, all
+患有癌症的病人
+
+70
+00:02:18,440 --> 00:02:19,760
+of those patients are now
+所有这些人
+
+71
+00:02:20,350 --> 00:02:21,420
+fairly likely, it turns out,
+有比较高的可能性
+
+72
+00:02:21,720 --> 00:02:23,100
+to actually have cancer.
+他们真的患有癌症
+
+73
+00:02:24,260 --> 00:02:26,050
+And so, a higher fraction of
+你预测患有癌症的病人中
+
+74
+00:02:26,150 --> 00:02:27,460
+the patients that you predict to
+有较大比率的人
+
+75
+00:02:27,530 --> 00:02:28,990
+have cancer, will actually turn
+他们确实患有癌症
+
+76
+00:02:29,280 --> 00:02:30,720
+out to have cancer, because in
+因为这是我们
+
+77
+00:02:31,000 --> 00:02:32,870
+making those predictions we are pretty confident.
+在非常确信的情况下做出的预测
+
+78
+00:02:34,510 --> 00:02:36,360
+But in contrast, this classifier will
+与之相反
+
+79
+00:02:36,540 --> 00:02:38,530
+have lower recall, because
+这个回归模型会有较低的召回率
+
+80
+00:02:39,140 --> 00:02:40,220
+now we are going
+因为
+
+81
+00:02:40,340 --> 00:02:41,650
+to make predictions, we are
+当我们做预测的时候
+
+82
+00:02:41,740 --> 00:02:44,180
+going to predict y equals one, on a smaller number of patients.
+我们只给很小一部分的病人预测y=1
+
+83
+00:02:45,090 --> 00:02:45,920
+Now we could even take this further.
+现在我们把这个情况夸大一下
+
+84
+00:02:46,330 --> 00:02:47,520
+Instead of setting the threshold
+我们不再把临界值
+
+85
+00:02:48,080 --> 00:02:49,210
+at 0.7, we can set
+设在0.7
+
+86
+00:02:49,490 --> 00:02:51,550
+this at 0.9 and we'll predict
+我们把它设为0.9
+
+87
+00:02:52,430 --> 00:02:53,270
+y equals 1 only if we are
+我们只在至少90%肯定
+
+88
+00:02:53,320 --> 00:02:54,560
+more than 90% certain that
+这个病人患有癌症的情况下
+
+89
+00:02:55,380 --> 00:02:57,020
+the patient has cancer, and so,
+预测y=1
+
+90
+00:02:57,600 --> 00:02:58,720
+you know, a large fraction of
+那么这些病人当中
+
+91
+00:02:58,850 --> 00:02:59,820
+those patients will turn out
+有非常大的比率
+
+92
+00:03:00,020 --> 00:03:01,380
+to have cancer and so,
+真正患有癌症
+
+93
+00:03:01,560 --> 00:03:03,060
+this is the high precision classifier
+因此这是一个高查准率的模型
+
+94
+00:03:04,160 --> 00:03:06,090
+will have lower recall because we
+但是召回率会变低
+
+95
+00:03:06,190 --> 00:03:08,550
+want to correctly detect that those patients have cancer.
+因为我们希望能够正确检测患有癌症的病人
+
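+A sketch of this modified prediction rule, where h_of_x stands in for the hypothesis output and 0.7 is the example threshold discussed here (the function name is made up for illustration):
+
+def predict(h_of_x, threshold=0.5):
+    # Predict y = 1 only when the estimated probability reaches the threshold.
+    return 1 if h_of_x >= threshold else 0
+
+print(predict(0.85, threshold=0.7))  # 1: confident enough to predict cancer
+print(predict(0.60, threshold=0.7))  # 0: below the raised threshold
+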
+96
+00:03:09,310 --> 00:03:10,780
+Now consider a different example.
+现在考虑一个不同的例子
+
+97
+00:03:12,100 --> 00:03:13,200
+Suppose we want to avoid
+假设我们希望
+
+98
+00:03:13,470 --> 00:03:15,530
+missing too many actual cases of cancer.
+避免遗漏掉患有癌症的人
+
+99
+00:03:15,960 --> 00:03:17,480
+So we want to avoid the false negatives.
+即我们希望避免假阴性
+
+100
+00:03:18,600 --> 00:03:19,820
+In particular, if a patient
+具体地说
+
+101
+00:03:20,350 --> 00:03:22,280
+actually has cancer, but we
+如果一个病人实际患有癌症
+
+102
+00:03:22,520 --> 00:03:23,700
+fail to tell them that
+但是我们并没有告诉他患有癌症
+
+103
+00:03:23,860 --> 00:03:25,710
+they have cancer, then that can be really bad.
+那这可能造成严重后果
+
+104
+00:03:25,880 --> 00:03:27,460
+Because if we tell
+因为
+
+105
+00:03:27,760 --> 00:03:28,870
+a patient that they don't
+如果我们告诉病人他们没有患癌症
+
+106
+00:03:29,240 --> 00:03:31,460
+have cancer then they are
+那么
+
+107
+00:03:31,530 --> 00:03:32,870
+not going to go for treatment and
+他们就不会接受治疗
+
+108
+00:03:32,980 --> 00:03:33,890
+if it turns out that they
+但是如果
+
+109
+00:03:34,050 --> 00:03:35,380
+have cancer or we fail
+他们患有癌症
+
+110
+00:03:35,520 --> 00:03:36,410
+to tell them they have
+我们又没有告诉他们
+
+111
+00:03:36,660 --> 00:03:39,060
+cancer, well they may not get treated at all.
+那么他们就根本不会接受治疗
+
+112
+00:03:39,430 --> 00:03:40,520
+And so that would be
+那么
+
+113
+00:03:40,640 --> 00:03:41,820
+a really bad outcome, because they
+这么可能造成严重后果
+
+114
+00:03:42,080 --> 00:03:43,050
+died because we told them
+病人丧失生命
+
+115
+00:03:43,140 --> 00:03:44,560
+they don't have cancer they failed
+因为我们没有告诉他患有癌症
+
+116
+00:03:44,670 --> 00:03:46,780
+to get treated, but it turns
+他没有接受治疗
+
+117
+00:03:48,230 --> 00:03:48,790
+out that they actually have cancer.
+但事实上他又患有癌症
+
+118
+00:03:49,260 --> 00:03:50,260
+When in doubt, we want to
+这种情况下
+
+119
+00:03:50,360 --> 00:03:52,430
+predict that y equals one.
+我们希望预测y=1
+
+120
+00:03:52,720 --> 00:03:54,260
+So when in doubt, we want
+我们希望
+
+121
+00:03:54,480 --> 00:03:55,510
+to predict that they have
+预测病人患有癌症
+
+122
+00:03:55,770 --> 00:03:56,820
+cancer so that at least
+这样
+
+123
+00:03:57,110 --> 00:03:58,150
+they look further into it
+他们会做进一步的检测
+
+124
+00:03:59,400 --> 00:04:00,720
+and they can get treated,
+然后接受治疗
+
+125
+00:04:01,180 --> 00:04:02,750
+in case they do turn out to have cancer.
+以避免他们真的患有癌症
+
+126
+00:04:04,870 --> 00:04:06,300
+In this case, rather than setting
+在这个例子中
+
+127
+00:04:06,750 --> 00:04:08,920
+higher probability threshold, we might
+我们不再设置高的临界值
+
+128
+00:04:09,100 --> 00:04:11,370
+instead take this value
+我们会设置另一个值
+
+129
+00:04:12,270 --> 00:04:13,310
+and instead set it to
+将临界值
+
+130
+00:04:13,540 --> 00:04:14,710
+a lower value, so maybe
+设得较低
+
+131
+00:04:15,060 --> 00:04:17,390
+0.3 like so.
+比如0.3
+
+132
+00:04:18,760 --> 00:04:19,780
+By doing so, we're saying
+这样做
+
+133
+00:04:20,070 --> 00:04:21,380
+that, you know what, if we
+我们认为
+
+134
+00:04:21,480 --> 00:04:22,190
+think there's more than a 30%
+他们有大于30%的几率
+
+135
+00:04:22,220 --> 00:04:24,660
+chance that they have cancer, we'd better
+患有癌症
+
+136
+00:04:24,890 --> 00:04:26,270
+be more conservative and tell
+我们以更加保守的方式
+
+137
+00:04:26,510 --> 00:04:27,330
+them that they may have cancer,
+告诉他们患有癌症
+
+138
+00:04:27,850 --> 00:04:29,610
+so they can seek treatment if necessary.
+因此他们能够接受治疗
+
+139
+00:04:31,110 --> 00:04:32,630
+And in this case, what
+在这种情况下
+
+140
+00:04:32,790 --> 00:04:34,200
+we would have is going to
+我们会有一个
+
+141
+00:04:35,120 --> 00:04:38,260
+be a higher recall classifier,
+较高召回率的模型
+
+142
+00:04:39,550 --> 00:04:41,440
+because we're going to
+因为
+
+143
+00:04:41,580 --> 00:04:43,330
+be correctly flagging a higher
+确实患有癌症的病人
+
+144
+00:04:43,580 --> 00:04:44,760
+fraction of all of
+有很大一部分
+
+145
+00:04:44,800 --> 00:04:45,920
+the patients that actually do have
+被我们正确标记出来了
+
+146
+00:04:46,130 --> 00:04:47,570
+cancer, but we're going
+但是
+
+147
+00:04:47,740 --> 00:04:51,040
+to end up with lower precision,
+我们会得到较低的查准率
+
+148
+00:04:51,670 --> 00:04:53,490
+because a higher fraction of
+因为
+
+149
+00:04:53,600 --> 00:04:54,700
+the patients that we said have
+我们预测患有癌症的病人比例越大
+
+150
+00:04:54,820 --> 00:04:57,530
+cancer will turn out not to have cancer after all.
+那么就有较大比例的人其实没有患癌症
+
+151
+00:05:00,470 --> 00:05:01,320
+And by the way, just as an
+顺带一提
+
+152
+00:05:01,400 --> 00:05:02,640
+aside, when I talk
+当我在给
+
+153
+00:05:02,920 --> 00:05:04,900
+about this to other
+别的学生讲这个的时候
+
+154
+00:05:05,160 --> 00:05:07,760
+students, it's pretty amazing.
+令人惊讶的是
+
+155
+00:05:08,390 --> 00:05:09,720
+Some of my students say is
+有的学生问
+
+156
+00:05:09,850 --> 00:05:11,960
+how I can tell the story both ways.
+怎么可以从两面来看这个问题
+
+157
+00:05:12,550 --> 00:05:14,220
+Why we might want to have
+为什么我总是
+
+158
+00:05:14,450 --> 00:05:15,490
+higher precision or higher recall
+只想要高查准率或高召回率
+
+159
+00:05:16,130 --> 00:05:18,570
+and the story actually seems to work both ways.
+但是这看起来可以使两边都提高
+
+160
+00:05:19,340 --> 00:05:20,550
+But I hope the details of
+但是我希望
+
+161
+00:05:20,670 --> 00:05:22,720
+the algorithm are clear, and the
+算法是正确的
+
+162
+00:05:22,990 --> 00:05:24,360
+more general principle is, depending
+更普遍的一个原则是
+
+163
+00:05:24,780 --> 00:05:26,150
+on where you want, whether
+这取决于你想要什么
+
+164
+00:05:26,330 --> 00:05:28,010
+you want high precision, lower recall
+你想要高查准率 低召回率
+
+165
+00:05:28,540 --> 00:05:30,340
+or higher recall, lower precision, you
+还是高召回率 低查准率
+
+166
+00:05:30,450 --> 00:05:32,100
+can end up predicting y equals
+你可以预测y=1
+
+167
+00:05:32,540 --> 00:05:35,040
+one when h(x) is greater than some threshold.
+当h(x)大于某个临界值
+
+168
+00:05:36,590 --> 00:05:39,240
+And so, in general, for
+因此 总的来说
+
+169
+00:05:39,880 --> 00:05:41,330
+most classifiers, there is going
+对于大多数的回归模型
+
+170
+00:05:41,540 --> 00:05:44,200
+to be a trade off between precision and recall.
+你得权衡查准率和召回率
+
+171
+00:05:45,360 --> 00:05:46,540
+And as you vary the value
+当你改变
+
+172
+00:05:47,050 --> 00:05:48,700
+of this threshold, this value,
+临界值的值时
+
+173
+00:05:49,030 --> 00:05:49,850
+this threshold that I have
+我在这儿画了一个
+
+174
+00:05:49,910 --> 00:05:51,470
+drawn here, you can actually
+临界值
+
+175
+00:05:51,790 --> 00:05:53,850
+plot out some curve that
+你可以画出曲线
+
+176
+00:05:54,030 --> 00:05:56,060
+trades off precision and
+来权衡查准率
+
+177
+00:05:56,200 --> 00:05:58,010
+recall, where a value
+和召回率
+
+178
+00:05:58,410 --> 00:06:00,020
+up here, this would correspond
+这里的一个值
+
+179
+00:06:01,360 --> 00:06:02,620
+to a very high value of
+反应出一个较高的临界值
+
+180
+00:06:02,770 --> 00:06:04,490
+the threshold, maybe threshold equals
+这个临界值可能等于0.99
+
+181
+00:06:05,420 --> 00:06:06,790
+over 0.99, so that is to say, predict
+我们假设
+
+182
+00:06:07,090 --> 00:06:08,270
+y equals 1 only where
+只在有大于99%的确信度的情况下
+
+183
+00:06:08,480 --> 00:06:09,640
+we're more than 99 percent
+才预测y=1
+
+184
+00:06:10,290 --> 00:06:11,700
+confident, at least 99
+至少
+
+185
+00:06:11,950 --> 00:06:13,460
+percent probability this once, so
+有99%的可能性
+
+186
+00:06:13,760 --> 00:06:15,390
+that will be a high precision, relatively
+因此这个点反应高查准率
+
+187
+00:06:15,960 --> 00:06:17,550
+low recall, whereas the point
+低召回率
+
+188
+00:06:17,820 --> 00:06:20,380
+down here will correspond to
+然而这里的一个点
+
+189
+00:06:20,490 --> 00:06:22,240
+a value of the threshold that's
+反映一个较低的临界值
+
+190
+00:06:22,450 --> 00:06:24,940
+much lower, maybe 0.01.
+比如说0.01
+
+191
+00:06:25,520 --> 00:06:26,810
+When in doubt at all, predict y equals 1.
+毫无疑问 在这里预测y=1
+
+192
+00:06:27,120 --> 00:06:28,380
+And if you do that, you
+如果你这么做
+
+193
+00:06:28,520 --> 00:06:29,570
+end up with a much
+你最后会得到
+
+194
+00:06:29,760 --> 00:06:31,730
+lower precision higher recall classifier.
+很低的查准率 但是较高的召回率
+
+195
+00:06:33,350 --> 00:06:34,970
+And as you vary the threshold, if
+当你改变临界值
+
+196
+00:06:35,430 --> 00:06:36,550
+you want, you can actually trace
+如果你愿意
+
+197
+00:06:37,000 --> 00:06:38,280
+all the curve from your classifier
+你可以画出回归模型的所有曲线
+
+198
+00:06:38,930 --> 00:06:41,420
+to see the range of different values you can get for precision recall.
+来看看你能得到的查准率和召回率的范围
+
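+A sketch of tracing out such a curve by sweeping the threshold, using small made-up arrays for the true labels and predicted probabilities (these numbers are only illustrative):
+
+import numpy as np
+
+y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0])
+probs  = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05])
+
+def precision_recall_at(threshold):
+    y_pred = (probs >= threshold).astype(int)
+    tp = np.sum((y_pred == 1) & (y_true == 1))
+    fp = np.sum((y_pred == 1) & (y_true == 0))
+    fn = np.sum((y_pred == 0) & (y_true == 1))
+    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
+    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
+    return p, r
+
+# High thresholds favour precision, low thresholds favour recall.
+for t in (0.9, 0.7, 0.5, 0.3, 0.1):
+    print(t, precision_recall_at(t))
+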
+199
+00:06:43,050 --> 00:06:43,810
+And by the way, the precision
+顺带一提
+
+200
+00:06:44,230 --> 00:06:46,860
+recall curve can look like many different shapes.
+查准率-召回率曲线可以是各种不同的形状
+
+201
+00:06:47,260 --> 00:06:49,140
+Sometimes it'll look this, sometimes
+有时它看起来是这样
+
+202
+00:06:50,550 --> 00:06:51,260
+it'll look like that.
+有时是那样
+
+203
+00:06:52,330 --> 00:06:53,210
+Now, there are many different possible
+查准率-召回率曲线的形状
+
+204
+00:06:53,610 --> 00:06:54,820
+shapes for the precision recall
+有很多可能性
+
+205
+00:06:55,020 --> 00:06:56,850
+curve, depending on the details of the classifier.
+这取决于回归模型的具体算法
+
+206
+00:06:57,990 --> 00:06:59,620
+So this raises another
+因此这又产生了
+
+207
+00:06:59,900 --> 00:07:01,680
+interesting question which is, is
+另一个有趣的问题
+
+208
+00:07:01,870 --> 00:07:03,130
+there a way to choose
+那就是
+
+209
+00:07:03,510 --> 00:07:06,100
+this threshold automatically?
+有没有办法自动选取临界值
+
+210
+00:07:06,810 --> 00:07:07,890
+Or, more generally, if we have
+或者 更广泛地说
+
+211
+00:07:08,500 --> 00:07:10,060
+a few different algorithms or a
+如果我们有不同的算法
+
+212
+00:07:10,150 --> 00:07:12,290
+few different ideas for algorithms, how
+或者不同的想法
+
+213
+00:07:12,490 --> 00:07:15,340
+do we compare different precision recall numbers?
+我们如何比较不同的查准率和召回率呢?
+
+214
+00:07:16,400 --> 00:07:16,400
+Concretely.
+具体来说
+
+215
+00:07:17,170 --> 00:07:18,250
+Suppose we have three different
+假设我们有三个
+
+216
+00:07:18,590 --> 00:07:20,050
+learning algorithms, or actually maybe
+不同的学习算法
+
+217
+00:07:20,120 --> 00:07:22,060
+these aren't three different learning algorithms, may
+或者这三个不同的学习曲线
+
+218
+00:07:22,250 --> 00:07:25,010
+be these are the same algorithm, but just with different values for the threshold.
+是同样的算法 但是临界值不同
+
+219
+00:07:26,190 --> 00:07:28,560
+How do we decide which of these algorithms is best?
+我们怎样决定哪一个算法是最好的
+
+220
+00:07:29,770 --> 00:07:30,460
+One of the things we talked
+我们之前讲到的
+
+221
+00:07:30,680 --> 00:07:31,630
+about earlier is the importance
+其中一件事就是
+
+222
+00:07:32,520 --> 00:07:34,590
+of a single real number evaluation metric.
+评估度量值的重要性
+
+223
+00:07:35,880 --> 00:07:36,890
+And that is the idea of
+这个概念是
+
+224
+00:07:36,970 --> 00:07:38,050
+having a number that just
+通过一个具体的数字
+
+225
+00:07:38,370 --> 00:07:40,130
+tells you how well is your classifier doing.
+来反映你的回归模型到底如何
+
+226
+00:07:41,270 --> 00:07:42,260
+But by switching to the precision
+但是查准率和召回率的问题
+
+227
+00:07:42,690 --> 00:07:44,330
+recall metric, we've actually lost that.
+我们却不能这样做
+
+228
+00:07:44,580 --> 00:07:46,090
+We now have two real numbers.
+因为在这里我们有两个可以判断的数字
+
+229
+00:07:47,190 --> 00:07:48,600
+And so we often end up
+因此 我们经常会
+
+230
+00:07:48,770 --> 00:07:50,580
+facing situations, like if
+不得不面对这样的情况
+
+231
+00:07:50,750 --> 00:07:52,770
+we are trying to compare algorithm 1
+如果我们正在试图比较算法1
+
+232
+00:07:52,970 --> 00:07:54,350
+to algorithm 2, we
+和算法2
+
+233
+00:07:54,420 --> 00:07:55,420
+end up asking ourselves, Is a
+我们最后问自己
+
+234
+00:07:55,450 --> 00:07:56,550
+precision of point five and
+到底是0.5的查准率与
+
+235
+00:07:56,700 --> 00:07:57,580
+a recall of point four, well
+0.4的召回率好
+
+236
+00:07:57,830 --> 00:07:58,830
+is that better or worse than
+还是说
+
+237
+00:07:58,960 --> 00:08:00,120
+a precision of point seven and
+0.7的查准率与
+
+238
+00:08:00,300 --> 00:08:01,890
+a recall of point one?
+0.1的召回率好
+
+239
+00:08:02,150 --> 00:08:03,020
+If every time you try
+或者每一次
+
+240
+00:08:03,350 --> 00:08:04,730
+on a new algorithm you end up
+你设计一个新算法
+
+241
+00:08:04,890 --> 00:08:06,070
+having to sit around and think
+你都要坐下来思考
+
+242
+00:08:06,530 --> 00:08:07,710
+well, maybe point five
+到底0.5 0.4好
+
+243
+00:08:07,960 --> 00:08:09,170
+point four, is better than point
+还是说
+
+244
+00:08:09,330 --> 00:08:11,120
+seven point one, maybe not, I do not know.
+0.7 0.1好 我不知道
+
+245
+00:08:11,590 --> 00:08:13,740
+If you end up having to sit around and think and make these decisions.
+如果你最后这样坐下来思考
+
+246
+00:08:14,440 --> 00:08:15,830
+that really slows down your
+这会降低
+
+247
+00:08:16,030 --> 00:08:18,710
+decision making process for deciding what
+你的决策速度
+
+248
+00:08:19,120 --> 00:08:21,560
+changes are useful to incorporate into your algorithm.
+思考到底哪些改变是有用的 应该被融入到你的算法
+
+249
+00:08:23,070 --> 00:08:24,810
+Where as in contrast, if we
+与此相反的是
+
+250
+00:08:24,880 --> 00:08:26,410
+had a single real number evaluation metric,
+如果我们有一个评估度量值
+
+251
+00:08:27,220 --> 00:08:28,240
+like a number that just
+一个数字
+
+252
+00:08:28,590 --> 00:08:31,140
+tells us is either algorithm 1 or is algorithm 2 better.
+能够告诉我们到底是算法1好还是算法2好
+
+253
+00:08:31,710 --> 00:08:33,110
+That helps us much more
+这能够帮助我们
+
+254
+00:08:33,370 --> 00:08:34,840
+quickly decide which algorithm to
+更快地决定
+
+255
+00:08:34,950 --> 00:08:36,290
+go with and helps us
+哪一个算法更好
+
+256
+00:08:36,450 --> 00:08:37,520
+as well to much more quickly
+同时也能够更快地帮助我们
+
+257
+00:08:38,110 --> 00:08:39,700
+evaluate different changes that we
+评估不同的改动
+
+258
+00:08:39,830 --> 00:08:41,370
+may be contemplating for an algorithm.
+哪些应该被融入进算法里面
+
+259
+00:08:42,050 --> 00:08:43,080
+So, how can we get
+那么 我们怎样才能
+
+260
+00:08:43,480 --> 00:08:45,910
+a single real number evaluation metric.
+得到这个评估度量值呢?
+
+261
+00:08:47,480 --> 00:08:48,590
+One natural thing that you
+你可能会去尝试的
+
+262
+00:08:48,660 --> 00:08:49,910
+might try is to look
+一件事情是
+
+263
+00:08:50,150 --> 00:08:52,110
+at the average between precision and recall.
+计算一下查准率和召回率的平均值
+
+264
+00:08:52,330 --> 00:08:53,310
+So using p and r
+用 P 和 R
+
+265
+00:08:53,570 --> 00:08:54,800
+to denote precision and recall, what
+来表示查准率和召回率
+
+266
+00:08:55,010 --> 00:08:56,300
+you could do is just compute the
+你可以做的是
+
+267
+00:08:56,520 --> 00:08:57,280
+average and look at what classifier
+计算它们的平均值
+
+268
+00:08:57,770 --> 00:08:59,300
+has the highest average value.
+看一看哪个模型有最高的均值
+
+269
+00:09:00,840 --> 00:09:01,990
+But this turns out not to
+但是这可能
+
+270
+00:09:02,040 --> 00:09:04,990
+be such a good solution because, similar
+并不是一个很好的解决办法
+
+271
+00:09:05,350 --> 00:09:06,480
+to the example we had earlier,
+因为 像我们之前的例子一样
+
+272
+00:09:07,860 --> 00:09:08,970
+it turns out that if we
+如果我们的回归模型
+
+273
+00:09:09,070 --> 00:09:10,260
+have a classifier that predicts
+总是预测
+
+274
+00:09:11,310 --> 00:09:13,830
+y equals 1 all the time, then if
+y=1
+
+275
+00:09:13,960 --> 00:09:15,540
+you do that, you can get a very high recall.
+这么做你可能得到非常高的召回率
+
+276
+00:09:16,290 --> 00:09:18,700
+But you end up with a very low value of precision.
+得到非常低的查准率
+
+277
+00:09:19,690 --> 00:09:21,230
+Conversely,if you have a classifier
+相反地 如果你的模型
+
+278
+00:09:21,640 --> 00:09:24,060
+that predicts y=0 almost all
+总是预测y=0
+
+279
+00:09:25,340 --> 00:09:26,400
+the time, that is, if
+就是说
+
+280
+00:09:26,490 --> 00:09:28,100
+it predicts y equals 1 very sparingly.
+如果很少预测y=1
+
+281
+00:09:28,910 --> 00:09:30,820
+This corresponds to setting
+对应的
+
+282
+00:09:31,130 --> 00:09:34,190
+a very high threshold, using the notation from the previous slide.
+设置了一个高临界值
+
+283
+00:09:34,490 --> 00:09:35,610
+And then you can actually
+最后 你会得到非常高的
+
+284
+00:09:35,670 --> 00:09:37,650
+end up with a very high precision with a very low recall.
+查准率和非常低的召回率
+
+285
+00:09:39,280 --> 00:09:40,470
+So the two extremes of either
+这两个极端情况
+
+286
+00:09:40,790 --> 00:09:42,380
+are a very high threshold or a
+一个有非常高的临界值
+
+287
+00:09:42,540 --> 00:09:44,050
+very low threshold, neither of
+一个有非常低的临界值
+
+288
+00:09:44,170 --> 00:09:45,610
+them would give you a particularly good classifier.
+它们中的任何一个都不是一个好的模型
+
+289
+00:09:46,280 --> 00:09:47,560
+And we recognize that is
+我们可以通过
+
+290
+00:09:47,650 --> 00:09:48,650
+by seeing if we end
+非常低的查准率
+
+291
+00:09:48,710 --> 00:09:49,830
+up with a very low
+或者非常低的召回率
+
+292
+00:09:50,030 --> 00:09:52,710
+precision or a very low recall.
+判断这不是一个好模型
+
+293
+00:09:54,140 --> 00:09:56,120
+And if you just take the average of precision and recall, that is (P + R) / 2.
+如果你只是使用(P+R)/2
+
+294
+00:09:57,140 --> 00:09:58,980
+In this example, the average
+算法3的这个值
+
+295
+00:09:59,760 --> 00:10:01,410
+is actually highest for algorithm 3.
+是最高的
+
+296
+00:10:01,810 --> 00:10:02,800
+Even though you can get
+即使你可以通过
+
+297
+00:10:02,910 --> 00:10:04,010
+that sort of performance by predicting
+使用总是预测y=1这样的方法
+
+298
+00:10:04,510 --> 00:10:05,850
+y equals 1 all the time.
+来得到这样的值
+
+299
+00:10:06,220 --> 00:10:08,580
+And that is just not a very good classifier, right?
+但这并不是一个好的模型 对吧
+
+300
+00:10:08,670 --> 00:10:09,680
+You predict y equals one all
+你总是预测y=1
+
+301
+00:10:09,780 --> 00:10:11,010
+the time is just not a
+这不是一个有用的模型
+
+302
+00:10:11,210 --> 00:10:13,950
+useful classifier if all it does is print out y equals one.
+因为它只输出y=1
+
+303
+00:10:15,000 --> 00:10:16,580
+And so algorithm one or algorithm
+那么算法1和
+
+304
+00:10:17,040 --> 00:10:18,080
+two would be more
+算法2
+
+305
+00:10:18,280 --> 00:10:19,620
+useful than algorithm three,
+比算法3更有用
+
+306
+00:10:20,500 --> 00:10:22,330
+but in this example algorithm three
+但是在这个例子中
+
+307
+00:10:23,080 --> 00:10:24,840
+has a higher average value of
+查准率和召回率的平均值
+
+308
+00:10:24,920 --> 00:10:27,460
+precision recall than algorithm one and two.
+算法3是最高的
+
+309
+00:10:28,770 --> 00:10:29,780
+So we usually think of
+因此我们通常认为
+
+310
+00:10:29,900 --> 00:10:31,410
+this average of precision recall
+查准率和召回率的平均值
+
+311
+00:10:32,280 --> 00:10:35,000
+as not a particularly good way to evaluate our learning algorithm.
+不是评估算法的一个好的方法
+
+312
+00:10:38,200 --> 00:10:39,820
+In contrast, there is a
+相反地
+
+313
+00:10:40,030 --> 00:10:41,770
+different way of combining precision recall.
+有一种结合查准率和召回率的不同方式
+
+314
+00:10:42,370 --> 00:10:44,940
+It is called the f score and it uses that formula.
+叫做F值 公式是这样
+
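+The formula referred to here is the usual F1 score, F1 = 2PR / (P + R). A tiny sketch (not from the lecture slides) contrasting it with the plain average:
+
+def f1_score(p, r):
+    # F1 = 2 * P * R / (P + R); it is 0 whenever either P or R is 0.
+    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0
+
+# A "predict y = 1 all the time" style classifier: recall 1.0 but precision 0.02.
+print((0.02 + 1.0) / 2)       # the simple average still looks decent: 0.51
+print(f1_score(0.02, 1.0))    # the F score exposes it: roughly 0.039
+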
+315
+00:10:46,420 --> 00:10:48,740
+So, in this example, here are the f scores.
+在这个例子中 F值是这样的
+
+316
+00:10:49,530 --> 00:10:50,440
+And so we would tell
+我们可以通过
+
+317
+00:10:50,900 --> 00:10:52,140
+from these f scores and
+F值来判断
+
+318
+00:10:52,270 --> 00:10:53,660
+we'll say algorithm 1 has
+算法1
+
+319
+00:10:53,820 --> 00:10:55,290
+the highest f score.
+有最高的F值
+
+320
+00:10:55,620 --> 00:10:56,910
+Algorithm 2 has the second highest and
+算法2第二
+
+321
+00:10:57,180 --> 00:10:58,560
+algorithm 3 has the lowest and so
+算法3是最低的
+
+322
+00:10:59,040 --> 00:10:59,920
+you know, if we go by
+因此 通过F值
+
+323
+00:11:00,190 --> 00:11:02,700
+the f score, we would pick probably algorithm of 1 over the others.
+我们会在这几个算法中选择算法1
+
+324
+00:11:04,950 --> 00:11:06,120
+The f score, which is also
+F值
+
+325
+00:11:06,310 --> 00:11:07,470
+called the f1 score,
+也叫做F1值
+
+326
+00:11:07,670 --> 00:11:09,110
+is usually written f1 score
+一般写作F1值
+
+327
+00:11:09,340 --> 00:11:11,620
+that I have here, but often people will just say f score.
+但是人们一般只说F值
+
+328
+00:11:12,800 --> 00:11:14,750
+Its definition is a
+它的定义
+
+329
+00:11:15,080 --> 00:11:16,130
+little bit like taking the
+会考虑一部分
+
+330
+00:11:16,290 --> 00:11:17,660
+average of precision and recall,
+查准率和召回率的平均值
+
+331
+00:11:18,080 --> 00:11:19,220
+but it gives the lower
+但是它
+
+332
+00:11:19,580 --> 00:11:20,860
+value of precision and recall
+会给查准率和召回率中较低的值
+
+333
+00:11:21,060 --> 00:11:23,460
+- whichever it is - it gives it a higher weight.
+更高的权重
+
+334
+00:11:23,950 --> 00:11:25,220
+And so, you see in
+因此
+
+335
+00:11:25,360 --> 00:11:27,090
+the numerator here that the
+你可以看到F值的分子
+
+336
+00:11:27,250 --> 00:11:29,910
+f score takes a product of precision and recall.
+是查准率和召回率的乘积
+
+337
+00:11:30,460 --> 00:11:31,900
+And so, if either precision is
+因此如果查准率等于0
+
+338
+00:11:32,050 --> 00:11:33,070
+0 or recall is equal to
+或者召回率等于0
+
+339
+00:11:33,180 --> 00:11:34,310
+0, the f score will
+F值也会
+
+340
+00:11:34,600 --> 00:11:35,590
+be equal to 0. So
+等于0
+
+341
+00:11:35,690 --> 00:11:38,290
+in that sense, it kind of combines precision and recall.
+因此它结合了查准率和召回率
+
+342
+00:11:38,850 --> 00:11:40,160
+but for the f score to
+对于一个较大的F值
+
+343
+00:11:40,300 --> 00:11:41,600
+be large, both precision
+查准率
+
+344
+00:11:42,100 --> 00:11:43,480
+and recall have to be pretty large.
+和召回率都必须较大
+
+345
+00:11:44,490 --> 00:11:45,770
+I should say that there are
+我必须说
+
+346
+00:11:45,950 --> 00:11:47,950
+many different possible formulas for
+有较多的公式
+
+347
+00:11:48,060 --> 00:11:49,450
+combining precision and recall.
+可以结合查准率和召回率
+
+348
+00:11:50,040 --> 00:11:51,400
+This f score formula is
+F值公式
+
+349
+00:11:51,730 --> 00:11:53,470
+really, maybe just one out
+只是
+
+350
+00:11:53,640 --> 00:11:54,800
+of a much larger number of
+其中一个
+
+351
+00:11:54,880 --> 00:11:57,200
+possibilities, but historically or
+但是出于历史原因
+
+352
+00:11:57,270 --> 00:11:58,310
+traditionally this is what
+和习惯问题
+
+353
+00:11:58,460 --> 00:12:00,110
+people in machine learning use.
+人们在机器学习中使用F值
+
+354
+00:12:01,550 --> 00:12:02,840
+And the term f score, you
+这个术语F值
+
+355
+00:12:02,960 --> 00:12:04,160
+know, it doesn't really mean
+没有什么特别的意义
+
+356
+00:12:04,390 --> 00:12:05,460
+anything, so don't worry about
+所以不要担心
+
+357
+00:12:05,690 --> 00:12:07,680
+why it's called f score or f1 score.
+它到底为什么叫做F值或者F1值
+
+358
+00:12:08,510 --> 00:12:10,900
+But this usually gives
+但是它给了你
+
+359
+00:12:11,370 --> 00:12:12,230
+you the effect that you want
+你需要的有效方法
+
+360
+00:12:12,600 --> 00:12:14,070
+because if either position is
+因为无论是查准率等于0
+
+361
+00:12:14,370 --> 00:12:15,410
+0 or recall is 0, this
+还是召回率等于0
+
+362
+00:12:15,470 --> 00:12:17,470
+gives you a very low f score.
+它都会得到一个很低的F值
+
+363
+00:12:17,610 --> 00:12:18,730
+And so, to have a
+因此
+
+364
+00:12:18,770 --> 00:12:20,030
+high f score, you kind of
+如果要得到一个很高的F值
+
+365
+00:12:20,280 --> 00:12:21,790
+need precision and recall to be close to 1.
+你的算法的查准率和召回率都要接近于1
+
+366
+00:12:22,230 --> 00:12:24,630
+And concretely, if P
+具体地说
+
+367
+00:12:25,010 --> 00:12:26,300
+equals zero or R
+如果P=0或者
+
+368
+00:12:26,450 --> 00:12:28,440
+equals zero then this
+R=0
+
+369
+00:12:28,650 --> 00:12:31,540
+gives you the f score equals zero.
+你的F值也会等于0
+
+370
+00:12:33,430 --> 00:12:34,630
+Whereas a perfect f
+对于一个最完美的F值
+
+371
+00:12:34,820 --> 00:12:36,120
+score, so if precision equals
+如果查准率等于1
+
+372
+00:12:36,550 --> 00:12:38,520
+one and recall equals
+同时召回率
+
+373
+00:12:38,940 --> 00:12:40,380
+one that would give
+也等于1
+
+374
+00:12:40,580 --> 00:12:43,450
+you an f score that's
+那你得到的F值
+
+375
+00:12:43,680 --> 00:12:44,780
+equal to one times one
+等于1乘以1
+
+376
+00:12:45,100 --> 00:12:46,650
+over two times two.
+除以2再乘以2
+
+377
+00:12:46,800 --> 00:12:47,590
+So the f score will be
+那么F值
+
+378
+00:12:47,900 --> 00:12:48,610
+equal to 1 if you
+就等于1
+
+379
+00:12:48,680 --> 00:12:50,300
+have perfect precision and perfect recall.
+如果你能得到最完美的查准率和召回率
+
+380
+00:12:51,280 --> 00:12:53,230
+And intermediate values between 0
+在0和1中间的值
+
+381
+00:12:53,520 --> 00:12:54,980
+and 1, this usually gives a
+这经常是
+
+382
+00:12:55,210 --> 00:12:57,240
+reasonable rank ordering of different classifiers.
+回归模型最经常出现的分数
+
+383
+00:13:00,000 --> 00:13:01,070
+So this video we talked
+在这次的视频中
+
+384
+00:13:01,370 --> 00:13:03,240
+about the notion of trading
+我们讲到了如何
+
+385
+00:13:03,760 --> 00:13:05,290
+off between precision and recall
+权衡查准率和召回率
+
+386
+00:13:06,140 --> 00:13:07,410
+and how we can vary the
+以及我们如何变动
+
+387
+00:13:07,540 --> 00:13:09,110
+threshold that we use to
+临界值
+
+388
+00:13:09,250 --> 00:13:10,340
+decide whether to predict y
+来决定我们希望预测y=1
+
+389
+00:13:10,540 --> 00:13:11,980
+equals one or y equals zero.
+还是y=0
+
+390
+00:13:12,180 --> 00:13:13,990
+This threshold that says do
+比如我们
+
+391
+00:13:14,070 --> 00:13:15,180
+we need to be at least
+需要一个
+
+392
+00:13:15,500 --> 00:13:16,970
+seventy percent confident or ninety
+70%还是90%置信度的临界值
+
+393
+00:13:17,200 --> 00:13:19,340
+percent confident or whatever before
+或者别的
+
+394
+00:13:19,650 --> 00:13:21,150
+we predict y equals one and
+来预测y=1
+
+395
+00:13:21,260 --> 00:13:22,610
+by varying the threshold you
+通过变动临界值
+
+396
+00:13:22,950 --> 00:13:23,930
+can control a trade off
+你可以控制权衡
+
+397
+00:13:24,300 --> 00:13:25,960
+between precision and recall.
+查准率和召回率
+
+398
+00:13:26,430 --> 00:13:27,150
+We also talked about the f score
+之后我们讲到了F值
+
+399
+00:13:27,420 --> 00:13:28,850
+which takes precision and recall
+它权衡查准率和召回率
+
+400
+00:13:29,640 --> 00:13:30,730
+and gives you a single
+给了你一个
+
+401
+00:13:31,270 --> 00:13:32,480
+real number evaluation metric.
+评估度量值
+
+402
+00:13:33,320 --> 00:13:34,460
+And of course, if your goal is
+当然 如果你的目标是
+
+403
+00:13:34,740 --> 00:13:36,590
+to automatically set that
+自动选择临界值
+
+404
+00:13:36,880 --> 00:13:38,390
+threshold, to decide which
+来决定
+
+405
+00:13:38,590 --> 00:13:39,320
+one of y equals 1 or
+你希望预测y=1
+
+406
+00:13:39,520 --> 00:13:41,180
+y equals 0, one pretty
+还是y=0
+
+407
+00:13:41,420 --> 00:13:42,410
+reasonable way to do that
+那么一个比较理想的办法是
+
+408
+00:13:42,740 --> 00:13:44,140
+would also be to try
+试一试不同的
+
+409
+00:13:44,640 --> 00:13:46,350
+a range of different values of thresholds.
+临界值
+
+410
+00:13:46,930 --> 00:13:47,740
+So, try a range of values
+试一下
+
+411
+00:13:48,190 --> 00:13:50,430
+of thresholds and evaluate these
+不同的临界值
+
+412
+00:13:50,620 --> 00:13:51,610
+different threshold on say your
+然后评估这些不同的临界值
+
+413
+00:13:51,790 --> 00:13:53,650
+cross validation set, and then
+在交叉检验集上进行测试
+
+414
+00:13:53,840 --> 00:13:55,760
+to pick whatever value of threshold
+然后选择哪一个临界值
+
+415
+00:13:56,580 --> 00:13:57,910
+gives you the highest f score
+能够在交叉检验集上
+
+416
+00:13:58,060 --> 00:13:59,760
+on your cross validation set.
+得到最高的F值
+
+417
+00:14:00,130 --> 00:14:01,220
+That would be a pretty reasonable way
+这是自动选择临界值的较好办法
+
+418
+00:14:01,720 --> 00:14:04,620
+to automatically choose the threshold for your classifier as well.
+较好办法
+
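+A sketch of that last suggestion, picking the threshold that gives the highest F score on a cross-validation set (the labels and probabilities below are made up for illustration):
+
+import numpy as np
+
+y_cv     = np.array([1, 0, 1, 0, 0, 1, 0, 0])
+probs_cv = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05])
+
+def f1_at(threshold):
+    y_pred = (probs_cv >= threshold).astype(int)
+    tp = np.sum((y_pred == 1) & (y_cv == 1))
+    fp = np.sum((y_pred == 1) & (y_cv == 0))
+    fn = np.sum((y_pred == 0) & (y_cv == 1))
+    p = tp / (tp + fp) if (tp + fp) > 0 else 0.0
+    r = tp / (tp + fn) if (tp + fn) > 0 else 0.0
+    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0
+
+# Try a range of thresholds and keep the one with the highest F score on the CV set.
+thresholds = np.linspace(0.05, 0.95, 19)
+best = max(thresholds, key=f1_at)
+print(best, f1_at(best))
+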
diff --git a/srt/11 - 5 - Data For Machine Learning (11 min).srt b/srt/11 - 5 - Data For Machine Learning (11 min).srt
new file mode 100644
index 00000000..a809099d
--- /dev/null
+++ b/srt/11 - 5 - Data For Machine Learning (11 min).srt
@@ -0,0 +1,1677 @@
+1
+00:00:00,390 --> 00:00:03,570
+In the previous video, we talked about evaluation metrics.
+在之前的视频中 我们讨论了评价指标
+(字幕修改合并:中国海洋大学 黄海广
+haiguang2000@qq.com )
+
+2
+00:00:04,730 --> 00:00:05,840
+In this video, I'd like
+在这个视频中
+
+3
+00:00:06,080 --> 00:00:07,230
+to switch tracks a bit and
+我要稍微转换一下
+
+4
+00:00:07,480 --> 00:00:08,900
+touch on another important aspect of
+讨论一下机器学习系统设计中
+
+5
+00:00:09,570 --> 00:00:10,990
+machine learning system design,
+另一个重要的方面
+
+6
+00:00:11,800 --> 00:00:13,290
+which will often come up, which
+这往往涉及到
+
+7
+00:00:13,470 --> 00:00:14,990
+is the issue of how much
+用来训练的数据
+
+8
+00:00:15,270 --> 00:00:17,110
+data to train on.
+有多少
+
+9
+00:00:17,310 --> 00:00:18,440
+Now, in some earlier videos, I
+在之前的一些视频中
+
+10
+00:00:18,620 --> 00:00:20,320
+had cautioned against blindly
+我曾告诫大家不要盲目地开始
+
+11
+00:00:20,690 --> 00:00:21,660
+going out and just spending
+就去花大量的时间
+
+12
+00:00:21,980 --> 00:00:23,300
+lots of time collecting lots of
+来收集大量的数据
+
+13
+00:00:23,420 --> 00:00:24,730
+data, because it's only
+因为数据
+
+14
+00:00:25,040 --> 00:00:26,360
+sometimes that that would actually help.
+只有在某些时候才会真正有帮助
+
+15
+00:00:27,510 --> 00:00:28,580
+But it turns out that under
+但事实证明
+
+16
+00:00:28,820 --> 00:00:30,270
+certain conditions, and I
+在一定条件下
+
+17
+00:00:30,550 --> 00:00:31,580
+will say in this video what those
+我会在这个视频里讲到
+
+18
+00:00:31,770 --> 00:00:33,590
+conditions are, getting a
+这些条件是什么
+
+19
+00:00:33,820 --> 00:00:35,440
+lot of data and training on
+在这些条件下 得到大量的数据并在
+
+20
+00:00:35,730 --> 00:00:36,730
+a certain type of learning
+某种类型的学习算法中进行训练
+
+21
+00:00:37,010 --> 00:00:38,160
+algorithm, can be a
+可以是一种
+
+22
+00:00:38,240 --> 00:00:39,470
+very effective way to get
+有效的方法来获得
+
+23
+00:00:39,770 --> 00:00:41,320
+a learning algorithm to do very good performance.
+一个具有良好性能的学习算法
+
+24
+00:00:42,810 --> 00:00:44,280
+And this arises often enough
+而这种情况往往出现在
+
+25
+00:00:44,710 --> 00:00:45,930
+that if those conditions hold true
+这些条件对于你的问题
+
+26
+00:00:46,310 --> 00:00:47,850
+for your problem and if
+都成立 并且
+
+27
+00:00:47,970 --> 00:00:48,770
+you're able to get a lot
+你能够得到大量数据的
+
+28
+00:00:48,980 --> 00:00:50,070
+of data, this could be
+情况下 这可以是
+
+29
+00:00:50,210 --> 00:00:51,330
+a very good way to get
+一个很好的方式来获得
+
+30
+00:00:51,560 --> 00:00:52,970
+a very high performance learning algorithm.
+非常高性能的学习算法
+
+31
+00:00:53,990 --> 00:00:55,620
+So in this video, let's
+因此 在这段视频中
+
+32
+00:00:56,320 --> 00:00:58,960
+talk more about that.
+让我们一起讨论一下这个问题
+
+33
+00:00:59,110 --> 00:00:59,820
+Let me start with a story.
+我先讲一个故事
+
+34
+00:01:01,080 --> 00:01:02,620
+Many, many years ago, two researchers
+很多很多年前 我认识的两位研究人员
+
+35
+00:01:03,400 --> 00:01:05,380
+that I know, Michele Banko and
+Michele Banko 和
+
+36
+00:01:05,520 --> 00:01:08,110
+Eric Brill ran the following fascinating study.
+Eric Brill 进行了一项有趣的研究
+
+37
+00:01:09,820 --> 00:01:11,290
+They were interested in studying the
+他们感兴趣的是研究
+
+38
+00:01:11,550 --> 00:01:12,910
+effect of using different learning
+使用不同的学习算法的效果
+
+39
+00:01:13,290 --> 00:01:15,210
+algorithms versus trying them
+与将这些效果
+
+40
+00:01:15,380 --> 00:01:17,420
+out on different training set sizes,
+使用到不同训练数据集上 两者的比较
+
+41
+00:01:18,020 --> 00:01:19,550
+they were considering the problem
+他们当时考虑这样一个问题
+
+42
+00:01:20,170 --> 00:01:22,120
+of classifying between confusable words,
+如何在易混淆的词之间进行分类
+
+43
+00:01:22,550 --> 00:01:23,610
+so for example, in the sentence:
+比如 在这样的句子中:
+
+44
+00:01:24,410 --> 00:01:26,990
+for breakfast I ate, should it be to, two or too?
+早餐我吃了__个鸡蛋 (to,two too 与 then,than 是两组混淆词)
+
+45
+00:01:27,940 --> 00:01:28,890
+Well, for this example,
+在这个例子中
+
+46
+00:01:29,450 --> 00:01:32,390
+for breakfast I ate two, 2 eggs.
+早餐我吃了2个鸡蛋
+
+47
+00:01:33,510 --> 00:01:34,530
+So, this is one example
+这是一个
+
+48
+00:01:35,320 --> 00:01:37,800
+of a set of confusable words and that's a different set.
+易混淆的单词的例子 而这是另外一组情况
+
+49
+00:01:38,240 --> 00:01:39,650
+So they took machine learning
+于是他们把诸如这样的机器学习问题
+
+50
+00:01:39,950 --> 00:01:41,580
+problems like these, sort of supervised learning
+当做一类监督学习问题
+
+51
+00:01:41,780 --> 00:01:43,190
+problems to try to categorize
+并尝试将其分类
+
+52
+00:01:43,970 --> 00:01:45,210
+what is the appropriate word to
+什么样的词
+
+53
+00:01:45,400 --> 00:01:46,560
+go into a certain position
+在一个英文句子特定的位置
+
+54
+00:01:47,540 --> 00:01:48,140
+in an English sentence.
+才是合适的
+
+55
+00:01:51,010 --> 00:01:52,110
+They took a few different learning
+他们用了几种不同的学习算法
+
+56
+00:01:52,340 --> 00:01:53,520
+algorithms which were, you know,
+这些算法都是
+
+57
+00:01:53,730 --> 00:01:55,210
+sort of considered state of
+在他们2001年进行研究的时候
+
+58
+00:01:55,310 --> 00:01:56,110
+the art back in the day,
+都已经
+
+59
+00:01:56,410 --> 00:01:57,670
+when they ran the study in
+被公认是比较领先的
+
+60
+00:01:57,730 --> 00:01:59,220
+2001, so they took a
+因此他们使用了一个变体
+
+61
+00:01:59,750 --> 00:02:01,140
+variant, roughly a variant
+大致是逻辑回归的一个变体
+
+62
+00:02:01,630 --> 00:02:03,180
+on logistic regression called the Perceptron.
+被称作“感知器” (perceptron)
+
+63
+00:02:03,760 --> 00:02:05,160
+They also took some of
+他们也采取了一些
+
+64
+00:02:05,250 --> 00:02:06,700
+their algorithms that were fairly
+比较公正的算法
+
+65
+00:02:07,090 --> 00:02:08,620
+out back then but somewhat less
+但是现在比较少用了
+
+66
+00:02:08,830 --> 00:02:10,600
+used now so when the
+因此当
+
+67
+00:02:10,750 --> 00:02:11,980
+algorithm also very similar
+这样一个类似于
+
+68
+00:02:12,380 --> 00:02:13,410
+to logistic regression
+逻辑回归
+
+69
+00:02:13,660 --> 00:02:15,580
+but different in some ways, much
+但在一些方法上又有所不同
+
+70
+00:02:16,140 --> 00:02:18,220
+used somewhat less, used
+过去用得比较多
+
+71
+00:02:18,480 --> 00:02:19,380
+not too much right now
+但现在用得不太多
+
+72
+00:02:20,180 --> 00:02:21,180
+took what's called a memory based
+一种基于内存的学习算法
+
+73
+00:02:21,430 --> 00:02:23,240
+learning algorithm again used somewhat less now.
+现在也用得比较少了
+
+74
+00:02:23,610 --> 00:02:25,940
+But I'll talk a little bit about that later.
+但是我稍后会讨论一点
+
+75
+00:02:26,230 --> 00:02:27,230
+And they used a Naive Bayes
+而且他们用了一个朴素贝叶斯算法
+
+76
+00:02:27,690 --> 00:02:29,240
+algorithm, which is something we'll
+这个我们将在
+
+77
+00:02:29,410 --> 00:02:32,380
+actually talk about in this course.
+这门课程中讨论到
+
+78
+00:02:32,690 --> 00:02:34,400
+The exact algorithms of these details aren't important.
+这些具体算法的细节不那么重要
+
+79
+00:02:35,040 --> 00:02:36,080
+Think of this as, you know, just picking
+想象一下
+
+80
+00:02:36,430 --> 00:02:40,380
+four different classification algorithms and really the exact algorithms aren't important.
+仅仅选取四种分类算法 且这些具体算法并不重要
+
+81
+00:02:41,980 --> 00:02:42,990
+But what they did was they
+他们所做的就是
+
+82
+00:02:43,140 --> 00:02:44,160
+varied the training set size
+改变了训练数据集的大小
+
+83
+00:02:44,500 --> 00:02:45,390
+and tried out these learning
+并尝试将这些学习
+
+84
+00:02:45,450 --> 00:02:47,070
+algorithms on the range of
+算法用于不同大小的
+
+85
+00:02:47,210 --> 00:02:49,640
+training set sizes and that's the result they got.
+训练数据集中 这就是他们得到的结果
+
+86
+00:02:50,300 --> 00:02:51,310
+And the trends are very
+这些趋势非常明显
+
+87
+00:02:51,470 --> 00:02:53,170
+clear right first most of
+首先大部分
+
+88
+00:02:53,290 --> 00:02:55,470
+these algorithms give remarkably similar performance.
+这些算法都具有非常相似的性能
+
+89
+00:02:56,200 --> 00:02:57,760
+And second, as the training
+其次 随着训练
+
+90
+00:02:58,150 --> 00:02:59,760
+set size increases, on the
+数据集的增大
+
+91
+00:02:59,860 --> 00:03:00,970
+horizontal axis is the
+在横轴上代表
+
+92
+00:03:01,280 --> 00:03:02,510
+training set size in millions
+数以百万计的训练集大小
+
+93
+00:03:04,070 --> 00:03:05,360
+go from you know a
+从十万
+
+94
+00:03:05,420 --> 00:03:07,440
+hundred thousand up to a
+增加到十亿
+
+95
+00:03:07,720 --> 00:03:09,060
+thousand million that is a
+这是一个
+
+96
+00:03:09,330 --> 00:03:10,980
+billion training examples. The
+十亿大小训练集的例子
+
+97
+00:03:11,090 --> 00:03:11,860
+performance of the algorithms
+这些算法的性能
+
+98
+00:03:12,870 --> 00:03:15,360
+all pretty much monotonically increase
+也都对应地增强了
+
+99
+00:03:15,740 --> 00:03:16,610
+and the fact that if
+事实上 如果
+
+100
+00:03:16,650 --> 00:03:18,600
+you pick any algorithm may be
+你选择任意一个算法 可能是
+
+101
+00:03:19,000 --> 00:03:21,320
+pick a "inferior algorithm" but
+选择了一个"劣质的"算法
+
+102
+00:03:21,490 --> 00:03:22,650
+if you give that "inferior
+如果你给这个
+
+103
+00:03:23,190 --> 00:03:26,150
+algorithm" more data, then from
+劣质算法更多的数据 那么
+
+104
+00:03:26,390 --> 00:03:27,570
+these examples, it looks like
+从这些例子来看的话 它看上去
+
+105
+00:03:27,670 --> 00:03:30,330
+it will most likely beat even a "superior algorithm".
+很有可能会比其他算法更好 甚至会比"优质算法"更好
+
+106
+00:03:32,200 --> 00:03:33,270
+So since this original study
+由于这项原始的研究
+
+107
+00:03:33,720 --> 00:03:35,850
+which is very influential, there's been
+非常具有影响力 因此已经有
+
+108
+00:03:36,360 --> 00:03:37,500
+a range of many different
+一系列许多不同的
+
+109
+00:03:37,830 --> 00:03:39,020
+studies showing similar results
+研究显示了类似的结果
+
+110
+00:03:39,550 --> 00:03:40,840
+that show that many different learning
+这些结果表明 许多不同的学习
+
+111
+00:03:41,150 --> 00:03:42,270
+algorithms you know tend
+算法有时倾向于
+
+112
+00:03:42,630 --> 00:03:44,290
+to, can sometimes, depending on
+依赖一些细节
+
+113
+00:03:44,460 --> 00:03:46,060
+details, can give pretty similar ranges
+并表现出相当相似
+
+114
+00:03:46,490 --> 00:03:48,320
+of performance, but what can
+的性能 但是真正能提高性能的
+
+115
+00:03:48,520 --> 00:03:51,570
+really drive performance is you can give the algorithm a ton of training data.
+是你能够给予一个算法大量的训练数据
+
+116
+00:03:53,190 --> 00:03:54,640
+And this is, results like these
+像这样的结果
+
+117
+00:03:55,010 --> 00:03:56,030
+has led to a saying in
+引起了一种
+
+118
+00:03:56,130 --> 00:03:57,360
+machine learning that often in
+在机器学习中
+
+119
+00:03:57,510 --> 00:03:58,920
+machine learning it's not
+的常用说法:
+
+120
+00:03:59,180 --> 00:04:00,460
+who has the best algorithm that
+并不是拥有最好算法的人能成功
+
+121
+00:04:00,600 --> 00:04:01,720
+wins, it's who has the
+而是拥有最多数据
+
+122
+00:04:02,810 --> 00:04:04,260
+most data So when is this
+的人能成功 那么
+
+123
+00:04:04,460 --> 00:04:06,240
+true and when is this not true?
+这种情况什么时候是真 什么时候是假呢?
+
+124
+00:04:06,560 --> 00:04:07,710
+Because we have a learning
+因为我们有一个学习算法
+
+125
+00:04:07,850 --> 00:04:09,000
+algorithm for which this is
+这种算法在这种情况下是真的
+
+126
+00:04:09,150 --> 00:04:10,590
+true then getting a
+那么得到大量
+
+127
+00:04:10,820 --> 00:04:11,970
+lot of data is often
+的数据通常是
+
+128
+00:04:12,620 --> 00:04:13,830
+maybe the best way to ensure
+保证我们
+
+129
+00:04:14,180 --> 00:04:15,700
+that we have an algorithm with
+具有一个高性能算法
+
+130
+00:04:15,900 --> 00:04:17,360
+very high performance rather than
+的最佳方式 而不是
+
+131
+00:04:17,520 --> 00:04:20,080
+you know, debating worrying about exactly which of these items to use.
+去争辩使用什么样的算法
+
+132
+00:04:21,710 --> 00:04:23,200
+Let's try to lay out a
+假如有这样一些假设
+
+133
+00:04:23,330 --> 00:04:25,130
+set of assumptions under which having
+在这些假设下有
+
+134
+00:04:25,660 --> 00:04:28,230
+a massive training set we think will be able to help.
+大量我们认为有用的训练集
+
+135
+00:04:29,780 --> 00:04:31,310
+Let's assume that in our
+我们假设在我们的
+
+136
+00:04:31,410 --> 00:04:33,210
+machine learning problem, the features
+机器学习问题中 特征值
+
+137
+00:04:34,080 --> 00:04:36,560
+x have sufficient information with which
+x包含了足够的信息
+
+138
+00:04:36,830 --> 00:04:38,600
+we can use to predict y accurately.
+这些信息可以帮助我们用来准确地预测 y
+
+139
+00:04:40,380 --> 00:04:41,490
+For example, if we take
+例如 如果我们采用了
+
+140
+00:04:41,790 --> 00:04:44,860
+the confusable words all of them that we had on the previous slide.
+我们前一张幻灯片里的所有容易混淆的词
+
+141
+00:04:45,740 --> 00:04:47,040
+Let's say that it features x
+假如说它能够描述 x
+
+142
+00:04:47,520 --> 00:04:48,360
+capture what are the surrounding
+捕捉到需要填写
+
+143
+00:04:49,090 --> 00:04:51,620
+words around the blank that we're trying to fill in.
+的周围空白的词语
+
+144
+00:04:51,840 --> 00:04:53,630
+So the features capture then we
+那么特征捕捉到之后
+
+145
+00:04:54,220 --> 00:04:56,440
+want to have, sometimes for breakfast I have blank eggs.
+我们就希望有 有些时候是“早饭我吃了__鸡蛋”
+
+146
+00:04:57,350 --> 00:04:58,220
+Then yeah that is pretty
+那么这就有
+
+147
+00:04:58,480 --> 00:04:59,970
+much information to tell me
+大量的信息来告诉我
+
+148
+00:05:00,170 --> 00:05:01,050
+that the word I want
+中间我需要填的
+
+149
+00:05:01,420 --> 00:05:03,640
+in the middle is TWO and that
+词是“两个” (two)
+
+150
+00:05:03,850 --> 00:05:06,640
+is not word TO and its not the word TOO. So
+而不是单词 to 或 too
+
+151
+00:05:09,650 --> 00:05:11,270
+the features capture, you know, one
+因此特征捕捉
+
+152
+00:05:11,620 --> 00:05:13,390
+of these surrounding words then that
+哪怕是周围词语中的一个词
+
+153
+00:05:13,560 --> 00:05:15,360
+gives me enough information to pretty
+就能够给我足够的信息来
+
+154
+00:05:15,790 --> 00:05:17,640
+unambiguously decide what is
+明确
+
+155
+00:05:17,780 --> 00:05:18,830
+the label y or in
+标签 y 是什么
+
+156
+00:05:19,300 --> 00:05:20,190
+other words what is the word
+换句话说
+
+157
+00:05:20,750 --> 00:05:21,760
+that I should be using to fill
+从这三组易混淆的词中
+
+158
+00:05:22,100 --> 00:05:23,520
+in that blank out of
+我应该选什么
+
+159
+00:05:23,930 --> 00:05:25,610
+this set of three confusable words.
+词来填空
+
+160
+00:05:27,110 --> 00:05:28,320
+So that's an example what
+这就是一个例子
+
+161
+00:05:28,460 --> 00:05:29,840
+the features x have sufficient information
+特征值 x 有充足的信息
+
+162
+00:05:30,410 --> 00:05:32,270
+for predicting y. For
+来确定 y
+
+163
+00:05:32,470 --> 00:05:33,240
+a counter example.
+举一个反例
+
+164
+00:05:34,690 --> 00:05:36,010
+Consider a problem of predicting
+设想一个
+
+165
+00:05:36,470 --> 00:05:38,090
+the price of a house from
+房子价格的问题
+
+166
+00:05:38,340 --> 00:05:39,330
+only the size of the
+房子只有大小信息
+
+167
+00:05:39,390 --> 00:05:40,350
+house and from no other
+没有其他特征
+
+168
+00:05:42,060 --> 00:05:42,060
+features. So
+那么
+
+169
+00:05:42,820 --> 00:05:43,890
+if you imagine I tell you
+如果我告诉你
+
+170
+00:05:44,150 --> 00:05:45,270
+that a house is, you
+这个房子有
+
+171
+00:05:45,370 --> 00:05:48,100
+know, 500 square feet but I don't give you any other features.
+500平方英尺 但是我没有告诉你其他的特征信息
+
+172
+00:05:48,530 --> 00:05:49,520
+I don't tell you that the
+我也不告诉你这个
+
+173
+00:05:49,590 --> 00:05:51,990
+house is in an expensive part of the city.
+房子位于这个城市房价比较昂贵的区域
+
+174
+00:05:52,590 --> 00:05:53,710
+Or if I don't tell you that
+如果我也不告诉你
+
+175
+00:05:53,840 --> 00:05:55,290
+the house, the number of
+这所房子的
+
+176
+00:05:55,500 --> 00:05:57,030
+rooms in the house, or how
+房间数量 或者
+
+177
+00:05:57,180 --> 00:05:58,400
+nicely furnished the house
+它里面陈设了多漂亮的家具
+
+178
+00:05:58,790 --> 00:06:00,540
+is, or whether the house is new or old.
+或这个房子是新的还是旧的
+
+179
+00:06:01,090 --> 00:06:02,290
+If I don't tell you anything other
+我不告诉你其他任何信息
+
+180
+00:06:02,540 --> 00:06:03,360
+than that this is a
+除了这个房子
+
+181
+00:06:03,520 --> 00:06:05,440
+500 square foot house, well there's
+有500平方英尺以外
+
+182
+00:06:05,720 --> 00:06:07,160
+so many other factors that would
+然而除此之外还有许多其他因素
+
+183
+00:06:07,340 --> 00:06:08,280
+affect the price of a
+会影响房子
+
+184
+00:06:08,470 --> 00:06:09,940
+house other than just the size
+的价格 不仅仅是房子的大小
+
+185
+00:06:10,320 --> 00:06:11,330
+of a house that if all
+如果所有
+
+186
+00:06:11,440 --> 00:06:12,910
+you know is the size, it's actually
+你所知道的只有房子的尺寸 那么事实上
+
+187
+00:06:13,050 --> 00:06:14,610
+very difficult to predict the price accurately.
+是很难准确预测它的价格的
+
+188
+00:06:16,220 --> 00:06:16,860
+So that would be a counter
+这是这对于这个假设
+
+189
+00:06:17,280 --> 00:06:18,230
+example to this assumption
+的一个反例
+
+190
+00:06:18,880 --> 00:06:20,300
+that the features have sufficient information
+假设是特征能够提供足够的信息
+
+191
+00:06:20,800 --> 00:06:23,260
+to predict the price to the desired level of accuracy.
+来在需要的水平上预测出价格
+
+192
+00:06:24,090 --> 00:06:25,180
+The way I think about testing
+我想检测这样一个
+
+193
+00:06:25,540 --> 00:06:26,730
+this assumption, one way I
+假设的方式是
+
+194
+00:06:26,940 --> 00:06:29,160
+often think about it is, how often I ask myself.
+去思考它 "多久一次?" 我这样问自己
+
+195
+00:06:30,260 --> 00:06:31,660
+Given the input features x,
+给定一个输入特征向量 x
+
+196
+00:06:32,180 --> 00:06:33,320
+given the features, given the
+给定这些特征值
+
+197
+00:06:33,380 --> 00:06:35,440
+same information available as the learning algorithm.
+也给定了相同的可用的信息和学习算法
+
+198
+00:06:36,510 --> 00:06:38,690
+If we were to go to a human expert in this domain.
+如果我们去找这个领域的人类专家
+
+199
+00:06:39,680 --> 00:06:41,570
+Can a human expert actually, or
+一个人类专家能够
+
+200
+00:06:41,720 --> 00:06:43,160
+can human expert confidently predict
+准确或自信的预测出
+
+201
+00:06:43,490 --> 00:06:45,390
+the value of y. For this
+y值吗?
+
+202
+00:06:45,630 --> 00:06:46,730
+first example if we go
+第一个例子 如果我们去
+
+203
+00:06:46,980 --> 00:06:49,420
+to, you know an expert human English speaker.
+找你认识的一个人类专家 说英语的
+
+204
+00:06:49,810 --> 00:06:51,260
+You go to someone that
+你找到了一个
+
+205
+00:06:51,390 --> 00:06:53,740
+speaks English well, right, then
+英语说得很好的人 好 那么
+
+206
+00:06:53,940 --> 00:06:55,230
+a human expert in English
+一个说英语的人类专家
+
+207
+00:06:55,940 --> 00:06:57,260
+just read most people like
+刚好理解大部分像
+
+208
+00:06:57,450 --> 00:06:59,730
+you and me will probably we
+你和我这样的人 可能会
+
+209
+00:07:00,160 --> 00:07:01,080
+would probably be able to
+我们就可能能够
+
+210
+00:07:01,170 --> 00:07:02,370
+predict what word should go in
+预测出在这种情况下
+
+211
+00:07:02,620 --> 00:07:03,960
+here, to a good English
+该使用什么样的语言 对于一个英语说得
+
+212
+00:07:04,290 --> 00:07:05,550
+speaker can predict this well,
+好的人来说 可以预测得很好
+
+213
+00:07:05,850 --> 00:07:06,710
+and so this gives me confidence
+因此这就给了我信心
+
+214
+00:07:07,470 --> 00:07:08,640
+that x allows us to predict
+x能够让我们准确
+
+215
+00:07:08,810 --> 00:07:10,550
+y accurately, but in contrast
+地预测y 但是与此相反
+
+216
+00:07:11,240 --> 00:07:13,550
+if we go to an expert in housing prices.
+如果我们去找一个房价方面的专家
+
+217
+00:07:14,040 --> 00:07:16,390
+Like maybe an expert realtor, right, someone
+如 可能是一个房地产经纪人专家
+
+218
+00:07:16,950 --> 00:07:18,090
+who sells houses for a living.
+以买房子为生的人
+
+219
+00:07:18,610 --> 00:07:19,450
+If I just tell them the
+如果我只是告诉他们
+
+220
+00:07:19,550 --> 00:07:20,440
+size of a house and I
+一个房子的大小
+
+221
+00:07:20,530 --> 00:07:21,860
+tell them what the price
+并告诉他们房子的价格
+
+222
+00:07:22,240 --> 00:07:23,410
+is well even an expert
+那么即使是房价评估或售房
+
+223
+00:07:23,600 --> 00:07:25,210
+in pricing or selling
+方面的专家
+
+224
+00:07:25,600 --> 00:07:26,520
+houses wouldn't be able
+他也不能
+
+225
+00:07:26,550 --> 00:07:28,280
+to tell me and so this is fine that
+告诉我(房子的预测价格)
+
+226
+00:07:29,000 --> 00:07:31,060
+for the housing price example knowing
+所以在房子价格的例子中
+
+227
+00:07:31,600 --> 00:07:33,300
+only the size doesn't give
+只知道房子的大小并不能
+
+228
+00:07:33,460 --> 00:07:34,960
+me enough information to predict
+给我足够的信息来预测
+
+229
+00:07:35,920 --> 00:07:36,870
+the price of the house.
+房子的价格
+
+230
+00:07:37,690 --> 00:07:39,890
+So, let's say, this assumption holds.
+如果这个假设是成立的
+
+231
+00:07:41,200 --> 00:07:42,650
+Let's see then, when having
+那么让我们来看一看 当有
+
+232
+00:07:43,040 --> 00:07:44,230
+a lot of data could help.
+大量的数据时是有帮助的
+
+233
+00:07:45,020 --> 00:07:46,370
+Suppose the features have enough
+假设特征值有
+
+234
+00:07:46,650 --> 00:07:47,870
+information to predict the
+足够的信息来预测
+
+235
+00:07:48,050 --> 00:07:49,380
+value of y.
+y值
+
+236
+00:07:49,540 --> 00:07:50,750
+And let's suppose we use a
+假设我们使用一种
+
+237
+00:07:50,960 --> 00:07:52,380
+learning algorithm with a
+需要大量参数的
+
+238
+00:07:52,600 --> 00:07:54,430
+large number of parameters so
+学习算法
+
+239
+00:07:54,580 --> 00:07:56,020
+maybe logistic regression or linear
+也许是有很多特征值的
+
+240
+00:07:56,280 --> 00:07:58,090
+regression with a large number of features.
+逻辑回归或线性回归
+
+241
+00:07:58,550 --> 00:07:59,490
+Or one thing that I sometimes
+或者我有时做的一件事
+
+242
+00:07:59,950 --> 00:08:00,740
+do, one thing that I often
+我经常做的一件事
+
+243
+00:08:00,960 --> 00:08:03,300
+do actually is using neural network with many hidden units.
+实际上是在利用许多隐藏单元的神经网络
+
+244
+00:08:03,860 --> 00:08:05,230
+That would be another learning
+那又将使另外一个
+
+245
+00:08:05,500 --> 00:08:07,420
+algorithm with a lot of parameters.
+带有很多参数的学习算法了
+
+246
+00:08:08,470 --> 00:08:10,280
+So these are all powerful learning
+这些都是强大的学习
+
+247
+00:08:10,350 --> 00:08:12,350
+algorithms with a lot of parameters that
+算法 它们有很多参数
+
+248
+00:08:13,040 --> 00:08:14,810
+can fit very complex functions.
+这些参数可以拟合非常复杂的函数
+
+249
+00:08:16,750 --> 00:08:17,550
+So, I'm going to call these, I'm
+因此我要调用这些
+
+250
+00:08:18,630 --> 00:08:19,720
+going to think of these as
+我将把这些算法想象成
+
+251
+00:08:20,510 --> 00:08:21,970
+low-bias algorithms because you
+低偏差算法 因为
+
+252
+00:08:22,140 --> 00:08:23,540
+know we can fit very complex functions
+我们能够拟合非常复杂的函数
+
+253
+00:08:25,480 --> 00:08:26,740
+and because we have
+而且因为我们有
+
+254
+00:08:27,260 --> 00:08:28,920
+a very powerful learning algorithm,
+非常强大的学习算法
+
+255
+00:08:29,380 --> 00:08:30,590
+they can fit very complex functions.
+这些学习算法能够拟合非常复杂的函数
+
+256
+00:08:31,680 --> 00:08:33,470
+Chances are, if we
+很有可能 如果我们
+
+257
+00:08:34,070 --> 00:08:35,790
+run these algorithms on
+用这些数据运行
+
+258
+00:08:35,940 --> 00:08:37,250
+the data sets, it will
+这些算法 这种算法
+
+259
+00:08:37,430 --> 00:08:38,770
+be able to fit the training
+很好地拟合训练集
+
+260
+00:08:39,200 --> 00:08:40,680
+set well, and so
+因此
+
+261
+00:08:40,940 --> 00:08:43,230
+hopefully the training error will be small.
+训练误差就会很低
+
+262
+00:08:44,520 --> 00:08:45,520
+Now let's say, we use
+现在假设我们使用了
+
+263
+00:08:46,020 --> 00:08:47,790
+a massive, massive training set,
+非常非常大的训练集
+
+264
+00:08:48,190 --> 00:08:49,370
+in that case, if we
+在这种情况下 如果我们
+
+265
+00:08:49,430 --> 00:08:51,460
+have a huge training set, then
+有一个庞大的训练集 那么
+
+266
+00:08:51,630 --> 00:08:53,490
+hopefully even though we have a lot of parameters
+那么即使我们有很多参数
+
+267
+00:08:53,760 --> 00:08:56,080
+but if the training set is sort of even much
+但是如果训练集比
+
+268
+00:08:56,360 --> 00:08:57,450
+larger than the number of
+比参数的数量大一些
+
+269
+00:09:02,590 --> 00:09:03,660
+Right because we have such a
+因为我们有如此
+
+270
+00:09:03,710 --> 00:09:05,680
+massive training set and by
+庞大的训练集
+
+271
+00:09:06,070 --> 00:09:07,870
+unlikely to overfit what that
+并且不太可能过度拟合
+
+272
+00:09:08,070 --> 00:09:09,090
+means is that the training
+也就是说训练
+
+273
+00:09:09,390 --> 00:09:10,860
+error will hopefully be
+误差有希望
+
+274
+00:09:11,050 --> 00:09:13,270
+close to the test error.
+接近测试误差
+
+275
+00:09:13,960 --> 00:09:15,160
+Finally putting these two
+最后把这两个
+
+276
+00:09:15,350 --> 00:09:16,770
+together that the train
+放在一起 训练集
+
+277
+00:09:16,990 --> 00:09:18,590
+set error is small and
+误差很小 而
+
+278
+00:09:18,700 --> 00:09:19,870
+the test set error is close
+测试集误差又接近
+
+279
+00:09:20,360 --> 00:09:22,290
+to the training error what
+训练误差
+
+280
+00:09:22,460 --> 00:09:24,510
+this two together imply is
+这两个就意味着
+
+281
+00:09:24,710 --> 00:09:26,630
+that hopefully the test set error
+测试集的误差
+
+282
+00:09:27,780 --> 00:09:28,450
+will also be small.
+也会很小
+
+283
+00:09:30,000 --> 00:09:32,610
+Another way to
+另一种考虑
+
+284
+00:09:32,720 --> 00:09:33,930
+think about this is that
+这个问题的方式是
+
+285
+00:09:34,700 --> 00:09:35,740
+in order to have a high
+为了有一个高
+
+286
+00:09:35,880 --> 00:09:37,630
+performance learning algorithm we want
+性能的学习算法 我们希望
+
+287
+00:09:37,930 --> 00:09:40,470
+it not to have high bias and not to have high variance.
+它不要有高的偏差和方差
+
+288
+00:09:42,060 --> 00:09:43,270
+So the bias problem we're going
+因此偏差问题 我们将
+
+289
+00:09:43,350 --> 00:09:44,700
+to address by making sure we
+通过确保
+
+290
+00:09:44,880 --> 00:09:45,910
+have a learning algorithm with many
+有一个具有很多
+
+291
+00:09:46,170 --> 00:09:47,670
+parameters and so that
+参数的学习算法来保证 以便
+
+292
+00:09:47,840 --> 00:09:48,930
+gives us a low bias algorithm
+我们能够得到一个较低偏差的算法
+
+293
+00:09:50,110 --> 00:09:51,460
+and by using a
+并且通过用
+
+294
+00:09:51,610 --> 00:09:53,240
+very large training set, this ensures
+非常大的训练集来保证
+
+295
+00:09:53,760 --> 00:09:55,590
+that we don't have a variance problem here.
+我们在此没有方差问题
+
+296
+00:09:55,840 --> 00:09:57,280
+So hopefully our algorithm will
+我们的算法将
+
+297
+00:09:57,430 --> 00:09:59,100
+have no variance and so
+没有方差 并且
+
+298
+00:09:59,340 --> 00:10:00,940
+is by pulling these two together,
+通过将这两个值放在一起
+
+299
+00:10:01,870 --> 00:10:02,830
+that we end up with a low
+我们最终可以得到一个低
+
+300
+00:10:02,900 --> 00:10:03,990
+bias and a low variance
+偏差和低方差
+
+301
+00:10:04,990 --> 00:10:06,920
+learning algorithm and this
+的学习算法 这
+
+302
+00:10:07,140 --> 00:10:08,300
+allows us to do well
+使得我们能够
+
+303
+00:10:08,710 --> 00:10:10,150
+on the test set.
+在测试集上表现良好
+
+304
+00:10:10,430 --> 00:10:12,140
+And fundamentally, these are the key ingredients
+从根本上来说 这是一个关键
+
+305
+00:10:13,020 --> 00:10:14,560
+of assuming that the features
+的假设:特征值
+
+306
+00:10:14,940 --> 00:10:16,750
+have enough information and we
+有足够的信息量 且我们
+
+307
+00:10:16,900 --> 00:10:17,960
+have a rich class of functions
+有一类很好的函数
+
+308
+00:10:18,400 --> 00:10:19,580
+that's why it guarantees low bias,
+这是为什么能保证低偏差的关键所在
+
+309
+00:10:20,760 --> 00:10:21,750
+and then it having a massive
+它有大量的
+
+310
+00:10:22,110 --> 00:10:25,010
+training set, and that's what guarantees low variance.
+这就能保证得到较低的方差
+
+311
+00:10:27,150 --> 00:10:28,310
+So this gives us a
+因此这给我们提出了
+
+312
+00:10:28,410 --> 00:10:29,820
+set of conditions rather hopefully
+一些可能的条件
+
+313
+00:10:30,090 --> 00:10:31,610
+some understanding of what's the
+一些对于
+
+314
+00:10:31,870 --> 00:10:33,730
+sort of problem where if
+问题的认识 如果
+
+315
+00:10:33,860 --> 00:10:34,790
+you have a lot of data
+你有大量的数据
+
+316
+00:10:34,960 --> 00:10:36,150
+and you train a learning
+而且你训练了一种
+
+317
+00:10:36,380 --> 00:10:38,930
+algorithm with lot of parameters, that might
+带有很多参数的学习算法 那么这将
+
+318
+00:10:39,120 --> 00:10:39,870
+be a good way to give
+会是一个很好的方式来提供
+
+319
+00:10:40,060 --> 00:10:42,490
+a high performance learning algorithm
+一个高性能的学习算法
+
+320
+00:10:43,480 --> 00:10:44,140
+and really, I think the key test that
+我觉得关键的测试
+
+321
+00:10:44,230 --> 00:10:45,520
+I often ask myself are
+我常常问自己 是
+
+322
+00:10:45,820 --> 00:10:47,100
+first, can a human experts
+首先 一个人类专家
+
+323
+00:10:47,200 --> 00:10:48,360
+look at the features x and
+看到了特征值x 且
+
+324
+00:10:48,880 --> 00:10:49,890
+confidently predict the value of
+能很有信心的预测出
+
+325
+00:10:50,030 --> 00:10:51,080
+y. Because that's sort of
+y值吗? 因为这可以
+
+326
+00:10:51,210 --> 00:10:53,050
+a certification that y
+证明y
+
+327
+00:10:53,320 --> 00:10:55,040
+can be predicted accurately from
+可以根据特征值
+
+328
+00:10:55,140 --> 00:10:57,010
+the features x and second,
+x被准确地预测出来 其次
+
+329
+00:10:57,510 --> 00:10:58,630
+can we actually get a large
+我们实际上能得到一组
+
+330
+00:10:58,820 --> 00:11:00,150
+training set, and train the
+庞大的训练集并且在这个
+
+331
+00:11:00,350 --> 00:11:01,470
+learning algorithm with a lot of
+训练集中训练一个有
+
+332
+00:11:01,540 --> 00:11:03,290
+parameters in the training
+很多参数的学习算法吗?
+
+333
+00:11:03,520 --> 00:11:04,420
+set, and if you can do both
+如果你能做到这两者
+
+334
+00:11:04,870 --> 00:11:06,300
+then that will more often give
+那么这更多时候
+
+335
+00:11:06,460 --> 00:11:08,570
+you a very high performance learning algorithm.
+你会得到一个性能很好的学习算法
+
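The argument in the subtitle file above (informative features, plus a many-parameter low-bias learner, plus a massive training set that keeps training error close to test error) can be illustrated with a small synthetic experiment. The sketch below is in Python purely for illustration and only under those stated assumptions; the synthetic task, the names make_data and fit_logistic, and the chosen sizes are hypothetical, not taken from the course.

import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 200
TRUE_W = rng.normal(size=N_FEATURES)  # one fixed relation, so x really does carry enough information about y

def make_data(m):
    # Synthetic task in which the features x suffice to predict y accurately.
    X = rng.normal(size=(m, N_FEATURES))
    y = (X @ TRUE_W + 0.1 * rng.normal(size=m) > 0).astype(float)
    return X, y

def fit_logistic(X, y, iters=300, lr=0.5):
    # Batch gradient descent on logistic regression: one parameter per feature,
    # i.e. the kind of many-parameter, low-bias learner the lecture assumes.
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= lr * X.T @ (p - y) / len(y)
    return theta

def error_rate(theta, X, y):
    return float(np.mean((X @ theta > 0).astype(float) != y))

X_test, y_test = make_data(5000)
for m in (500, 2000, 10000, 50000):
    X_tr, y_tr = make_data(m)
    theta = fit_logistic(X_tr, y_tr)
    # With more data the training error stays small and the test error approaches it.
    print(f"m={m:>6}  train={error_rate(theta, X_tr, y_tr):.3f}  test={error_rate(theta, X_test, y_test):.3f}")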
diff --git a/srt/12 - 1 - Optimization Objective (15 min).srt b/srt/12 - 1 - Optimization Objective (15 min).srt
new file mode 100644
index 00000000..54d1220c
--- /dev/null
+++ b/srt/12 - 1 - Optimization Objective (15 min).srt
@@ -0,0 +1,2166 @@
+1
+00:00:00,570 --> 00:00:01,860
+By now, you've seen the range
+到目前为止
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,090 --> 00:00:04,860
+of different learning algorithms. Within supervised learning,
+你已经见过一系列不同的学习算法
+
+3
+00:00:05,280 --> 00:00:06,810
+the performance of many supervised learning algorithms
+在监督学习中 许多学习算法的性能
+
+4
+00:00:07,300 --> 00:00:08,830
+will be pretty similar
+都非常类似
+
+5
+00:00:09,650 --> 00:00:10,740
+and when that is the case, what matters less will often be
+因此 重要的不是
+
+6
+00:00:11,040 --> 00:00:12,140
+whether you use
+你该选择使用
+
+7
+00:00:12,440 --> 00:00:13,450
+learning algorithm A or learning algorithm
+学习算法A还是学习算法B
+
+8
+00:00:13,660 --> 00:00:15,020
+B, but what matters
+而更重要的是
+
+9
+00:00:15,190 --> 00:00:16,190
+more will often be
+应用这些算法时
+
+10
+00:00:16,360 --> 00:00:17,100
+things like the amount of data
+所创建的
+
+11
+00:00:17,330 --> 00:00:18,530
+you are training these algorithms on.
+大量数据
+
+12
+00:00:19,280 --> 00:00:20,480
+There's also your skill in
+在应用这些算法时
+
+13
+00:00:20,600 --> 00:00:21,990
+applying these algorithms, things like
+表现情况通常依赖于你的水平 比如
+
+14
+00:00:23,150 --> 00:00:24,480
+your choice of the features that you
+你为学习算法
+
+15
+00:00:24,660 --> 00:00:25,790
+designed to give the learning
+所设计的
+
+16
+00:00:26,010 --> 00:00:27,030
+algorithms and how you
+特征量的选择
+
+17
+00:00:27,200 --> 00:00:28,530
+choose the regularization parameter
+以及如何选择正则化参数
+
+18
+00:00:29,190 --> 00:00:31,690
+and things like that. But there's
+诸如此类的事
+
+19
+00:00:31,930 --> 00:00:34,110
+one more algorithm that is very
+还有一个
+
+20
+00:00:34,380 --> 00:00:35,460
+powerful and its very
+更加强大的算法
+
+21
+00:00:35,580 --> 00:00:37,400
+widely used both within industry
+广泛的应用于
+
+22
+00:00:38,050 --> 00:00:39,590
+and in Academia. And that's called the
+工业界和学术界
+
+23
+00:00:39,850 --> 00:00:41,080
+support vector machine, and compared to
+它被称为支持向量机(Support Vector Machine)
+
+24
+00:00:41,200 --> 00:00:42,600
+both the logistic regression and neural networks, the
+与逻辑回归和神经网络相比
+
+25
+00:00:46,770 --> 00:00:48,190
+support vector machine or the SVM
+支持向量机 或者简称SVM
+
+26
+00:00:48,440 --> 00:00:50,120
+sometimes gives a cleaner
+在学习复杂的非线性方程时
+
+27
+00:00:50,890 --> 00:00:52,040
+and sometimes more powerful way
+提供了一种更为清晰
+
+28
+00:00:52,480 --> 00:00:53,250
+of learning complex nonlinear functions.
+更加强大的方式
+
+29
+00:00:54,970 --> 00:00:56,300
+And so I'd like to take the next
+因此 在接下来的视频中
+
+30
+00:00:56,480 --> 00:00:57,850
+videos to
+我会探讨
+
+31
+00:00:57,890 --> 00:01:00,100
+talk about that.
+这一算法
+
+32
+00:01:00,400 --> 00:01:01,400
+Later in this course, I will do
+在稍后的课程中
+
+33
+00:01:01,540 --> 00:01:02,710
+a quick survey of the range
+我也会对监督学习算法
+
+34
+00:01:03,100 --> 00:01:04,340
+of different supervised learning algorithms just
+进行简要的总结
+
+35
+00:01:05,200 --> 00:01:06,790
+to very briefly describe them
+当然 仅仅是作简要描述
+
+36
+00:01:07,430 --> 00:01:08,870
+but the support vector machine, given
+但对于支持向量机
+
+37
+00:01:09,370 --> 00:01:10,840
+its popularity and how popular
+鉴于该算法的强大和受欢迎度
+
+38
+00:01:10,980 --> 00:01:11,920
+it is, this will be
+在本课中 我会花许多时间来讲解它
+
+39
+00:01:12,060 --> 00:01:13,800
+the last of the supervised learning algorithms
+它也是我们所介绍的
+
+40
+00:01:14,440 --> 00:01:16,710
+that I'll spend a significant amount of time on in this course.
+最后一个监督学习算法
+
+41
+00:01:19,260 --> 00:01:20,440
+As with our development of ever
+正如我们之前开发的学习算法
+
+42
+00:01:20,670 --> 00:01:22,280
+learning algorithms, we are going to start by talking
+我们从
+
+43
+00:01:22,650 --> 00:01:23,940
+about the optimization objective,
+优化目标开始
+
+44
+00:01:24,750 --> 00:01:26,420
+so let's get started on
+那么 我们开始学习
+
+45
+00:01:26,620 --> 00:01:27,920
+this algorithm.
+这个算法
+
+46
+00:01:29,420 --> 00:01:30,960
+In order to describe the support
+为了描述支持向量机
+
+47
+00:01:31,270 --> 00:01:32,570
+vector machine, I'm actually going
+事实上 我将会
+
+48
+00:01:32,610 --> 00:01:34,020
+to start with logistic regression
+从逻辑回归开始
+
+49
+00:01:34,990 --> 00:01:35,990
+and show how we can modify
+展示我们如何
+
+50
+00:01:36,820 --> 00:01:37,630
+it a bit and get what
+一点一点修改
+
+51
+00:01:38,240 --> 00:01:39,260
+is essentially the support vector machine.
+来得到本质上的支持向量机
+
+52
+00:01:40,290 --> 00:01:41,740
+So, in logistic regression we have
+那么 在逻辑回归中
+
+53
+00:01:41,950 --> 00:01:43,680
+our familiar form of
+我们已经熟悉了
+
+54
+00:01:43,740 --> 00:01:46,000
+the hypotheses there and the
+这里的假设函数形式
+
+55
+00:01:46,450 --> 00:01:48,590
+sigmoid activation function shown on the right.
+和右边的S型激励函数
+
+56
+00:01:50,390 --> 00:01:51,330
+And in order to explain
+然而 为了解释
+
+57
+00:01:51,800 --> 00:01:52,650
+some of the math, I'm going
+一些数学知识
+
+58
+00:01:52,850 --> 00:01:55,960
+to use z to denote theta transpose x here.
+我将用 z 表示 θ 转置乘以 x
+
+59
+00:01:57,620 --> 00:01:58,650
+Now let's think about what
+现在 让一起考虑下
+
+60
+00:01:58,900 --> 00:02:01,150
+we will like the logistic regression to do.
+我们想要逻辑回归做什么
+
+61
+00:02:01,270 --> 00:02:02,800
+If we have an example with
+如果有一个
+
+62
+00:02:03,070 --> 00:02:04,360
+y equals 1, and by
+y=1 的样本
+
+63
+00:02:04,540 --> 00:02:05,480
+this I mean an example
+我的意思是
+
+64
+00:02:06,100 --> 00:02:07,100
+in either a training set
+不管是在训练集中 或是在测试集中
+
+65
+00:02:07,440 --> 00:02:11,780
+or the test set or, you know, the cross validation set, where y is equal to 1, then
+又或者在交叉验证集中 总之是 y=1
+
+66
+00:02:12,030 --> 00:02:14,300
+we are sort of hoping that h of x will be close to 1.
+现在 我们希望 h(x) 趋近1
+
+67
+00:02:14,380 --> 00:02:15,760
+So, right, we are hoping to
+因为 我们想要
+
+68
+00:02:16,140 --> 00:02:17,330
+correctly classify that example
+正确地将此样本分类
+
+69
+00:02:18,520 --> 00:02:19,390
+and what, having h of x
+这就意味着
+
+70
+00:02:19,510 --> 00:02:20,710
+close to 1, what that means
+当 h(x) 趋近于1时
+
+71
+00:02:20,850 --> 00:02:22,080
+is that theta transpose x
+θ 转置乘以 x
+
+72
+00:02:22,360 --> 00:02:23,380
+must be much larger
+应当
+
+73
+00:02:23,770 --> 00:02:24,990
+than 0, so there's
+远大于0 这里的
+
+74
+00:02:25,330 --> 00:02:26,680
+greater than, greater than sign, that
+大于大于号 >>
+
+75
+00:02:26,900 --> 00:02:28,220
+means much, much greater
+意思是
+
+76
+00:02:28,530 --> 00:02:30,880
+than 0 and that's
+远远大于0
+
+77
+00:02:31,120 --> 00:02:32,840
+because it is z, that is theta transpose
+这是因为 由于 z 表示
+
+78
+00:02:32,960 --> 00:02:34,750
+x
+θ 转置乘以 x
+
+79
+00:02:34,940 --> 00:02:35,910
+is when z is much bigger than
+当 z 远大于
+
+80
+00:02:36,010 --> 00:02:37,240
+0, is far to the
+0时 即到了
+
+81
+00:02:37,310 --> 00:02:39,060
+right of this figure that, you know, the
+该图的右边 你不难发现
+
+82
+00:02:39,360 --> 00:02:42,430
+output of logistic regression becomes close to 1.
+此时逻辑回归的输出将趋近于1
+
+83
+00:02:44,510 --> 00:02:45,580
+Conversely, if we have
+相反地 如果我们
+
+84
+00:02:45,630 --> 00:02:46,870
+an example where y is
+有另一个样本
+
+85
+00:02:47,000 --> 00:02:48,470
+equal to 0 then what
+即 y=0
+
+86
+00:02:48,750 --> 00:02:49,620
+were hoping for is that the hypothesis
+我们希望假设函数
+
+87
+00:02:50,420 --> 00:02:51,890
+will output the value to
+的输出值
+
+88
+00:02:52,010 --> 00:02:53,850
+close to 0 and that corresponds to theta transpose x
+将趋近于0 这对应于 θ 转置乘以 x
+
+89
+00:02:54,650 --> 00:02:55,990
+or z pretty much
+或者就是 z 会
+
+90
+00:02:56,250 --> 00:02:57,080
+less than 0 because
+远小于0
+
+91
+00:02:57,440 --> 00:02:58,720
+that corresponds to
+因为对应的
+
+92
+00:02:59,160 --> 00:03:01,250
+hypothesis of outputting a value close to 0. If
+假设函数的输出值趋近0
+
+93
+00:03:02,180 --> 00:03:03,590
+you look at the
+如果你进一步
+
+94
+00:03:03,760 --> 00:03:06,300
+cost function of logistic regression, what
+观察逻辑回归的代价函数
+
+95
+00:03:06,440 --> 00:03:07,470
+you find is that each
+你会发现
+
+96
+00:03:07,710 --> 00:03:09,400
+example x, y,
+每个样本 (x, y)
+
+97
+00:03:10,190 --> 00:03:11,520
+contributes a term like
+都会为总代价函数
+
+98
+00:03:11,700 --> 00:03:14,320
+this to the overall cost function.
+增加这里的一项
+
+99
+00:03:15,450 --> 00:03:16,900
+All right. So, for the overall cost function, we usually, we will
+因此 对于总代价函数
+
+100
+00:03:17,390 --> 00:03:18,600
+also have a sum over
+通常会有对所有的训练样本求和
+
+101
+00:03:18,890 --> 00:03:21,430
+all the training examples and 1 over m term.
+并且这里还有一个1/m项
+
+102
+00:03:22,450 --> 00:03:22,740
+But this
+但是
+
+103
+00:03:23,240 --> 00:03:24,150
+expression here. That's
+在逻辑回归中
+
+104
+00:03:24,470 --> 00:03:25,450
+the term that a single
+这里的这一项
+
+105
+00:03:26,220 --> 00:03:28,490
+training example contributes to
+就是表示一个训练样本
+
+106
+00:03:28,780 --> 00:03:31,550
+the overall objective function for logistic regression.
+所对应的表达式
+
+107
+00:03:33,250 --> 00:03:34,350
+Now, if I take the definition
+现在 如果我将完整定义的
+
+108
+00:03:35,190 --> 00:03:36,120
+for the full of my hypothesis
+假设函数
+
+109
+00:03:37,030 --> 00:03:38,700
+and plug it in, over here,
+代入这里
+
+110
+00:03:39,790 --> 00:03:40,710
+the one I get is that
+那么 我们就会得到
+
+111
+00:03:40,920 --> 00:03:43,130
+each training example contributes this term, right?
+每一个训练样本都影响这一项
+
+112
+00:03:44,270 --> 00:03:45,480
+Ignoring the 1 over
+现在 先忽略1/m这一项
+
+113
+00:03:45,720 --> 00:03:47,130
+m but it contributes that term
+但是这一项
+
+114
+00:03:47,470 --> 00:03:49,470
+to be my overall cost function for
+是影响整个总代价函数
+
+115
+00:03:49,680 --> 00:03:52,260
+logistic regression. Now let's
+中的这一项的
+
+116
+00:03:52,820 --> 00:03:54,310
+consider the 2 cases
+现在 一起来考虑两种情况
+
+117
+00:03:54,700 --> 00:03:55,970
+of when y is equal to 1
+一种是y等于1的情况
+
+118
+00:03:56,040 --> 00:03:57,250
+and when y is equal to 0.
+一种是y等于0的情况
+
+119
+00:03:57,820 --> 00:03:59,040
+In the first case, let's
+在第一种情况中
+
+120
+00:03:59,170 --> 00:04:00,260
+suppose that y is equal
+假设 y
+
+121
+00:04:00,520 --> 00:04:01,960
+to 1. In that case,
+等于1 此时
+
+122
+00:04:02,440 --> 00:04:04,850
+only this first row in
+在目标函数中
+
+123
+00:04:04,980 --> 00:04:06,910
+the objective matters because this
+只需有第一项起作用
+
+124
+00:04:07,130 --> 00:04:08,830
+1 minus y term will be equal
+因为y等于1时
+
+125
+00:04:09,210 --> 00:04:10,510
+to 0 if y is equal to 1.
+(1-y) 项将等于0
+
+126
+00:04:13,640 --> 00:04:15,340
+So, when y is equal to
+因此 当在y等于
+
+127
+00:04:15,400 --> 00:04:17,130
+1 when in an example, x,
+1的样本中时
+
+128
+00:04:17,310 --> 00:04:18,240
+y when y is equal to
+即在 (x, y) 中
+
+129
+00:04:18,420 --> 00:04:19,840
+1, what we get is this
+y等于1
+
+130
+00:04:20,010 --> 00:04:21,340
+term minus log 1
+我们得到
+
+131
+00:04:21,560 --> 00:04:22,370
+over 1 plus e to the negative
+-log(1/(1+e^-z)) 这样一项
+
+132
+00:04:22,860 --> 00:04:25,050
+z. Where, similar to the last slide,
+这里同上一张幻灯片一致
+
+133
+00:04:25,330 --> 00:04:26,480
+I'm using z to denote
+我用 z
+
+134
+00:04:27,490 --> 00:04:29,430
+theta transpose x. And
+表示 θ 转置乘以 x
+
+135
+00:04:29,640 --> 00:04:30,930
+of course, in the cost we
+当然 在代价函数中
+
+136
+00:04:31,040 --> 00:04:32,130
+actually had this minus y
+y 前面有负号
+
+137
+00:04:32,380 --> 00:04:33,490
+but we just said that you know, if y is
+我们只是这样表示
+
+138
+00:04:33,540 --> 00:04:34,790
+equal to 1. So that's equal
+如果y等于1 代价函数中
+
+139
+00:04:35,020 --> 00:04:36,500
+to 1. I just simplified it
+这一项也等于1 这样做
+
+140
+00:04:36,580 --> 00:04:38,010
+a way in the expression that
+是为了简化
+
+141
+00:04:38,300 --> 00:04:39,820
+I have written down here.
+此处的表达式
+
+142
+00:04:41,950 --> 00:04:43,030
+And if we plot this function,
+如果画出
+
+143
+00:04:43,580 --> 00:04:45,080
+as a function of z, what
+关于 z 的函数
+
+144
+00:04:45,230 --> 00:04:46,320
+you find is that you get
+你会看到
+
+145
+00:04:47,160 --> 00:04:48,630
+this curve shown on the
+左下角的
+
+146
+00:04:49,220 --> 00:04:50,290
+lower left of this line
+这条曲线
+
+147
+00:04:51,120 --> 00:04:52,290
+and thus we also see
+我们同样可以看到
+
+148
+00:04:52,640 --> 00:04:53,590
+that when z is equal
+当 z 增大时
+
+149
+00:04:53,860 --> 00:04:54,930
+to large that is to when
+也就是相当于
+
+150
+00:04:55,440 --> 00:04:56,930
+theta transpose x is large
+θ 转置乘以x 增大时
+
+151
+00:04:57,800 --> 00:04:58,790
+that corresponds to a
+对应的代价值
+
+152
+00:04:58,890 --> 00:04:59,900
+value of z that gives
+会变的非常小
+
+153
+00:05:00,100 --> 00:05:02,050
+us a very small value, a very
+对整个代价函数而言
+
+154
+00:05:03,000 --> 00:05:04,650
+small contribution to the
+影响也非常小
+
+155
+00:05:04,740 --> 00:05:06,120
+cost function and this
+这也就解释了
+
+156
+00:05:06,270 --> 00:05:07,790
+kind of explains why when
+为什么
+
+157
+00:05:08,260 --> 00:05:10,020
+logistic regression sees a positive example
+逻辑回归在观察到
+
+158
+00:05:10,640 --> 00:05:12,200
+with y equals 1 it tries
+正样本 y=1 时
+
+159
+00:05:12,860 --> 00:05:14,220
+to set theta transpose x
+试图将 θ^T*x
+
+160
+00:05:14,650 --> 00:05:15,810
+to be very large because that
+设置的非常大
+
+161
+00:05:15,980 --> 00:05:17,440
+corresponds to this term
+因为 在代价函数中的
+
+162
+00:05:18,300 --> 00:05:21,490
+in a cost function being small. Now, to build
+这一项会变的非常小 现在
+
+163
+00:05:21,760 --> 00:05:23,640
+the Support Vector Machine, here is what we are going to do.
+开始建立支持向量机 我们从这里开始
+
+164
+00:05:23,740 --> 00:05:24,780
+We are going to take this cost function, this
+我们会从这个代价函数开始
+
+165
+00:05:25,740 --> 00:05:29,420
+minus log 1 over 1 plus e to the negative z and modify it a little bit.
+也就是 -log(1/(1+e^-z)) 一点一点修改
+
+166
+00:05:31,270 --> 00:05:32,450
+Let me take this point
+让我取这里的
+
+167
+00:05:33,590 --> 00:05:35,120
+1 over here and let
+z=1 点
+
+168
+00:05:36,150 --> 00:05:37,200
+me draw the course function that I'm going to
+我先画出将要用的代价函数
+
+169
+00:05:37,280 --> 00:05:38,510
+use, the new cost function is gonna
+新的代价函数将会
+
+170
+00:05:38,870 --> 00:05:40,320
+be flat from here on out
+水平的从这里到右边 (图外)
+
+171
+00:05:42,000 --> 00:05:42,980
+and then I'm going to draw something
+然后我再画一条
+
+172
+00:05:43,170 --> 00:05:45,720
+that grows as a straight
+同逻辑回归
+
+173
+00:05:46,280 --> 00:05:49,230
+line similar to logistic
+非常相似的直线
+
+174
+00:05:49,530 --> 00:05:50,710
+regression but this is going to be the
+但是 在这里
+
+175
+00:05:50,950 --> 00:05:52,740
+straight line in this portion.
+是一条直线
+
+176
+00:05:52,870 --> 00:05:55,040
+So the curve that
+也就是 我用紫红色画的曲线
+
+177
+00:05:55,190 --> 00:05:57,580
+I just drew in magenta. The curve that I just drew purple and magenta.
+就是这条紫红色的曲线
+
+178
+00:05:58,090 --> 00:05:59,580
+So, it's a pretty
+那么 到了这里
+
+179
+00:05:59,730 --> 00:06:01,840
+close approximation to the
+已经非常接近
+
+180
+00:06:02,310 --> 00:06:03,480
+cost function used by logistic
+逻辑回归中
+
+181
+00:06:03,900 --> 00:06:05,060
+regression except that it is
+使用的代价函数了
+
+182
+00:06:05,130 --> 00:06:06,590
+now made out of two line segments. This
+只是这里是由两条线段组成
+
+183
+00:06:07,490 --> 00:06:09,110
+is the flat portion on the right
+即位于右边的水平部分
+
+184
+00:06:09,430 --> 00:06:11,590
+and then this is a straight
+和位于左边的
+
+185
+00:06:11,860 --> 00:06:14,340
+line portion on the
+直线部分
+
+186
+00:06:14,630 --> 00:06:16,460
+left. And don't worry too much about the slope of the straight line portion.
+先别过多的考虑左边直线部分的斜率
+
+187
+00:06:16,930 --> 00:06:18,930
+It doesn't matter that
+这并不是很重要
+
+188
+00:06:19,180 --> 00:06:21,630
+much but that's the
+但是 这里
+
+189
+00:06:21,730 --> 00:06:23,910
+new cost function we're going to use where y is equal to 1 and
+我们将使用的新的代价函数 是在 y=1 的前提下的
+
+190
+00:06:24,100 --> 00:06:25,240
+you can imagine you
+你也许能想到
+
+191
+00:06:25,340 --> 00:06:28,310
+should do something pretty similar to logistic regression
+这应该能做同逻辑回归中类似的事情
+
+192
+00:06:29,190 --> 00:06:30,470
+but it turns out that this will give the
+但事实上
+
+193
+00:06:30,750 --> 00:06:32,630
+support vector machine computational advantage
+在之后的优化问题中
+
+194
+00:06:33,690 --> 00:06:34,470
+that will give us later on
+这会为支持向量机
+
+195
+00:06:34,890 --> 00:06:37,190
+an easier optimization problem, that
+带来计算上的优势
+
+196
+00:06:37,570 --> 00:06:39,670
+will be easier to solve, and so on.
+使优化问题更容易求解 等等
+
+197
+00:06:41,050 --> 00:06:41,990
+We just talked about the case
+目前 我们只是讨论了
+
+198
+00:06:42,120 --> 00:06:43,300
+of y equals to 1. The other
+y=1 的情况 另外
+
+199
+00:06:43,370 --> 00:06:44,420
+case is if y is equal
+一种情况是当
+
+200
+00:06:44,660 --> 00:06:46,120
+to 0. In that case,
+y=0 时 此时
+
+201
+00:06:47,090 --> 00:06:47,870
+if you look at the cost
+如果你仔细观察代价函数
+
+202
+00:06:48,510 --> 00:06:49,880
+then only this second term
+只留下了第二项
+
+203
+00:06:50,220 --> 00:06:51,470
+will apply because the first
+因为第一项
+
+204
+00:06:51,610 --> 00:06:52,800
+term goes a way
+被消除了
+
+205
+00:06:53,330 --> 00:06:54,490
+where if y is equal to 0 then nearly
+如果当 y=0 时 那么
+
+206
+00:06:54,640 --> 00:06:55,670
+it was 0 here. So
+这一项也就是0了
+
+207
+00:06:55,800 --> 00:06:56,640
+your left only with the second
+所以上述表达式
+
+208
+00:06:57,040 --> 00:06:58,100
+term of the expression above
+只留下了第二项
+
+209
+00:06:59,150 --> 00:07:00,600
+and so the cost of an
+因此 这个样本的代价
+
+210
+00:07:00,710 --> 00:07:01,960
+example or the contribution
+或是代价函数的贡献
+
+211
+00:07:01,980 --> 00:07:03,620
+of the cost function is going
+将会由
+
+212
+00:07:03,840 --> 00:07:04,850
+to be given by this term
+这一项表示
+
+213
+00:07:05,180 --> 00:07:06,620
+over here and if you
+并且 如果你将
+
+214
+00:07:06,710 --> 00:07:07,860
+plug that as a function
+这一项作为 z 的函数
+
+215
+00:07:08,560 --> 00:07:09,750
+z. So, I have here z on the
+那么 这里就会得到横轴z
+
+216
+00:07:09,990 --> 00:07:11,290
+horizontal axis, you end up
+你最终会得到
+
+217
+00:07:11,400 --> 00:07:13,370
+with this curve, and for
+这条曲线 而对于支持向量机
+
+218
+00:07:13,470 --> 00:07:14,570
+the support vector machine, once
+同样地 再来一次
+
+219
+00:07:14,790 --> 00:07:15,540
+again we're going to replace
+我们要替代这一条蓝色的线
+
+220
+00:07:16,250 --> 00:07:17,860
+this blue line with something similar
+用相似的方法
+
+221
+00:07:18,380 --> 00:07:20,060
+and see if we can
+如果我们用
+
+222
+00:07:20,670 --> 00:07:22,220
+replace it with a new cost, there
+一个新的代价函数来代替
+
+223
+00:07:23,480 --> 00:07:24,910
+is flat out here. There's 0 out here and then
+即这条从0点开始的水平直线
+
+224
+00:07:25,020 --> 00:07:26,230
+it grows as a straight
+然后是一条斜线
+
+225
+00:07:27,900 --> 00:07:27,900
+line like so.
+像这样
+
+226
+00:07:29,070 --> 00:07:29,710
+So, let me give
+那么 现在让我给
+
+227
+00:07:29,860 --> 00:07:31,950
+these two functions names.
+这两个方程命名
+
+228
+00:07:32,830 --> 00:07:33,910
+This function on the left, I'm
+左边的函数
+
+229
+00:07:34,080 --> 00:07:35,850
+going to call
+我称之为
+
+230
+00:07:37,140 --> 00:07:38,360
+cost subscript 1 of z.
+cost1(z)
+
+231
+00:07:38,800 --> 00:07:39,650
+And this function on the right, I'm going to call
+同时 在右边函数
+
+232
+00:07:39,870 --> 00:07:41,700
+cost subscript 0
+我称它为 cost0(z)
+
+233
+00:07:42,980 --> 00:07:44,260
+of z. And the subscript just refers
+这里的下标是指
+
+234
+00:07:44,860 --> 00:07:46,740
+to the cost corresponding to
+在代价函数中
+
+235
+00:07:47,070 --> 00:07:48,570
+y is equal to 1 versus y is equal to 0.
+对应的 y=1 和 y=0 的情况
+
+236
+00:07:49,930 --> 00:07:51,470
+Armed with these definitions, we are
+拥有了这些定义后
+
+237
+00:07:51,580 --> 00:07:54,730
+now ready to build the support vector machine.
+现在 我们就开始构建支持向量机
+
+238
+00:07:55,000 --> 00:07:56,030
+Here is the cost function
+这是我们在逻辑回归中使用
+
+239
+00:07:56,300 --> 00:07:57,230
+j of theta that we have for
+代价函数 J(θ)
+
+240
+00:07:57,340 --> 00:07:58,440
+logistic regression. In case
+也许这个方程
+
+241
+00:07:58,770 --> 00:07:59,760
+this the equation looks a
+看起来不是非常熟悉
+
+242
+00:07:59,860 --> 00:08:02,220
+bit unfamiliar is because previously we
+这是因为 之前
+
+243
+00:08:02,360 --> 00:08:04,270
+had a minor sign outside, but
+有个负号在方程外面
+
+244
+00:08:04,800 --> 00:08:05,820
+here what I did was I
+但是 这里我所做的是
+
+245
+00:08:05,930 --> 00:08:07,010
+instead moved the minor signs
+将负号移到了
+
+246
+00:08:07,610 --> 00:08:08,800
+inside this expression. So it
+表达式的里面
+
+247
+00:08:08,950 --> 00:08:09,920
+just, you know, makes it look a
+这样做使得方程
+
+248
+00:08:10,080 --> 00:08:12,970
+little more different. For the support
+看起来有些不同
+
+249
+00:08:13,340 --> 00:08:14,670
+vector machine what we are
+对于支持向量机而言
+
+250
+00:08:14,730 --> 00:08:16,550
+going to do is essentially take
+实质上 我们要将这一
+
+251
+00:08:16,820 --> 00:08:18,460
+this, and replace this with
+替换为
+
+252
+00:08:19,080 --> 00:08:21,260
+cost 1 of z,
+cost1(z)
+
+253
+00:08:21,740 --> 00:08:23,060
+that is cost 1 of theta transpose x.
+也就是cost1(θ^T*x)
+
+254
+00:08:23,320 --> 00:08:25,240
+I'm going
+同样地 我也将
+
+255
+00:08:25,300 --> 00:08:27,250
+to take this and replace it with cost
+这一项替换为cost0(z)
+
+256
+00:08:28,640 --> 00:08:31,420
+0 of z. This is cost 0 of
+也就是代价
+
+257
+00:08:32,060 --> 00:08:34,090
+theta transpose x
+cost0(θ^T*x)
+
+258
+00:08:35,030 --> 00:08:36,680
+where the cost 1 function is
+这里的代价函数 cost1
+
+259
+00:08:37,000 --> 00:08:37,740
+what we had on the previous
+就是之前所提到的那条线
+
+260
+00:08:38,170 --> 00:08:39,930
+line that looks like this and
+看起来是这样的
+
+261
+00:08:40,890 --> 00:08:42,540
+the cost 0 function, again what
+此外 代价函数 cost0
+
+262
+00:08:42,680 --> 00:08:44,420
+we have on the previous line that
+也是上面所介绍过的那条线
+
+263
+00:08:44,910 --> 00:08:46,730
+looks like this.
+看起来是这样
+
+264
+00:08:46,860 --> 00:08:48,080
+So, what we have for the support
+因此 对于支持向量机
+
+265
+00:08:48,420 --> 00:08:49,360
+vector machine is a minimization
+我们得到了这里的最小化问题
+
+266
+00:08:49,910 --> 00:08:52,220
+problem of one of
+即 1/m 乘以
+
+267
+00:08:52,340 --> 00:08:55,210
+1 over m, sum over
+从1加到第 m 个
+
+268
+00:08:55,400 --> 00:08:58,650
+my training examples of y(i) times cost
+训练样本 y(i) 再乘以
+
+269
+00:08:59,090 --> 00:09:01,050
+1 of theta transpose
+cost1(θ^T*x(i))
+
+270
+00:09:01,300 --> 00:09:03,910
+x(i) plus 1 minus
+加上1减去
+
+271
+00:09:04,650 --> 00:09:06,640
+y(i) times cost zero of theta transpose x(i).
+y(i) 乘以 cost0(θ^T*x(i))
+
+272
+00:09:07,220 --> 00:09:10,490
+And then
+然后
+
+273
+00:09:10,990 --> 00:09:13,470
+plus my usual regularization
+再加上正则化参数
+
+274
+00:09:17,120 --> 00:09:23,280
+parameter like so. Now
+像这样
+
+275
+00:09:24,130 --> 00:09:25,280
+by convention for the Support
+现在 按照支持向量机的惯例
+
+276
+00:09:25,570 --> 00:09:27,610
+Vector Machine, we actually write
+事实上 我们的书写
+
+277
+00:09:27,790 --> 00:09:29,510
+things slightly differently. We parametrize
+会稍微有些不同
+
+278
+00:09:30,570 --> 00:09:31,690
+this just very slightly differently.
+代价函数的参数表示也会稍微有些不同
+
+279
+00:09:31,850 --> 00:09:33,720
+First, we're going
+首先 我们要
+
+280
+00:09:34,130 --> 00:09:35,360
+to get rid of the 1
+除去 1/m 这一项
+
+281
+00:09:35,670 --> 00:09:36,860
+over m terms and this just,
+当然 这仅仅是
+
+282
+00:09:37,130 --> 00:09:38,480
+this just happens
+仅仅是由于
+
+283
+00:09:38,770 --> 00:09:40,380
+to be a slightly different convention that people
+人们使用支持向量机时
+
+284
+00:09:40,640 --> 00:09:41,930
+use for support vector machines
+对比于逻辑回归而言
+
+285
+00:09:42,140 --> 00:09:43,400
+compared to for logistic regression. But here's what
+不同的习惯所致 但这里我所说的意思是
+
+286
+00:09:44,160 --> 00:09:46,180
+I mean, you know, what I'm going
+你知道 我将要做的是
+
+287
+00:09:46,670 --> 00:09:47,960
+to do is I am just gonna get
+仅仅除去
+
+288
+00:09:48,210 --> 00:09:49,450
+rid of this 1 over m
+1/m 这一项
+
+289
+00:09:50,070 --> 00:09:50,860
+terms and this should give
+但是 这也会得出
+
+290
+00:09:51,070 --> 00:09:53,030
+me the same optimal value for theta, right.
+同样的θ最优值 好的
+
+291
+00:09:53,620 --> 00:09:55,020
+Because 1 over m is just a constant.
+因为 1/m 仅是个常量
+
+292
+00:09:56,420 --> 00:09:57,550
+So, you know, whether I solve
+因此 你知道
+
+293
+00:09:57,930 --> 00:09:59,410
+this minimization problem with 1
+在这个最小化问题中
+
+294
+00:09:59,580 --> 00:10:00,430
+over m in front or not,
+无论前面是否有 1/m 这一项
+
+295
+00:10:01,100 --> 00:10:02,010
+I should end up with the same
+最终我所得到的
+
+296
+00:10:02,490 --> 00:10:03,510
+optimal value of theta.
+最优值θ都是一样的
+
+297
+00:10:04,590 --> 00:10:05,450
+Here is what I mean, to
+这里我的意思是
+
+298
+00:10:05,590 --> 00:10:07,000
+give you a concrete example,
+先给你举一个实例
+
+299
+00:10:08,010 --> 00:10:09,170
+suppose I had a minimization
+假定有一最小化问题
+
+300
+00:10:09,370 --> 00:10:11,040
+problem that you know minimize over
+即要求当 (u-5)^2+1
+
+301
+00:10:11,460 --> 00:10:14,700
+a real number u of u minus 5 squared,
+取得最小值时的 u 值
+
+302
+00:10:17,080 --> 00:10:18,540
+plus 1, right. Well, the
+好的
+
+303
+00:10:18,620 --> 00:10:20,040
+minimum of this happens, happens
+这时最小值为
+
+304
+00:10:20,440 --> 00:10:21,900
+to know the minimum of this is u equals 5.
+当 u=5 时取得最小值
+
+305
+00:10:23,090 --> 00:10:23,980
+Now if I want to take
+现在 如果我们想要
+
+306
+00:10:24,120 --> 00:10:25,800
+this objective function and multiply
+将这个目标函数
+
+307
+00:10:26,430 --> 00:10:28,240
+it by 10, so
+乘上常数10
+
+308
+00:10:28,770 --> 00:10:29,850
+here my minimization problem is
+这里我的最小化问题就变成了
+
+309
+00:10:30,570 --> 00:10:33,510
+minimum of u of 10, u minus
+求使得 10×(u-5)^2+10
+
+310
+00:10:33,960 --> 00:10:35,270
+5 squared plus 10.
+最小的值u
+
+311
+00:10:35,920 --> 00:10:37,650
+Well the value of u
+然而 这里的u值
+
+312
+00:10:37,670 --> 00:10:40,350
+that minimizes this is still u equals 5, right.
+使得这里最小的u值仍为5
+
+313
+00:10:40,940 --> 00:10:42,540
+So, multiplying something that
+因此 将一些常数
+
+314
+00:10:42,640 --> 00:10:44,160
+you are minimizing over by some
+乘以你的最小化项
+
+315
+00:10:44,360 --> 00:10:45,540
+constant, 10 in this case,
+例如 这里的常数10
+
+316
+00:10:46,010 --> 00:10:47,710
+it does not change the value
+这并不会改变
+
+317
+00:10:48,290 --> 00:10:51,450
+of u that gives us, that minimizes this function.
+最小化该方程时得到u值
+
+318
+00:10:52,650 --> 00:10:53,680
+So the same way what I've
+因此 这里我所做的
+
+319
+00:10:53,830 --> 00:10:55,120
+done by crossing out this
+是删去常量m
+
+320
+00:10:55,430 --> 00:10:56,940
+m is, all I
+也是相同的
+
+321
+00:10:56,990 --> 00:10:58,770
+am doing is multiplying my objective
+现在 我将目标函数
+
+322
+00:10:59,240 --> 00:11:00,650
+function by some constant m
+乘上一个常量 m
+
+323
+00:11:00,940 --> 00:11:01,920
+and it doesn't change the value
+并不会改变
+
+324
+00:11:02,360 --> 00:11:04,310
+of theta that achieves the minimum.
+取得最小值时的 θ 值
+
+325
+00:11:05,480 --> 00:11:07,190
+The second bit of notational change,
+第二点概念上的变化
+
+326
+00:11:07,470 --> 00:11:08,560
+we're just designating the most
+我们只是指在使用
+
+327
+00:11:08,740 --> 00:11:10,630
+standard convention, when using as
+支持向量机时 一些如下的标准惯例
+
+328
+00:11:11,170 --> 00:11:13,250
+the SVM, instead of logistic regression, as follows.
+而不是逻辑回归
+
+329
+00:11:14,210 --> 00:11:15,880
+So, for logistic regression, we had
+因此 对于逻辑回归
+
+330
+00:11:16,520 --> 00:11:18,270
+two terms to our objective function.
+在目标函数中 我们有两项
+
+331
+00:11:19,340 --> 00:11:20,500
+The first is this term
+第一个是这一项
+
+332
+00:11:20,920 --> 00:11:22,020
+which is the cost that comes
+是来自于
+
+333
+00:11:22,450 --> 00:11:23,910
+from the training set and the
+训练样本的代价
+
+334
+00:11:23,990 --> 00:11:25,730
+second is this term, which
+第二个是这一项
+
+335
+00:11:26,140 --> 00:11:28,330
+is the regularization term
+是我们的正则化项
+
+336
+00:11:28,380 --> 00:11:29,460
+and what we had, we had to
+我们不得不去
+
+337
+00:11:29,870 --> 00:11:30,900
+control the trade off between
+用这一项来平衡
+
+338
+00:11:31,270 --> 00:11:32,600
+these by saying, you know, that we
+这就相当于
+
+339
+00:11:32,810 --> 00:11:34,760
+wanted to minimize A plus
+我们想要最小化 A 加上
+
+340
+00:11:35,760 --> 00:11:38,240
+and then my regularization parameter lambda,
+正则化参数 λ
+
+341
+00:11:39,370 --> 00:11:42,280
+and then times some other
+然后乘以
+
+342
+00:11:42,430 --> 00:11:43,430
+term B, right? Where as I
+其他项 B 对吧?
+
+343
+00:11:43,510 --> 00:11:44,970
+am using A to denote
+这里的 A 表示
+
+344
+00:11:45,080 --> 00:11:46,160
+this first term, and I am
+这里的第一项
+
+345
+00:11:46,390 --> 00:11:48,280
+using B to denote that
+同时 我用 B 表示
+
+346
+00:11:48,490 --> 00:11:49,560
+second term, may be without the
+第二项 但不包括 λ
+
+347
+00:11:49,650 --> 00:11:52,440
+lambda, and instead of
+我们不是
+
+348
+00:11:53,140 --> 00:11:56,090
+parametrizing this as A plus lambda B,
+优化这里的 A+λ×B
+
+349
+00:11:56,270 --> 00:11:57,950
+we could, and so what we
+我们所做的
+
+350
+00:11:58,200 --> 00:11:59,670
+did was by setting different
+是通过设置
+
+351
+00:12:00,010 --> 00:12:02,210
+values for this regularization parameter lambda.
+不同正则参数 λ 达到优化目的
+
+352
+00:12:03,060 --> 00:12:04,180
+We could trade off the relative
+这样 我们就能够权衡
+
+353
+00:12:04,670 --> 00:12:05,720
+weight between how much we
+对应的项
+
+354
+00:12:05,900 --> 00:12:06,780
+want to fit the training set well,
+是使得训练样本拟合的更好
+
+355
+00:12:07,560 --> 00:12:09,390
+as minimizing A, versus how
+即最小化 A
+
+356
+00:12:09,510 --> 00:12:12,930
+much we care about keeping the values of the parameters small.
+还是保证正则参数足够小
+
+357
+00:12:13,470 --> 00:12:14,530
+So that would be
+也即是
+
+358
+00:12:14,640 --> 00:12:16,170
+for the parameters B. For the Support
+对于B项而言
+
+359
+00:12:16,380 --> 00:12:17,620
+Vector Machine, just by convention
+但对于支持向量机 按照惯例
+
+360
+00:12:18,250 --> 00:12:19,150
+we're going to use a different
+我们将使用一个不同的参数
+
+361
+00:12:19,570 --> 00:12:21,960
+parameter. So instead of using lambda here
+为了替换这里使用的 λ
+
+362
+00:12:22,180 --> 00:12:23,220
+to control the relative
+来权衡这两项
+
+363
+00:12:23,640 --> 00:12:24,730
+weighting between, you know, the first and second terms,
+你知道 就是第一项和第二项
+
+364
+00:12:24,810 --> 00:12:26,260
+we are
+我们
+
+365
+00:12:26,300 --> 00:12:27,370
+still going to use a different
+依照惯例使用
+
+366
+00:12:27,710 --> 00:12:29,070
+parameter which by convention
+一个不同的参数
+
+367
+00:12:29,290 --> 00:12:31,530
+is called C and
+称为C
+
+368
+00:12:31,730 --> 00:12:33,550
+we instead are going to minimize C times
+同时改为优化目标
+
+369
+00:12:34,430 --> 00:12:39,160
+A plus B. So
+C×A+B
+
+370
+00:12:39,380 --> 00:12:41,210
+for logistic regression if we
+因此 在逻辑回归中
+
+371
+00:12:41,340 --> 00:12:42,730
+send a very large value of
+如果给定 λ
+
+372
+00:12:42,990 --> 00:12:43,980
+lambda, that means to give
+一个非常大的值
+
+373
+00:12:44,260 --> 00:12:45,970
+B a very high weight. Here
+意味着给予B更大的权重
+
+374
+00:12:46,590 --> 00:12:47,640
+is that if we set C
+而这里 就对应于将C
+
+375
+00:12:47,960 --> 00:12:49,750
+to be a very small value. That
+设定为非常小的值
+
+376
+00:12:50,070 --> 00:12:51,510
+corresponds to giving B
+那么 相应的将会给 B
+
+377
+00:12:51,800 --> 00:12:53,530
+much larger weight than A.
+比给 A 更大的权重
+
+378
+00:12:54,610 --> 00:12:55,730
+So this is just a different
+因此 这只是
+
+379
+00:12:55,890 --> 00:12:57,330
+way of controlling the trade off
+一种不同的方式来控制这种权衡
+
+380
+00:12:57,630 --> 00:12:58,970
+or just a different way of
+或者一种不同的方法
+
+381
+00:12:59,060 --> 00:13:01,530
+parametrizing how much we care about optimizing the first term versus how much we care about optimizing the second term.
+即用参数来决定 是更关心第一项的优化 还是更关心第二项的优化
+
+382
+00:13:05,290 --> 00:13:06,250
+And if you want you can
+当然你也可以
+
+383
+00:13:06,380 --> 00:13:07,620
+think of this as the parameter
+把这里的参数C
+
+384
+00:13:08,180 --> 00:13:09,580
+C playing a role
+考虑成 1/λ
+
+385
+00:13:09,800 --> 00:13:11,570
+similar to 1 over
+同 1/λ 所扮演的
+
+386
+00:13:11,890 --> 00:13:13,900
+lambda and it's
+角色相同
+
+387
+00:13:14,080 --> 00:13:16,100
+not that these two equations
+并且这两个方程
+
+388
+00:13:16,720 --> 00:13:17,900
+or these two expressions will be
+或这两个表达式并不相同
+
+389
+00:13:18,000 --> 00:13:19,500
+equal, with C equal to 1 over
+因为 C 等于 1/λ
+
+390
+00:13:19,650 --> 00:13:21,350
+lambda. That's not the case. What is the case is that if C is equal to 1 over lambda, then
+但是也并不全是这样 如果当C等于 1/λ 时
+
+391
+00:13:22,260 --> 00:13:24,510
+these
+这两个
+
+392
+00:13:24,710 --> 00:13:26,670
+two optimization objectives should
+优化目标应当
+
+393
+00:13:26,940 --> 00:13:28,260
+give you the same value, same
+得到相同的值
+
+394
+00:13:28,500 --> 00:13:29,460
+optimal value of theta. So
+相同的最优值θ 因此
+
+395
+00:13:30,350 --> 00:13:31,180
+just filling that
+就用它们来代替
+
+396
+00:13:31,400 --> 00:13:33,030
+in. I'm gonna cross out lambda here
+那么 我现在删掉这里的 λ
+
+397
+00:13:33,730 --> 00:13:34,940
+and write in the constant C there.
+并且用常数 C 来代替这里
+
+398
+00:13:35,030 --> 00:13:37,930
+So, that gives
+因此 这就得到了
+
+399
+00:13:38,170 --> 00:13:40,830
+us our overall optimization objective
+在支持向量机中
+
+400
+00:13:41,280 --> 00:13:42,650
+function for the Support Vector
+我们的整个优化目标函数
+
+401
+00:13:42,900 --> 00:13:43,970
+Machine, and when you
+然后最小化
+
+402
+00:13:44,080 --> 00:13:46,200
+minimize that function then what
+这个目标函数
+
+403
+00:13:46,340 --> 00:13:47,410
+you have is the parameters
+得到 SVM 学习到的
+
+404
+00:13:48,230 --> 00:13:52,800
+learned by the SVM. Finally, unlike
+参数 θ 最后
+
+405
+00:13:52,940 --> 00:13:54,690
+logistic regression, the Support
+有别于逻辑回归
+
+406
+00:13:54,840 --> 00:13:56,110
+Vector Machine doesn't output the
+支持向量机并不会输出
+
+407
+00:13:56,220 --> 00:13:57,850
+probability. Instead what we
+概率 在这里
+
+408
+00:13:57,970 --> 00:13:58,910
+have is, we have this cost
+我们的代价函数
+
+409
+00:13:59,190 --> 00:14:00,600
+function which we minimize to
+当最小化代价函数
+
+410
+00:14:00,730 --> 00:14:02,770
+get the parameters theta and what
+获得参数θ时
+
+411
+00:14:02,910 --> 00:14:03,900
+the Support Vector Machine does,
+支持向量机所做的是
+
+412
+00:14:05,130 --> 00:14:05,970
+is it just makes the prediction
+它来直接预测
+
+413
+00:14:07,050 --> 00:14:08,650
+of y being equal 1
+y的值等于1
+
+414
+00:14:08,690 --> 00:14:10,390
+or 0 directly. So the hypothesis,
+还是等于0 因此 这个假设函数
+
+415
+00:14:11,310 --> 00:14:12,920
+where I predict, 1, if
+会预测1
+
+416
+00:14:14,150 --> 00:14:15,630
+theta transpose x is
+当 θ^T*x 大于
+
+417
+00:14:15,890 --> 00:14:17,680
+greater than or equal to
+或者等于0时
+
+418
+00:14:18,230 --> 00:14:20,060
+0 and I'll predict 0 otherwise.
+否则预测为0
+
+419
+00:14:20,320 --> 00:14:21,560
+And so, having learned the
+所以学习
+
+420
+00:14:21,610 --> 00:14:23,010
+parameters theta, this is
+参数 θ
+
+421
+00:14:23,360 --> 00:14:25,980
+the form of the hypothesis for the support vector machine.
+就是支持向量机假设函数的形式
+
+422
+00:14:26,850 --> 00:14:27,870
+So, that was a
+那么 这就是
+
+423
+00:14:27,980 --> 00:14:29,670
+mathematical definition of what
+支持向量机
+
+424
+00:14:29,840 --> 00:14:31,520
+a support vector machine does.
+数学上的定义
+
+425
+00:14:31,750 --> 00:14:32,870
+In the next few videos, let's
+在接下来的视频中
+
+426
+00:14:33,100 --> 00:14:33,900
+try to get back to
+让我们再回去
+
+427
+00:14:34,260 --> 00:14:36,030
+intuition about what this
+从直观的角度看看优化目标
+
+428
+00:14:36,480 --> 00:14:37,660
+optimization objective leads to and
+实际上是在做什么
+
+429
+00:14:37,820 --> 00:14:38,840
+what sort of hypotheses
+以及 SVM 的假设函数
+
+430
+00:14:39,720 --> 00:14:41,300
+an SVM will learn, and also
+将会学习什么
+
+431
+00:14:41,700 --> 00:14:43,060
+talk about how to modify
+同时 也会谈谈 如何
+
+432
+00:14:43,600 --> 00:14:44,640
+this just a little bit to
+做些许修改
+
+433
+00:14:44,920 --> 00:14:46,280
+learn complex, nonlinear functions.
+学习更加复杂、非线性的函数
+
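+A compact way to write the comparison just described, using the same A, B, C, and lambda notation as the lecture:
+
+```latex
+% Logistic regression weights the two terms with lambda:
+\min_\theta \; A + \lambda B
+% The SVM instead puts a constant C on the first term, with C playing a
+% role similar to 1/lambda:
+\min_\theta \; C\,A + B
+% Written out, the SVM objective from this video is
+\min_\theta \; C \sum_{i=1}^{m} \Big[ y^{(i)} \mathrm{cost}_1\!\big(\theta^T x^{(i)}\big)
+    + \big(1 - y^{(i)}\big)\, \mathrm{cost}_0\!\big(\theta^T x^{(i)}\big) \Big]
+    + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2
+% and, rather than a probability, the learned hypothesis outputs a label directly:
+h_\theta(x) = \begin{cases} 1 & \text{if } \theta^T x \ge 0 \\ 0 & \text{otherwise} \end{cases}
+```
+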
diff --git a/srt/12 - 2 - Large Margin Intuition (11 min).srt b/srt/12 - 2 - Large Margin Intuition (11 min).srt
new file mode 100644
index 00000000..029fb0e1
--- /dev/null
+++ b/srt/12 - 2 - Large Margin Intuition (11 min).srt
@@ -0,0 +1,1540 @@
+1
+00:00:00,750 --> 00:00:02,160
+Sometimes people talk about support
+人们有时将支持向量机
+
+2
+00:00:02,520 --> 00:00:04,380
+vector machines, as large margin
+看做是大间距分类器
+
+3
+00:00:04,990 --> 00:00:06,950
+classifiers, in this video I'd
+在这一部分
+
+4
+00:00:07,080 --> 00:00:08,030
+like to tell you what that
+我将介绍其中的含义
+
+5
+00:00:08,410 --> 00:00:09,500
+means, and this will
+这有助于我们
+
+6
+00:00:09,780 --> 00:00:10,520
+also give us a useful
+直观理解
+
+7
+00:00:11,030 --> 00:00:12,780
+picture of what an SVM
+SVM模型的
+
+8
+00:00:13,020 --> 00:00:17,460
+hypothesis may look like.
+假设是什么样的
+
+9
+00:00:18,070 --> 00:00:19,290
+Here's my cost function for the support vector machine
+这是我的支持向量机模型的代价函数
+
+10
+00:00:21,310 --> 00:00:22,290
+where here on the left
+在左边这里
+
+11
+00:00:22,790 --> 00:00:24,300
+I've plotted my cost 1
+我画出了关于 z 的代价函数 cost1(z)
+
+12
+00:00:24,560 --> 00:00:28,100
+of z function that I used for positive examples and on the right I've plotted my
+此函数用于正样本 而在右边这里我画出了
+
+13
+00:00:30,080 --> 00:00:31,510
+cost 0 of z function, where I have
+关于 z 的代价函数 cost0(z)
+
+14
+00:00:31,950 --> 00:00:33,850
+'Z' here on the horizontal axis.
+横轴表示 z
+
+15
+00:00:34,380 --> 00:00:35,520
+Now, let's think about what
+现在让我们考虑一下
+
+16
+00:00:35,650 --> 00:00:38,380
+it takes to make these cost functions small.
+最小化这些代价函数的必要条件是什么
+
+17
+00:00:39,660 --> 00:00:40,970
+If you have a positive example,
+如果你有一个正样本
+
+18
+00:00:41,950 --> 00:00:43,170
+so if y is equal to
+y等于1
+
+19
+00:00:43,490 --> 00:00:45,060
+1, then cost 1 of
+则只有在 z 大于等于1时
+
+20
+00:00:45,200 --> 00:00:46,750
+Z is zero only when
+代价函数 cost1(z)
+
+21
+00:00:47,700 --> 00:00:50,070
+Z is greater than or equal to 1.
+才等于0
+
+22
+00:00:50,180 --> 00:00:51,370
+So in other words, if you
+换句话说
+
+23
+00:00:51,510 --> 00:00:52,860
+have a positive example, we really
+如果你有一个正样本
+
+24
+00:00:53,110 --> 00:00:54,550
+want theta transpose x to be greater
+我们会希望 θ 转置乘以 x
+
+25
+00:00:54,870 --> 00:00:55,760
+than or equal to 1
+大于等于1
+
+26
+00:00:56,450 --> 00:00:58,030
+and conversely if y is
+反之
+
+27
+00:00:58,150 --> 00:00:59,300
+equal to zero, look this
+如果 y 是等于0的
+
+28
+00:00:59,510 --> 00:01:00,490
+cost zero of z function,
+我们观察一下
+
+29
+00:01:01,560 --> 00:01:03,000
+then it's only in
+函数cost0(z)
+
+30
+00:01:03,200 --> 00:01:04,310
+this region where z is
+它只有在
+
+31
+00:01:04,460 --> 00:01:05,810
+less than or equal to -1
+z小于等于-1
+
+32
+00:01:06,150 --> 00:01:07,320
+we have the cost is zero
+的区间里
+
+33
+00:01:07,610 --> 00:01:10,150
+as cost 0 of z is equal to zero,
+函数值为0
+
+34
+00:01:10,640 --> 00:01:12,270
+and this is an interesting property of the support
+这是支持向量机的
+
+35
+00:01:12,560 --> 00:01:13,630
+vector machine right, which is
+一个有趣性质 不是么
+
+36
+00:01:13,800 --> 00:01:15,060
+that, if you have a positive
+事实上
+
+37
+00:01:15,440 --> 00:01:17,650
+example so if y is equal to one,
+如果你有一个正样本 y等于1
+
+38
+00:01:18,370 --> 00:01:19,250
+then all we really need
+则其实我们仅仅要求
+
+39
+00:01:19,550 --> 00:01:21,950
+is that theta transpose x is greater than equal to zero.
+θ 转置乘以 x 大于等于0
+
+40
+00:01:22,970 --> 00:01:25,270
+And that would mean that we classify correctly
+就能将该样本恰当分出
+
+41
+00:01:25,860 --> 00:01:26,950
+because if theta transpose x is greater than zero our
+这是因为如果 θ 转置乘以 x 比0大的话
+
+42
+00:01:27,510 --> 00:01:28,980
+hypothesis will predict one.
+我们的假设函数就会预测 1
+
+43
+00:01:29,840 --> 00:01:30,710
+And similarly, if you have
+类似地 如果你有一个负样本
+
+44
+00:01:31,340 --> 00:01:34,090
+a negative example, then really all you want is that theta transpose x is
+则仅需要 θ 转置乘以x
+
+45
+00:01:34,850 --> 00:01:37,290
+less than zero and that will make sure we got the example right.
+小于等于0 就会将负例正确分离
+
+46
+00:01:37,670 --> 00:01:40,230
+But the support vector machine wants a bit more than that.
+但是 支持向量机的要求更高
+
+47
+00:01:40,580 --> 00:01:43,360
+It says, you know, don't just barely get the example right.
+不仅仅要能正确分开输入的样本
+
+48
+00:01:44,320 --> 00:01:45,990
+So then don't just
+即不仅仅
+
+49
+00:01:46,240 --> 00:01:47,580
+have it just a little bit bigger than zero. What
+要求 θ 转置乘以 x 大于0
+
+50
+00:01:47,890 --> 00:01:48,870
+I really want is for this to be
+我们需要的是
+
+51
+00:01:49,060 --> 00:01:50,370
+quite a lot bigger than zero
+比0值大很多
+
+52
+00:01:50,490 --> 00:01:51,430
+say maybe
+比如
+
+53
+00:01:51,680 --> 00:01:52,530
+greater than or equal to one
+大于等于1
+
+54
+00:01:52,870 --> 00:01:54,400
+and I want this to be much less than zero.
+我也想这个比0小很多
+
+55
+00:01:54,800 --> 00:01:55,970
+Maybe I want it less than or
+比如我希望它
+
+56
+00:01:56,230 --> 00:01:58,140
+equal to -1.
+小于等于-1
+
+57
+00:01:58,830 --> 00:02:00,000
+And so this builds in an
+这就相当于在支持向量机中嵌入了
+
+58
+00:02:00,120 --> 00:02:01,660
+extra safety factor or safety
+一个额外的安全因子
+
+59
+00:02:02,070 --> 00:02:03,630
+margin factor into the support vector machine.
+或者说安全的间距因子
+
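+The cost functions referred to above are piecewise linear: a sloped part followed by a flat part at zero. A small sketch of them (the slope of the sloped part is taken as 1 here purely for illustration; the video only pins down where each cost reaches zero, namely z >= 1 for cost1 and z <= -1 for cost0):
+
+```python
+import numpy as np
+
+# Piecewise-linear costs from the SVM objective (slope assumed to be 1 here;
+# only the flat regions are fixed by the lecture: cost1 is zero for z >= 1,
+# cost0 is zero for z <= -1).
+def cost1(z):
+    # used for positive examples (y = 1)
+    return np.maximum(0.0, 1.0 - z)
+
+def cost0(z):
+    # used for negative examples (y = 0)
+    return np.maximum(0.0, 1.0 + z)
+
+z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
+print(cost1(z))   # [3. 2. 1. 0. 0.]  -> pushes theta' x to be >= 1
+print(cost0(z))   # [0. 0. 1. 2. 3.]  -> pushes theta' x to be <= -1
+```
+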
+60
+00:02:04,030 --> 00:02:05,700
+Logistic regression
+当然 逻辑回归
+
+61
+00:02:06,340 --> 00:02:07,620
+does something similar too of course,
+做了类似的事情
+
+62
+00:02:07,820 --> 00:02:08,900
+but let's see what
+但是让我们看一下
+
+63
+00:02:09,110 --> 00:02:10,350
+happens or let's see what
+在支持向量机中
+
+64
+00:02:10,460 --> 00:02:11,290
+the consequences of this are, in the
+这个因子会
+
+65
+00:02:11,360 --> 00:02:13,180
+context of the support vector machine.
+导致什么结果
+
+66
+00:02:14,830 --> 00:02:15,740
+Concretely, what I'd like to do next is
+具体而言 我接下来
+
+67
+00:02:16,010 --> 00:02:17,760
+consider a case
+会考虑一个特例
+
+68
+00:02:17,900 --> 00:02:19,130
+where we set
+我们将这个常数 C
+
+69
+00:02:19,460 --> 00:02:21,240
+this constant C to be
+设置成
+
+70
+00:02:21,400 --> 00:02:23,340
+a very large value, so let's
+一个非常大的值
+
+71
+00:02:23,530 --> 00:02:24,700
+imagine we set C to
+比如我们假设
+
+72
+00:02:24,820 --> 00:02:28,080
+a very large value, may be a hundred thousand, some huge number.
+C的值为100000 或者其它非常大的数
+
+73
+00:02:29,370 --> 00:02:31,290
+Let's see what the support vector machine will do.
+然后来观察支持向量机会给出什么结果
+
+74
+00:02:31,580 --> 00:02:33,510
+If C is very,
+如果 C 非常大
+
+75
+00:02:33,820 --> 00:02:35,340
+very large, then when minimizing
+则最小化代价函数的时候
+
+76
+00:02:36,350 --> 00:02:38,080
+this optimization objective, we're going
+我们将会很希望
+
+77
+00:02:38,300 --> 00:02:39,640
+to be highly motivated to choose
+找到一个
+
+78
+00:02:39,950 --> 00:02:41,240
+a value, so that this
+使第一项为0的
+
+79
+00:02:41,380 --> 00:02:43,180
+first term is equal to zero.
+最优解
+
+80
+00:02:44,810 --> 00:02:46,250
+So let's try to
+因此
+
+81
+00:02:46,670 --> 00:02:48,320
+understand the optimization problem in
+让我们
+
+82
+00:02:48,430 --> 00:02:49,820
+the context of, what would
+尝试在
+
+83
+00:02:50,050 --> 00:02:51,520
+it take to make this
+代价项的第一项
+
+84
+00:02:51,880 --> 00:02:53,060
+first term in the objective
+为0的情形下理解
+
+85
+00:02:53,470 --> 00:02:54,890
+equal to zero, because you
+该优化问题
+
+86
+00:02:55,000 --> 00:02:56,100
+know, maybe we'll set C to
+比如我们可以把 C
+
+87
+00:02:56,250 --> 00:02:59,420
+some huge constant, and this
+设置成了非常大的常数
+
+88
+00:02:59,590 --> 00:03:00,780
+will hopefully give us
+这将给我们
+
+89
+00:03:01,300 --> 00:03:02,920
+additional intuition about what
+一些关于支持向量机
+
+90
+00:03:03,110 --> 00:03:05,520
+sort of hypotheses a support vector machine learns.
+模型的直观感受
+
+91
+00:03:06,440 --> 00:03:07,720
+So we saw already that
+我们已经看到
+
+92
+00:03:08,140 --> 00:03:09,260
+whenever you have a training
+输入一个训练样本
+
+93
+00:03:09,480 --> 00:03:11,350
+example with a label
+标签为
+
+94
+00:03:11,690 --> 00:03:13,850
+of y=1 if you
+y=1
+
+95
+00:03:13,950 --> 00:03:15,050
+want to make that first term
+你想令第一项为0
+
+96
+00:03:15,240 --> 00:03:16,280
+zero, what you need is
+你需要做的是
+
+97
+00:03:16,450 --> 00:03:17,680
+to find a value of theta
+找到一个 θ
+
+98
+00:03:17,990 --> 00:03:20,380
+so that theta transpose x i is greater
+使得 θ 转置乘以 x
+
+99
+00:03:20,690 --> 00:03:22,800
+than or equal to 1.
+大于等于1
+
+100
+00:03:23,220 --> 00:03:24,250
+And similarly, whenever we have an example,
+类似地 对于一个训练样本
+
+101
+00:03:24,960 --> 00:03:26,910
+with label zero, in order
+标签为 y=0
+
+102
+00:03:27,240 --> 00:03:28,060
+to make sure that the cost,
+为了使
+
+103
+00:03:29,000 --> 00:03:30,520
+cost zero of Z, in order to
+cost0(z) 函数
+
+104
+00:03:30,610 --> 00:03:31,530
+make sure that cost is
+这个函数
+
+105
+00:03:31,790 --> 00:03:33,250
+zero we need that theta transpose x i
+值为0 我们需要 θ 转置
+
+106
+00:03:33,810 --> 00:03:36,180
+is less than or
+乘以x 的值
+
+107
+00:03:37,900 --> 00:03:38,740
+equal to -1.
+小于等于-1
+
+108
+00:03:39,510 --> 00:03:40,770
+So, if we think
+因此
+
+109
+00:03:41,050 --> 00:03:43,030
+of our optimization problem as
+现在考虑我们的优化问题
+
+110
+00:03:43,360 --> 00:03:45,000
+now, really choosing parameters
+选择参数
+
+111
+00:03:45,710 --> 00:03:46,750
+so that this first
+使得第一项
+
+112
+00:03:47,020 --> 00:03:48,170
+term is equal to zero,
+等于0
+
+113
+00:03:49,130 --> 00:03:50,230
+what we're left with is
+就会导致下面的
+
+114
+00:03:50,330 --> 00:03:51,670
+the following optimization problem.
+优化问题
+
+115
+00:03:52,050 --> 00:03:53,720
+We're going to minimize that first
+因为我们将
+
+116
+00:03:53,980 --> 00:03:55,360
+term zero, so C
+选择参数使第一项为0
+
+117
+00:03:55,590 --> 00:03:56,710
+times zero, because we're going
+因此这个函数的第一项为0
+
+118
+00:03:56,870 --> 00:03:58,040
+to choose parameters so that's equal
+因此是 C 乘以0
+
+119
+00:03:58,150 --> 00:03:59,710
+to zero, plus one half
+加上二分之一
+
+120
+00:04:00,330 --> 00:04:01,330
+and then you know that
+乘以第二项
+
+121
+00:04:01,460 --> 00:04:05,440
+second term and this
+这里第一项
+
+122
+00:04:05,620 --> 00:04:06,880
+first term is 'C' times zero,
+是C乘以0
+
+123
+00:04:07,160 --> 00:04:08,020
+so let's just cross that
+因此可以将其删去
+
+124
+00:04:08,130 --> 00:04:11,210
+out because I know that's going to be zero.
+因为我知道它是0
+
+125
+00:04:11,380 --> 00:04:12,570
+And this will be subject to the constraint
+这将遵从以下的约束
+
+126
+00:04:13,400 --> 00:04:15,410
+that theta transpose x(i)
+θ 转置乘以 x(i)
+
+127
+00:04:16,390 --> 00:04:17,560
+is greater than or equal to
+大于等于1
+
+128
+00:04:18,700 --> 00:04:20,930
+one, if y(i)
+如果 y (i)
+
+129
+00:04:22,180 --> 00:04:24,150
+Is equal to one and
+是等于1 的
+
+130
+00:04:24,940 --> 00:04:26,560
+theta transpose x(i) is less than
+θ 转置乘以x(i)
+
+131
+00:04:26,690 --> 00:04:28,060
+or equal to minus one
+小于等于-1
+
+132
+00:04:29,030 --> 00:04:31,680
+whenever you have
+如果样本i是
+
+133
+00:04:32,110 --> 00:04:34,460
+a negative example and it
+一个负样本
+
+134
+00:04:34,540 --> 00:04:35,520
+turns out that when you
+这样 当你
+
+135
+00:04:35,660 --> 00:04:37,930
+solve this optimization problem, when you
+求解这个优化问题的时候
+
+136
+00:04:38,070 --> 00:04:39,440
+minimize this as a function of the parameters theta
+当你最小化这个关于变量 θ 的函数的时候
+
+137
+00:04:40,710 --> 00:04:42,090
+you get a very interesting decision
+你会得到一个非常有趣的决策边界
+
+138
+00:04:42,590 --> 00:04:44,870
+boundary. Concretely, if you
+具体而言
+
+139
+00:04:45,010 --> 00:04:46,470
+look at a data set
+如果你考察
+
+140
+00:04:46,750 --> 00:04:49,660
+like this with positive and negative examples, this data
+这样一个数据集 其中有正样本 也有负样本
+
+141
+00:04:50,920 --> 00:04:52,430
+is linearly separable and by
+可以看到 这个数据集是线性可分的
+
+142
+00:04:52,710 --> 00:04:54,960
+that, I mean that there exists, you know, a straight line,
+我的意思是 存在一条直线把正负样本分开
+
+143
+00:04:55,530 --> 00:04:56,830
+although there are many different straight lines,
+当然有多条不同的直线
+
+144
+00:04:56,920 --> 00:04:57,810
+they can separate the positive and
+可以把
+
+145
+00:04:58,720 --> 00:05:01,060
+negative examples perfectly.
+正样本和负样本完全分开
+
+146
+00:05:01,560 --> 00:05:02,710
+For example, here is one decision boundary
+比如 这就是一个决策边界
+
+147
+00:05:04,270 --> 00:05:05,430
+that separates the positive and
+可以把正样本
+
+148
+00:05:05,570 --> 00:05:06,840
+negative examples, but somehow that
+和负样本分开
+
+149
+00:05:07,030 --> 00:05:07,810
+doesn't look like a very
+但是多多少少这个
+
+150
+00:05:07,900 --> 00:05:09,680
+natural one, right? Or by
+看起来并不是非常自然 是么?
+
+151
+00:05:09,810 --> 00:05:11,050
+drawing an even worse one, you know
+或者我们可以画一条更差的决策界
+
+152
+00:05:11,230 --> 00:05:13,540
+here's another decision boundary that
+这是另一条决策边界
+
+153
+00:05:13,710 --> 00:05:14,830
+separates the positive and negative examples
+可以将正样本和负样本分开
+
+154
+00:05:14,900 --> 00:05:15,960
+but just barely.
+但仅仅是勉强分开
+
+155
+00:05:16,120 --> 00:05:18,530
+But neither of those seem like particularly good choices.
+这些决策边界看起来都不是特别好的选择
+
+156
+00:05:20,420 --> 00:05:22,880
+The Support Vector Machines will instead choose this
+支持向量机将会选择
+
+157
+00:05:23,140 --> 00:05:26,450
+decision boundary, which I'm drawing in black.
+这个黑色的决策边界
+
+158
+00:05:29,010 --> 00:05:30,030
+And that seems like a much better decision boundary
+相较于之前
+
+159
+00:05:30,760 --> 00:05:32,310
+than either of
+我用粉色或者绿色
+
+160
+00:05:32,420 --> 00:05:34,450
+the ones that I drew in magenta or in green.
+画的决策界 这条黑色的看起来好得多
+
+161
+00:05:34,750 --> 00:05:35,790
+The black line seems like a more
+黑线看起来
+
+162
+00:05:36,050 --> 00:05:37,840
+robust separator, it does
+是更稳健的决策界
+
+163
+00:05:38,610 --> 00:05:39,710
+a better job of separating the positive and negative examples.
+在分离正样本和负样本上它显得的更好
+
+164
+00:05:39,800 --> 00:05:42,830
+And mathematically, what that does is,
+数学上来讲 这是什么意思呢
+
+165
+00:05:43,530 --> 00:05:45,680
+this black decision boundary has a larger distance.
+这条黑线有更大的距离
+
+166
+00:05:49,160 --> 00:05:50,580
+That distance is called the margin, when I
+这个距离叫做间距 (margin)
+
+167
+00:05:50,760 --> 00:05:51,790
+draw out these two extra
+当画出这两条
+
+168
+00:05:52,380 --> 00:05:54,320
+blue lines, we see
+额外的蓝线
+
+169
+00:05:54,540 --> 00:05:56,010
+that the black decision boundary has
+我们看到黑色的决策界
+
+170
+00:05:56,240 --> 00:05:59,990
+some larger minimum distance from any of my training examples,
+和训练样本之间有更大的最短距离
+
+171
+00:06:00,120 --> 00:06:01,350
+whereas the magenta and the green lines
+然而粉线和绿线
+
+172
+00:06:01,580 --> 00:06:02,600
+they come awfully close to the training examples.
+离训练样本就非常近
+
+173
+00:06:04,640 --> 00:06:06,100
+and then that seems to do a less good job of separating
+在分离样本的时候就会
+
+174
+00:06:06,500 --> 00:06:08,910
+the positive and negative classes than my black line.
+比黑线表现差
+
+175
+00:06:09,850 --> 00:06:11,500
+And so
+因此
+
+176
+00:06:11,800 --> 00:06:13,600
+this distance is called
+这个距离叫做
+
+177
+00:06:13,960 --> 00:06:16,500
+the margin of the
+支持向量机的
+
+178
+00:06:16,600 --> 00:06:21,300
+support vector machine and this
+间距
+
+179
+00:06:21,500 --> 00:06:22,480
+gives the SVM a certain
+而这是支持向量机
+
+180
+00:06:22,940 --> 00:06:24,010
+robustness, because it tries
+具有鲁棒性的原因
+
+181
+00:06:24,360 --> 00:06:25,530
+to separate the data with as
+因为它努力
+
+182
+00:06:25,700 --> 00:06:27,440
+a large a margin as possible.
+用一个最大间距来分离样本
+
+183
+00:06:29,210 --> 00:06:30,250
+So the support vector machine is
+因此支持向量机
+
+184
+00:06:30,380 --> 00:06:31,650
+sometimes also called a large
+有时被称为
+
+185
+00:06:31,830 --> 00:06:33,930
+margin classifier and this
+大间距分类器
+
+186
+00:06:34,170 --> 00:06:36,180
+is actually a consequence of
+而这其实是
+
+187
+00:06:36,430 --> 00:06:39,370
+the optimization problem we wrote down on the previous slide.
+求解上一页幻灯片上优化问题的结果
+
+188
+00:06:40,140 --> 00:06:40,950
+I know that you might be
+我知道你也许
+
+189
+00:06:41,100 --> 00:06:42,250
+wondering how is it that
+想知道
+
+190
+00:06:42,400 --> 00:06:43,900
+the optimization problem I wrote
+求解上一页幻灯片中的优化问题
+
+191
+00:06:44,070 --> 00:06:45,080
+down in the previous slide, how
+为什么会产生这个结果
+
+192
+00:06:45,280 --> 00:06:47,270
+does that lead to this large margin classifier.
+它是如何产生这个大间距分类器的呢
+
+193
+00:06:48,350 --> 00:06:49,700
+I know I haven't explained that yet.
+我知道我还没有解释这一点
+
+194
+00:06:50,520 --> 00:06:51,570
+And in the next video
+在下一段视频里
+
+195
+00:06:51,810 --> 00:06:53,340
+I'm going to sketch a
+我将会从直观上
+
+196
+00:06:53,500 --> 00:06:55,180
+little bit of the intuition about why
+略述 为什么
+
+197
+00:06:55,430 --> 00:06:57,080
+that optimization problem gives us
+这个优化问题
+
+198
+00:06:57,570 --> 00:06:59,630
+this large margin classifier. But
+会产生大间距分类器
+
+199
+00:06:59,790 --> 00:07:00,860
+this is a useful feature to
+总之这个图示
+
+200
+00:07:00,970 --> 00:07:01,780
+keep in mind if you are
+有助于你
+
+201
+00:07:01,920 --> 00:07:03,150
+trying to understand what are the
+理解
+
+202
+00:07:03,290 --> 00:07:05,600
+sorts of hypothesis that an SVM will choose.
+支持向量机模型的做法
+
+203
+00:07:06,140 --> 00:07:07,200
+That is, trying to separate the
+即努力将正样本和负样本
+
+204
+00:07:07,270 --> 00:07:10,310
+positive and negative examples with as big a margin as possible.
+用最大的间距分开
+
+205
+00:07:12,890 --> 00:07:13,950
+I want to say one last thing
+在本节课中
+
+206
+00:07:14,180 --> 00:07:15,930
+about large margin classifiers in
+关于大间距分类器
+
+207
+00:07:16,070 --> 00:07:17,900
+this intuition, so we
+我想讲最后一点
+
+208
+00:07:18,030 --> 00:07:19,340
+wrote out this large margin classification
+我们将这个大间距分类器
+
+209
+00:07:20,010 --> 00:07:21,040
+setting in the case
+中的正则化因子
+
+210
+00:07:21,420 --> 00:07:23,640
+of when C, that regularization concept,
+常数C
+
+211
+00:07:24,160 --> 00:07:25,190
+was very large, I think
+设置的非常大
+
+212
+00:07:25,390 --> 00:07:27,750
+I set that to a hundred thousand or something.
+我记得我将其设置为了100000
+
+213
+00:07:28,310 --> 00:07:29,760
+So given a dataset
+因此对这样的一个数据集
+
+214
+00:07:30,110 --> 00:07:31,630
+like this, maybe we'll choose
+也许我们将选择
+
+215
+00:07:32,110 --> 00:07:34,000
+that decision boundary that
+这样的决策界从而最大间距地
+
+216
+00:07:34,140 --> 00:07:36,210
+separate the positive and negative examples on large margin.
+分离开正样本和负样本
+
+217
+00:07:37,370 --> 00:07:39,020
+Now, the SVM is actually slightly
+事实上 支持向量机现在
+
+218
+00:07:39,370 --> 00:07:41,120
+more sophisticated than this large
+要比这个大间距分类器所体现的
+
+219
+00:07:41,440 --> 00:07:42,920
+margin view might suggest.
+更成熟
+
+220
+00:07:43,630 --> 00:07:45,130
+And in particular, if all you're
+尤其是当你使用
+
+221
+00:07:45,310 --> 00:07:46,490
+doing is using a large
+大间距分类器的时候
+
+222
+00:07:46,680 --> 00:07:48,850
+margin classifier then your
+你的学习算法
+
+223
+00:07:49,020 --> 00:07:50,270
+learning algorithms can be sensitive
+会受异常点 (outlier) 的影响
+
+224
+00:07:50,920 --> 00:07:52,260
+to outliers, so lets just
+比如我们加入
+
+225
+00:07:52,450 --> 00:07:53,990
+add an extra positive example
+一个额外的正样本
+
+226
+00:07:54,520 --> 00:07:56,540
+like that shown on the screen.
+在这里
+
+227
+00:07:57,230 --> 00:07:58,830
+If we add that one example, then
+如果你加了这个样本
+
+228
+00:07:58,950 --> 00:08:00,060
+it seems as if to separate
+为了将样本
+
+229
+00:08:00,300 --> 00:08:01,410
+data with a large margin,
+用最大间距分开
+
+230
+00:08:02,680 --> 00:08:04,300
+maybe I'll end up learning
+也许我最终
+
+231
+00:08:05,270 --> 00:08:07,260
+a decision boundary like that, right?
+会得到一条类似这样的决策界 对么?
+
+232
+00:08:07,540 --> 00:08:09,130
+that is the magenta line and
+就是这条粉色的线
+
+233
+00:08:09,180 --> 00:08:10,210
+it's really not clear that based
+仅仅基于
+
+234
+00:08:10,440 --> 00:08:11,950
+on the single outlier based on
+一个异常值
+
+235
+00:08:12,180 --> 00:08:13,560
+a single example and it's
+仅仅基于一个样本
+
+236
+00:08:13,790 --> 00:08:14,720
+really not clear that it's
+就将
+
+237
+00:08:14,890 --> 00:08:16,460
+actually a good idea to change
+我的决策界
+
+238
+00:08:17,060 --> 00:08:17,980
+my decision boundary from the black
+从这条黑线变到这条粉线
+
+239
+00:08:18,290 --> 00:08:19,960
+one over to the magenta one.
+这实在是不明智的
+
+240
+00:08:20,980 --> 00:08:23,430
+So, if C, if
+而如果正则化参数 C
+
+241
+00:08:23,640 --> 00:08:25,740
+the regularization parameter C were very
+设置的非常大
+
+242
+00:08:25,970 --> 00:08:27,110
+large, then this is
+这事实上正是
+
+243
+00:08:27,300 --> 00:08:28,130
+actually what SVM will do, it will
+支持向量机将会做的
+
+244
+00:08:28,360 --> 00:08:29,820
+change the decision boundary
+它将决策界
+
+245
+00:08:30,270 --> 00:08:31,530
+from the black to the
+从黑线
+
+246
+00:08:31,650 --> 00:08:33,650
+magenta one but if
+变到了粉线
+
+247
+00:08:33,810 --> 00:08:35,390
+C were reasonably small if
+但是如果 C 设置的小一点
+
+248
+00:08:35,550 --> 00:08:36,720
+you were to use the C,
+如果你将 C
+
+249
+00:08:37,320 --> 00:08:39,090
+not too large then you
+设置的不要太大
+
+250
+00:08:39,260 --> 00:08:40,400
+still end up with this
+则你最终会得到
+
+251
+00:08:40,610 --> 00:08:44,500
+black decision boundary.
+这条黑线
+
+252
+00:08:44,830 --> 00:08:46,880
+And of course if the data were not linearly separable so if you had some positive
+当然数据如果不是线性可分的
+
+253
+00:08:47,250 --> 00:08:48,790
+examples in here, or if
+如果你在这里
+
+254
+00:08:49,170 --> 00:08:50,440
+you had some negative examples
+有一些正样本 或者
+
+255
+00:08:50,980 --> 00:08:52,300
+in here then the SVM
+你在这里有一些负样本
+
+256
+00:08:52,570 --> 00:08:53,830
+will also do the right thing.
+则支持向量机也会将它们恰当分开
+
+257
+00:08:54,260 --> 00:08:55,710
+And so this picture of
+因此
+
+258
+00:08:56,060 --> 00:08:57,770
+a large margin classifier that's
+大间距分类器的描述
+
+259
+00:08:58,090 --> 00:08:59,410
+really, that's really the
+真的
+
+260
+00:08:59,530 --> 00:09:01,720
+picture that gives better intuition
+仅仅是从直观上给出了
+
+261
+00:09:01,970 --> 00:09:03,440
+only for the case of when the
+正则化参数 C
+
+262
+00:09:03,560 --> 00:09:05,050
+regulations parameter C is
+非常大的情形
+
+263
+00:09:05,190 --> 00:09:07,170
+very large, and just
+同时
+
+264
+00:09:07,420 --> 00:09:08,810
+to remind you this corresponds
+要提醒你 C 的作用
+
+265
+00:09:09,650 --> 00:09:11,300
+C plays a role similar to
+类似于
+
+266
+00:09:11,850 --> 00:09:13,600
+one over Lambda, where Lambda
+λ 分之一 λ 是
+
+267
+00:09:13,930 --> 00:09:15,950
+is the regularization parameter
+我们之前使用过
+
+268
+00:09:16,110 --> 00:09:17,970
+we had previously. And so it's
+的正则化参数
+
+269
+00:09:18,080 --> 00:09:18,880
+only if one over Lambda
+这只是C非常大的情形
+
+270
+00:09:19,080 --> 00:09:21,060
+is very large or equivalently
+或者等价地
+
+271
+00:09:21,280 --> 00:09:23,110
+if Lambda is very small that
+λ 非常小的情形
+
+272
+00:09:23,560 --> 00:09:24,640
+you end up with things like
+你最终会得到
+
+273
+00:09:24,850 --> 00:09:27,600
+this Magenta decision boundary, but
+类似粉线这样的决策界
+
+274
+00:09:28,870 --> 00:09:29,560
+in practice when applying support vector machines,
+但是实际上 应用支持向量机的时候
+
+275
+00:09:30,190 --> 00:09:31,620
+when C is not very very large
+当 C 不是
+
+276
+00:09:31,910 --> 00:09:33,180
+like that,
+非常非常大的时候
+
+277
+00:09:34,840 --> 00:09:36,390
+it can do a better job ignoring
+它可以忽略掉一些异常点的影响
+
+278
+00:09:36,980 --> 00:09:38,590
+the few outliers like here. And
+得到更好的决策界
+
+279
+00:09:39,150 --> 00:09:40,320
+also do fine and do reasonable things
+甚至当你的数据不是线性可分的时候
+
+280
+00:09:40,620 --> 00:09:44,400
+even if your data is not linearly separable.
+支持向量机也可以给出好的结果
+
+281
+00:09:44,690 --> 00:09:46,810
+But when we talk about bias and variance in the context of support vector machines
+我们稍后会介绍一点
+
+282
+00:09:46,980 --> 00:09:47,990
+which we'll do
+支持向量机的偏差和方差
+
+283
+00:09:48,170 --> 00:09:50,170
+a little bit later, hopefully all
+希望在那时候
+
+284
+00:09:50,410 --> 00:09:51,990
+of these trade-offs involving the regularization
+关于如何处理参数的这种平衡会变得
+
+285
+00:09:52,410 --> 00:09:53,710
+parameter will become clearer at
+更加清晰
+
+286
+00:09:53,830 --> 00:09:55,280
+that time. So I hope
+我希望
+
+287
+00:09:55,580 --> 00:09:57,290
+that gives some intuition about
+这节课给出了一些
+
+288
+00:09:57,600 --> 00:09:59,680
+how this support vector machine functions as
+关于为什么支持向量机
+
+289
+00:09:59,850 --> 00:10:01,810
+a large margin classifier that
+被看做大间距分类器的直观理解
+
+290
+00:10:01,950 --> 00:10:03,040
+tries to separate the data with
+它用最大间距将样本区分开
+
+291
+00:10:03,610 --> 00:10:05,210
+a large margin, technically this
+尽管从技术上讲
+
+292
+00:10:06,140 --> 00:10:07,160
+picture of this view is true
+这只有当
+
+293
+00:10:07,460 --> 00:10:08,710
+only when the parameter C is very large, which is
+参数C是非常大的时候是真的
+
+294
+00:10:10,230 --> 00:10:11,720
+a useful way to think about support vector machines.
+但是它对于理解支持向量机是有益的
+
+295
+00:10:13,120 --> 00:10:14,450
+There was one missing step in
+本节课中 我们略去了一步
+
+296
+00:10:14,560 --> 00:10:15,990
+this video which is, why is
+那就是我们在幻灯片中
+
+297
+00:10:16,110 --> 00:10:17,670
+it that the optimization problem we
+给出的优化问题
+
+298
+00:10:17,770 --> 00:10:18,770
+wrote down on these
+为什么会是这样的
+
+299
+00:10:19,040 --> 00:10:19,930
+slides, how does that actually
+它是如何
+
+300
+00:10:20,740 --> 00:10:22,490
+lead to the large margin classifier, I
+得出大间距分类器的
+
+301
+00:10:22,600 --> 00:10:23,520
+didn't do that in this video,
+我在本节中没有讲解
+
+302
+00:10:23,930 --> 00:10:25,830
+in the next video I
+在下一节课中
+
+303
+00:10:25,870 --> 00:10:26,940
+will sketch a little bit
+我将略述
+
+304
+00:10:27,120 --> 00:10:28,370
+more of the math behind that
+这些问题背后的数学原理
+
+305
+00:10:28,750 --> 00:10:29,750
+to explain
+来解释
+
+306
+00:10:29,850 --> 00:10:31,660
+that separate reasoning of how
+这个优化问题
+
+307
+00:10:31,930 --> 00:10:33,410
+the optimization problem we wrote out
+是如何
+
+308
+00:10:33,840 --> 00:10:34,990
+results in a large margin classifier.
+得到一个大间距分类器的
+
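+As a rough illustration of the C trade-off discussed in this video, one can fit a linear SVM twice on a toy data set with a single outlier; scikit-learn's LinearSVC is used here only as a stand-in, since it minimizes essentially this C * A + B objective (C times a hinge cost plus one half the squared norm of the parameters), and the data points below are invented for the sketch:
+
+```python
+import numpy as np
+from sklearn.svm import LinearSVC
+
+# Toy 2-D data in the spirit of the slide: two separable clusters plus one
+# outlier on the wrong side (all coordinates invented for illustration).
+X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 0.5],   # negative class
+              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5],   # positive class
+              [1.5, 4.0]])                          # positive outlier
+y = np.array([0, 0, 0, 1, 1, 1, 1])
+
+for C in (1e5, 1.0):   # very large C (roughly a tiny lambda) vs. a moderate C
+    clf = LinearSVC(C=C, loss="hinge", max_iter=200000).fit(X, y)
+    # The decision boundary is the line where clf.coef_ . x + clf.intercept_ = 0.
+    print(f"C={C:g}  coef={clf.coef_.ravel()}  intercept={clf.intercept_}")
+```
+
+One would expect the boundary fitted with the very large C to be pulled toward the outlier, while the moderate C keeps something closer to the large-margin boundary between the two clusters, matching the behaviour described above.
+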
diff --git a/srt/12 - 3 - Mathematics Behind Large Margin Classification (Optional) (20 min).srt b/srt/12 - 3 - Mathematics Behind Large Margin Classification (Optional) (20 min).srt
new file mode 100644
index 00000000..cf82c308
--- /dev/null
+++ b/srt/12 - 3 - Mathematics Behind Large Margin Classification (Optional) (20 min).srt
@@ -0,0 +1,2641 @@
+1
+00:00:00,680 --> 00:00:01,740
+In this video, I'd like to
+在本节课中 我将
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,900 --> 00:00:02,960
+tell you a bit about the
+介绍一些
+
+3
+00:00:03,210 --> 00:00:04,680
+math behind large margin classification.
+大间隔分类背后的数学原理
+
+4
+00:00:05,960 --> 00:00:08,390
+This video is optional, so please feel free to skip it.
+本节为选学部分 你完全可以跳过它
+
+5
+00:00:09,260 --> 00:00:10,380
+It may also give you better
+但是听听这节课可能让你对
+
+6
+00:00:10,650 --> 00:00:11,980
+intuition about how the
+支持向量机中的优化问题
+
+7
+00:00:12,460 --> 00:00:13,830
+optimization problem of the
+以及如何得到
+
+8
+00:00:13,940 --> 00:00:15,540
+support vector machine, how that
+大间距分类器
+
+9
+00:00:15,860 --> 00:00:17,150
+leads to large margin classifiers.
+产生更好的直观理解
+
+10
+00:00:21,180 --> 00:00:22,530
+In order to get started, let
+首先
+
+11
+00:00:22,600 --> 00:00:23,730
+me first remind you of a
+让我来给大家复习一下
+
+12
+00:00:23,970 --> 00:00:26,490
+couple of properties of what vector inner products look like.
+关于向量内积的知识
+
+13
+00:00:28,310 --> 00:00:29,280
+Let's say I have two vectors
+假设我有两个向量
+
+14
+00:00:29,900 --> 00:00:32,180
+U and V, that look like this.
+u 和 v 我将它们写在这里
+
+15
+00:00:32,950 --> 00:00:34,180
+So both two dimensional vectors.
+两个都是二维向量
+
+16
+00:00:35,460 --> 00:00:36,940
+Then let's see what U
+我们看一下
+
+17
+00:00:37,440 --> 00:00:39,550
+transpose V looks like.
+u 转置乘以 v 的结果
+
+18
+00:00:40,160 --> 00:00:42,180
+And U transpose V is
+u 转置乘以 v
+
+19
+00:00:42,300 --> 00:00:43,720
+also called the inner product
+也叫做向量 u 和 v
+
+20
+00:00:44,490 --> 00:00:45,880
+between the vectors U and V.
+之间的内积
+
+21
+00:00:48,360 --> 00:00:49,960
+U is a two dimensional vector, so
+由于是二维向量 我可以
+
+22
+00:00:50,380 --> 00:00:51,940
+I can plot it on this figure.
+将它们画在这个图上
+
+23
+00:00:52,760 --> 00:00:53,860
+So let's say
+我们说
+
+24
+00:00:54,040 --> 00:00:55,850
+that's the vector U. And
+这就是向量 u
+
+25
+00:00:55,960 --> 00:00:56,930
+what I mean by that is
+即
+
+26
+00:00:57,110 --> 00:00:59,160
+if on the horizontal axis that
+在横轴上
+
+27
+00:00:59,360 --> 00:01:00,820
+value takes whatever value
+取值为某个u1
+
+28
+00:01:01,560 --> 00:01:03,280
+U1 is and on the
+而在纵轴上
+
+29
+00:01:03,350 --> 00:01:04,820
+vertical axis the height
+高度是
+
+30
+00:01:05,100 --> 00:01:06,360
+of that is whatever U2
+某个 u2 作为U的
+
+31
+00:01:07,340 --> 00:01:08,530
+is the second component
+第二个分量
+
+32
+00:01:08,990 --> 00:01:12,580
+of the vector U. Now, one
+现在
+
+33
+00:01:12,860 --> 00:01:13,760
+quantity that will be nice
+很容易计算的
+
+34
+00:01:14,040 --> 00:01:15,430
+to have is the norm
+一个量就是向量 u 的
+
+35
+00:01:16,500 --> 00:01:17,540
+of the vector U. So, these
+范数
+
+36
+00:01:17,860 --> 00:01:19,390
+are, you know, double bars on
+这是双竖线
+
+37
+00:01:19,540 --> 00:01:20,380
+the left and right that denotes
+左边一个 右边一个
+
+38
+00:01:20,800 --> 00:01:22,610
+the norm or length of
+表示 u 的范数
+
+39
+00:01:22,730 --> 00:01:23,930
+U. So this just means; really the
+即 u 的长度
+
+40
+00:01:24,200 --> 00:01:27,330
+euclidean length of the
+即向量 u 的欧几里得长度
+
+41
+00:01:27,410 --> 00:01:30,800
+vector U. And this
+根据
+
+42
+00:01:31,350 --> 00:01:33,600
+by Pythagoras' theorem, is just
+毕达哥拉斯定理 等于
+
+43
+00:01:33,940 --> 00:01:35,420
+equal to U1
+它等于 u1 平方
+
+44
+00:01:35,620 --> 00:01:37,300
+squared plus U2
+加上 u2 平方
+
+45
+00:01:37,530 --> 00:01:40,190
+squared square root, right?
+开根号
+
+46
+00:01:40,300 --> 00:01:42,780
+And this is the length of the vector U. That's a real number.
+这是向量 u 的长度 它是一个实数
+
+47
+00:01:43,730 --> 00:01:44,750
+Just say you know, what is the length
+现在你知道了
+
+48
+00:01:45,080 --> 00:01:46,120
+of this, what is the
+这个的长度是多少
+
+49
+00:01:46,220 --> 00:01:48,900
+length of this vector down here.
+这个向量的长度写在这里了
+
+50
+00:01:49,680 --> 00:01:50,490
+What is the length of this
+我刚刚画的这个
+
+51
+00:01:50,760 --> 00:01:52,990
+arrow that I just drew? That's the norm of U.
+向量的长度就知道了
+
+52
+00:01:56,020 --> 00:01:57,300
+Now let's go back and
+现在让我们回头来看
+
+53
+00:01:57,450 --> 00:01:59,660
+look at the vector V because we want to compute the inner product.
+向量v 因为我们想计算内积
+
+54
+00:02:00,430 --> 00:02:01,380
+So V will be some other
+v 是另一个向量
+
+55
+00:02:01,520 --> 00:02:03,150
+vector with, you know,
+它的两个分量 v1 和 v2
+
+56
+00:02:03,310 --> 00:02:06,900
+some value V1, V2.
+是已知的
+
+57
+00:02:08,340 --> 00:02:10,490
+And so, the vector
+向量 v
+
+58
+00:02:10,880 --> 00:02:15,050
+V will look like that, towards V like so.
+可以画在这里
+
+59
+00:02:16,920 --> 00:02:18,260
+Now let's go back
+现在让我们
+
+60
+00:02:18,640 --> 00:02:19,880
+and look at how to compute
+来看看如何计算
+
+61
+00:02:20,400 --> 00:02:21,610
+the inner product between U
+u 和 v 之间的内积
+
+62
+00:02:21,860 --> 00:02:23,320
+and V. Here's how you can do it.
+这就是具体做法
+
+63
+00:02:24,010 --> 00:02:25,780
+Let me take the vector V and
+我们将向量 v
+
+64
+00:02:26,200 --> 00:02:28,440
+project it down onto the
+投影到
+
+65
+00:02:28,550 --> 00:02:29,700
+vector U. So I'm going
+向量 u 上
+
+66
+00:02:29,930 --> 00:02:31,900
+to take a orthogonal projection or
+我们做一个直角投影
+
+67
+00:02:31,970 --> 00:02:33,700
+a 90 degree projection, and project
+或者说一个90度投影
+
+68
+00:02:33,920 --> 00:02:35,490
+it down onto U like so.
+将其投影到 u 上
+
+69
+00:02:36,650 --> 00:02:37,410
+And what I'm going to do
+接下来我度量
+
+70
+00:02:38,130 --> 00:02:39,480
+is measure the length of this
+这条红线的
+
+71
+00:02:40,210 --> 00:02:41,520
+red line that I just drew here.
+长度
+
+72
+00:02:41,720 --> 00:02:42,620
+So, I'm going to call the length of
+我称这条红线的
+
+73
+00:02:42,730 --> 00:02:44,670
+that red line P. So, P
+长度为 p 因此 p
+
+74
+00:02:45,530 --> 00:02:46,830
+is the length or is
+就是长度 或者说是
+
+75
+00:02:46,890 --> 00:02:48,230
+the magnitude of the projection
+向量 v 投影到
+
+76
+00:02:49,670 --> 00:02:51,670
+of the vector V onto the
+向量 u 上的量
+
+77
+00:02:51,790 --> 00:02:54,380
+vector U. Let me just write that down.
+我将它写下来
+
+78
+00:02:54,560 --> 00:02:55,600
+So, P is the length
+p 是 v
+
+79
+00:02:57,500 --> 00:03:02,150
+of the projection of the
+投影到
+
+80
+00:03:02,260 --> 00:03:05,800
+vector V onto the
+向量 u 上的
+
+81
+00:03:05,920 --> 00:03:08,210
+vector U. And it is
+长度
+
+82
+00:03:08,430 --> 00:03:10,510
+possible to show that the inner
+因此可以
+
+83
+00:03:10,790 --> 00:03:12,710
+product U transpose V, that
+将 u 转置乘以 v
+
+84
+00:03:12,870 --> 00:03:13,540
+this is going to be equal
+写作
+
+85
+00:03:13,840 --> 00:03:16,330
+to P times the
+p 乘以
+
+86
+00:03:16,430 --> 00:03:18,020
+norm or the length of
+u 的范数或者说
+
+87
+00:03:18,110 --> 00:03:21,130
+the vector U. So, this
+u的长度
+
+88
+00:03:21,460 --> 00:03:23,400
+is one way to compute the inner product.
+这是计算内积的一种方法
+
+89
+00:03:24,070 --> 00:03:25,590
+And if you actually do
+如果你从几何上
+
+90
+00:03:25,780 --> 00:03:27,160
+the geometry figure out what
+画出 p 的值
+
+91
+00:03:27,330 --> 00:03:29,280
+P is and figure out what the norm of U is.
+同时画出 u 的范数
+
+92
+00:03:29,900 --> 00:03:30,690
+This should give you the same
+你也会同样地
+
+93
+00:03:31,050 --> 00:03:32,330
+way, the same answer as
+计算出内积
+
+94
+00:03:32,680 --> 00:03:33,840
+the other way of computing the inner product.
+答案是一样的
+
+95
+00:03:34,860 --> 00:03:34,860
+Right.
+对吧
+
+96
+00:03:35,070 --> 00:03:36,140
+Which is if you take U
+另一个计算公式是
+
+97
+00:03:36,280 --> 00:03:38,150
+transpose V then U transposes
+u 转置乘以 v 就是
+
+98
+00:03:39,000 --> 00:03:40,930
+this U1 U2, its a
+[u1 u2] 这个一行两列的矩阵
+
+99
+00:03:41,090 --> 00:03:42,650
+one by two matrix, 1
+乘以
+
+100
+00:03:43,220 --> 00:03:45,250
+times V. And so
+v 因此
+
+101
+00:03:45,620 --> 00:03:46,930
+this should actually give you
+可以得到
+
+102
+00:03:47,490 --> 00:03:50,630
+U1, V1 plus U2, V2.
+u1×v1 加上 u2×v2
+
+103
+00:03:51,700 --> 00:03:53,140
+And so it's a theorem of
+根据线性代数的知识
+
+104
+00:03:53,310 --> 00:03:55,010
+linear algebra that these two
+这两个公式
+
+105
+00:03:55,180 --> 00:03:56,880
+formulas give you the same answer.
+会给出同样的结果
+
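+A quick numeric check that the two ways of computing the inner product agree (the numbers are chosen here purely for illustration; they are not from the lecture):
+
+```latex
+% Take u = (4, 3) and v = (2, 1).
+\|u\| = \sqrt{4^2 + 3^2} = 5, \qquad u^T v = 4 \cdot 2 + 3 \cdot 1 = 11 .
+% The signed projection of v onto u then has length
+p = \frac{u^T v}{\|u\|} = \frac{11}{5} = 2.2 ,
+\qquad \text{and indeed} \qquad p \, \|u\| = 2.2 \times 5 = 11 = u^T v .
+```
+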
+106
+00:03:57,890 --> 00:03:58,720
+And by the way, U transpose
+顺便说一句
+
+107
+00:03:59,290 --> 00:04:01,010
+V is also equal to
+u 转置乘以 v
+
+108
+00:04:01,320 --> 00:04:03,490
+V transpose U. So if
+等于 v 转置乘以 u
+
+109
+00:04:03,650 --> 00:04:04,510
+you were to do the same process
+因此如果你将 u 和 v 交换位置
+
+110
+00:04:05,050 --> 00:04:06,860
+in reverse, instead of projecting
+将 u 投影到 v 上
+
+111
+00:04:07,120 --> 00:04:08,130
+V onto U, you could project
+而不是将 v 投影到 u 上
+
+112
+00:04:08,520 --> 00:04:09,940
+U onto V. Then, you know, do
+然后做同样地计算
+
+113
+00:04:10,160 --> 00:04:12,410
+the same process, but with the rows of U and V reversed.
+只是把 u 和 v 的位置交换一下
+
+114
+00:04:13,170 --> 00:04:14,390
+And you would actually, you should
+你事实上可以
+
+115
+00:04:14,710 --> 00:04:16,900
+actually get the same number whatever that number is.
+得到同样的结果
+
+116
+00:04:17,540 --> 00:04:18,790
+And just to clarify what's
+申明一点
+
+117
+00:04:18,990 --> 00:04:20,850
+going on in this equation the
+在这个等式中
+
+118
+00:04:21,030 --> 00:04:21,920
+norm of U is a real
+u 的范数是一个实数
+
+119
+00:04:22,100 --> 00:04:25,260
+number and P is also a real number.
+p也是一个实数
+
+120
+00:04:25,760 --> 00:04:28,720
+And so U transpose V is
+因此 u 转置乘以 v
+
+121
+00:04:29,410 --> 00:04:32,350
+the regular multiplication as two real numbers of
+就是两个实数
+
+122
+00:04:33,040 --> 00:04:34,440
+the length P times the norm of U.
+正常相乘
+
+123
+00:04:35,580 --> 00:04:36,960
+Just one last detail, which is
+最后一点
+
+124
+00:04:37,190 --> 00:04:38,240
+if you look at the value of
+需要注意的就是p值
+
+125
+00:04:38,330 --> 00:04:40,250
+P, P is actually signed.
+p事实上是有符号的
+
+126
+00:04:41,350 --> 00:04:43,240
+And it can either be positive or negative.
+即它可能是正值 也可能是负值
+
+127
+00:04:44,350 --> 00:04:45,530
+So let me say what I mean
+我的意思是说
+
+128
+00:04:45,650 --> 00:04:46,740
+by that, if U
+如果 u
+
+129
+00:04:47,170 --> 00:04:49,360
+is a vector that looks like
+是一个类似这样的向量
+
+130
+00:04:49,640 --> 00:04:51,360
+this and V is a vector that looks like this.
+v 是一个类似这样的向量
+
+131
+00:04:52,380 --> 00:04:53,890
+So if the angle between U
+u 和 v 之间的
+
+132
+00:04:54,130 --> 00:04:55,770
+and V is greater than ninety degrees.
+夹角大于90度
+
+133
+00:04:56,620 --> 00:04:57,960
+Then if I project V onto
+则如果将 v
+
+134
+00:04:58,270 --> 00:05:00,220
+U, what I get
+投影到 u 上 会得到
+
+135
+00:05:00,420 --> 00:05:01,590
+is a projection it looks like
+这样的一个投影
+
+136
+00:05:01,720 --> 00:05:03,860
+this and so that length
+这是 p 的长度
+
+137
+00:05:04,110 --> 00:05:05,490
+P. And in this
+在这个情形下
+
+138
+00:05:05,670 --> 00:05:06,900
+case, I will still have
+我们仍然有
+
+139
+00:05:07,670 --> 00:05:09,510
+that U transpose V is
+u 转置乘以 v
+
+140
+00:05:09,660 --> 00:05:11,720
+equal to P times the
+是等于 p 乘以
+
+141
+00:05:11,800 --> 00:05:14,070
+norm of U. Except in
+u 的范数
+
+142
+00:05:14,200 --> 00:05:16,600
+this example P will be negative.
+唯一一点不同的是 p 在这里是负的
+
+143
+00:05:19,150 --> 00:05:20,990
+So, you know, in inner products if the angle
+在内积计算中 如果 u 和 v 之间的夹角
+
+144
+00:05:21,320 --> 00:05:22,540
+between U and V is less
+小于90度
+
+145
+00:05:22,790 --> 00:05:23,820
+than ninety degrees, then P
+那么那条红线的长度
+
+146
+00:05:24,100 --> 00:05:26,480
+is the positive length for that red line
+p 是正值
+
+147
+00:05:27,130 --> 00:05:28,420
+whereas if the angle of this
+然而如果
+
+148
+00:05:28,720 --> 00:05:29,640
+angle of here is greater
+这个夹角
+
+149
+00:05:30,000 --> 00:05:31,890
+than 90 degrees then P
+大于90度 则p
+
+150
+00:05:32,130 --> 00:05:33,880
+here will be negative of
+将会是负的
+
+151
+00:05:34,130 --> 00:05:37,260
+the length of that little line segment right over there.
+就是这个小线段的长度是负的
+
+152
+00:05:37,650 --> 00:05:38,750
+So the inner product between two
+因此两个向量之间的内积
+
+153
+00:05:38,900 --> 00:05:40,130
+vectors can also be negative
+也是负的
+
+154
+00:05:40,820 --> 00:05:42,900
+if the angle between them is greater than 90 degrees.
+如果它们之间的夹角大于90度
+
+155
+00:05:43,770 --> 00:05:45,100
+So that's how vector inner
+这就是关于向量内积的知识
+
+156
+00:05:45,310 --> 00:05:46,490
+products work. We're going to
+我们接下来将会
+
+157
+00:05:46,930 --> 00:05:47,960
+use these properties of vector
+使用这些关于向量内积的
+
+158
+00:05:48,280 --> 00:05:49,610
+inner product to try
+性质 试图来
+
+159
+00:05:49,840 --> 00:05:51,880
+to understand the support
+理解支持向量机
+
+160
+00:05:52,400 --> 00:05:54,490
+vector machine optimization objective over there. Here
+中的目标函数
+
+161
+00:05:54,630 --> 00:05:58,620
+is the optimization objective for the
+这就是我们先前给出的
+
+162
+00:05:58,650 --> 00:06:00,900
+support vector machine that we worked out earlier. Just for
+支持向量机模型中的目标函数
+
+163
+00:06:01,100 --> 00:06:02,070
+the purpose of this slide I
+为了讲解方便
+
+164
+00:06:02,120 --> 00:06:04,520
+am going to make one simplification,
+我做一点简化
+
+165
+00:06:04,910 --> 00:06:08,220
+just to make the objective easier
+仅仅是为了让目标函数
+
+166
+00:06:08,670 --> 00:06:10,110
+to analyze and what I'm going to do is
+更容易被分析 我接下来忽略掉截距
+
+167
+00:06:10,270 --> 00:06:14,160
+ignore the intercept term. So, we'll just ignore theta 0 and set that to be equal to 0. To
+令 θ0 等于 0
+
+168
+00:06:16,510 --> 00:06:22,950
+make things easier to plot, I'm also going to set N the number of features to be equal to 2. So, we have only 2 features,
+这样更容易画示意图 我将特征数 n 置为2 因此我们仅有
+
+169
+00:06:23,980 --> 00:06:24,710
+X1 and X2.
+两个特征 x1 和 x2
+
+170
+00:06:26,510 --> 00:06:27,980
+Now, let's look at the objective function.
+现在 我们来看一下目标函数
+
+171
+00:06:28,470 --> 00:06:29,910
+The optimization objective of the
+支持向量机的优化目标函数
+
+172
+00:06:30,160 --> 00:06:32,130
+SVM. What we have only two features.
+当我们仅有两个特征
+
+173
+00:06:32,630 --> 00:06:33,710
+When N is equal to 2.
+即 n=2 时
+
+174
+00:06:34,170 --> 00:06:35,340
+This can be written,
+这个式子可以写作
+
+175
+00:06:36,130 --> 00:06:37,900
+one half of
+二分之一
+
+176
+00:06:38,040 --> 00:06:40,080
+theta one squared plus theta two squared.
+θ1 平方加上 θ2 平方
+
+177
+00:06:40,620 --> 00:06:42,870
+Because we only have two parameters, theta one and theta two.
+我们只有两个参数 θ1 和θ2
+
+178
+00:06:45,240 --> 00:06:46,730
+What I'm going to do is rewrite this a bit.
+接下来我重写一下
+
+179
+00:06:46,940 --> 00:06:47,900
+I'm going to write this as one
+我将其重写成
+
+180
+00:06:48,090 --> 00:06:49,980
+half of theta one
+二分之一 θ1 平方
+
+181
+00:06:50,190 --> 00:06:51,860
+squared plus theta two squared and
+加上 θ2 平方
+
+182
+00:06:52,050 --> 00:06:54,160
+the square root squared.
+开平方根后再平方
+
+183
+00:06:54,820 --> 00:06:55,760
+And the reason I can do that,
+我这么做的根据是
+
+184
+00:06:56,100 --> 00:06:58,990
+is because for any number, you know, W, right, the
+对于任何数 w
+
+185
+00:07:00,830 --> 00:07:02,480
+square roots of W and
+w的平方根 再取平方
+
+186
+00:07:02,570 --> 00:07:03,930
+then squared, that's just equal
+得到的就是
+
+187
+00:07:04,080 --> 00:07:05,650
+to W. So square roots
+w 本身 因此平方根 然后平方
+
+188
+00:07:05,840 --> 00:07:07,250
+and squared should give you the same thing.
+并不会改变值的大小
+
+189
+00:07:08,600 --> 00:07:09,500
+What you may notice is that
+你可能注意到
+
+190
+00:07:09,730 --> 00:07:11,870
+this term inside, that's
+括号里面的这一项
+
+191
+00:07:12,290 --> 00:07:13,450
+equal to the norm
+是向量 θ
+
+192
+00:07:14,530 --> 00:07:16,460
+or the length of the
+的范数
+
+193
+00:07:16,690 --> 00:07:18,250
+vector theta and what
+或者说是向量 θ 的长度
+
+194
+00:07:18,430 --> 00:07:20,020
+I mean by that is that
+我的意思是
+
+195
+00:07:20,200 --> 00:07:21,640
+if we write out the
+如果我们将
+
+196
+00:07:21,700 --> 00:07:22,590
+vector theta like this, as
+向量 θ 写出来
+
+197
+00:07:23,080 --> 00:07:24,320
+you know theta one, theta two.
+θ1 θ2
+
+198
+00:07:25,260 --> 00:07:26,260
+Then this term that I've just
+那么我刚刚画红线的这一项
+
+199
+00:07:26,690 --> 00:07:28,230
+underlined in red, that's exactly
+就是向量 θ
+
+200
+00:07:28,640 --> 00:07:30,480
+the length, or the norm, of the vector theta.
+的长度或范数
+
+201
+00:07:30,900 --> 00:07:32,180
+We are using the definition
+这里我们用的是之前
+
+202
+00:07:32,950 --> 00:07:35,050
+of the norm of the vector that we have on the previous line.
+学过的向量范数的定义
+
+203
+00:07:36,140 --> 00:07:37,040
+And in fact this is actually
+事实上这就
+
+204
+00:07:37,400 --> 00:07:38,320
+equal to the length of the
+等于向量 θ
+
+205
+00:07:38,370 --> 00:07:39,760
+vector theta, whether you write
+的长度
+
+206
+00:07:40,020 --> 00:07:41,620
+it as theta zero, theta 1, theta 2.
+当然你可以将其写作 θ0 θ1 θ2
+
+207
+00:07:42,280 --> 00:07:45,230
+That is, if theta zero is equal to zero, as I assume here.
+如果 θ0等于0
+
+208
+00:07:45,860 --> 00:07:46,770
+Or just the length of theta
+那就是
+
+209
+00:07:46,900 --> 00:07:48,680
+1, theta 2; but for
+θ1 θ2 的长度
+
+210
+00:07:48,830 --> 00:07:50,450
+this line I am going to ignore theta 0.
+在这里我将忽略 θ0
+
+211
+00:07:50,940 --> 00:07:52,710
+So let me just, you know, treat theta
+将 θ 仅仅写作这样
+
+212
+00:07:53,150 --> 00:07:54,730
+as this, let me just
+这样来写 θ
+
+213
+00:07:54,960 --> 00:07:56,360
+write theta, the normal
+θ 的范数
+
+214
+00:07:56,720 --> 00:07:58,480
+theta as this theta 1,
+仅仅和 θ1 θ2 有关
+
+215
+00:07:58,620 --> 00:08:00,180
+theta 2 only, but the
+但是
+
+216
+00:08:00,260 --> 00:08:01,220
+math works out either way,
+数学上不管你是否包含 θ0
+
+217
+00:08:01,460 --> 00:08:03,790
+whether we include theta zero here or not.
+其实并没有差别
+
+218
+00:08:03,970 --> 00:08:05,870
+So it's not going to matter for the rest of our derivation.
+因此在我们接下来的推导中去掉θ0不会有影响
+
+219
+00:08:07,630 --> 00:08:09,120
+And so finally this means
+这意味着
+
+220
+00:08:09,390 --> 00:08:11,440
+that my optimization objective is equal
+我们的目标函数是
+
+221
+00:08:11,750 --> 00:08:13,100
+to one half of the
+等于二分之一
+
+222
+00:08:13,190 --> 00:08:14,610
+norm of theta squared.
+θ范数的平方
+
+223
+00:08:16,190 --> 00:08:17,230
+So all the support vector machine
+因此支持向量机
+
+224
+00:08:17,530 --> 00:08:19,010
+is doing in the optimization
+做的全部事情就是
+
+225
+00:08:19,910 --> 00:08:21,500
+objective is it's minimizing the
+极小化参数向量 θ
+
+226
+00:08:21,590 --> 00:08:23,100
+squared norm, or the squared
+范数的平方 或者说
+
+227
+00:08:23,470 --> 00:08:24,840
+length of the parameter vector theta.
+长度的平方
+
+228
+00:08:28,330 --> 00:08:29,160
+Now what I'd like to do
+现在我将要
+
+229
+00:08:29,370 --> 00:08:30,790
+is look at these terms, theta
+看看这些项
+
+230
+00:08:31,090 --> 00:08:33,670
+transpose X and understand better what they're doing.
+θ 转置乘以x 更深入地理解它们的含义
+
+231
+00:08:34,230 --> 00:08:36,600
+So given the parameter vector theta and given
+给定参数向量θ 给定一个样本 x
+
+232
+00:08:36,930 --> 00:08:39,880
+an example x, what is this equal to?
+这等于什么呢?
+
+233
+00:08:40,820 --> 00:08:42,120
+And on the previous slide, we
+在前一页幻灯片上
+
+234
+00:08:42,230 --> 00:08:44,070
+figured out what U transpose
+我们画出了
+
+235
+00:08:44,870 --> 00:08:45,850
+V looks like, with different
+在不同情形下
+
+236
+00:08:46,110 --> 00:08:47,880
+vectors U and V. And so we're
+u转置乘以v的示意图
+
+237
+00:08:48,130 --> 00:08:50,340
+going to take those definitions, you know, with theta
+我们将会使用这些概念
+
+238
+00:08:50,980 --> 00:08:52,300
+and X(i) playing the
+θ 和 x(i) 就
+
+239
+00:08:52,410 --> 00:08:53,310
+roles of U and V.
+类似于 u 和 v
+
+240
+00:08:54,400 --> 00:08:57,430
+And let's see what that picture looks like.
+让我们看一下示意图
+
+241
+00:08:57,860 --> 00:08:59,160
+So, let's say I plot. Let's say I look at
+我们考察一个
+
+242
+00:08:59,430 --> 00:09:01,130
+just a single training example. Let's say I
+单一的训练样本
+
+243
+00:09:01,230 --> 00:09:03,360
+have a positive example the drawing
+我有一个正样本在这里
+
+244
+00:09:03,720 --> 00:09:05,050
+was across there and let's say that is
+用一个叉来表示这个样本
+
+245
+00:09:05,800 --> 00:09:09,310
+my example X(i), what
+x(i)
+
+246
+00:09:09,500 --> 00:09:10,970
+that really means is
+意思是 在
+
+247
+00:09:12,100 --> 00:09:13,510
+plotted on the horizontal axis
+水平轴上
+
+248
+00:09:14,450 --> 00:09:16,210
+some value X(i) 1
+取值为 x(i)1
+
+249
+00:09:17,140 --> 00:09:19,620
+and on the vertical axis
+在竖直轴上
+
+250
+00:09:21,240 --> 00:09:22,290
+X(i) 2.
+取值为 x(i)2
+
+251
+00:09:22,650 --> 00:09:24,070
+That's how I plot my training examples.
+这就是我画出的训练样本
+
+252
+00:09:25,400 --> 00:09:27,160
+And although we haven't been really
+尽管我没有将其
+
+253
+00:09:27,320 --> 00:09:28,310
+thinking of this as a vector, what
+真的看做向量
+
+254
+00:09:28,570 --> 00:09:29,600
+this really is, this is a
+它事实上
+
+255
+00:09:29,650 --> 00:09:30,910
+vector from the origin
+就是一个
+
+256
+00:09:31,610 --> 00:09:33,520
+from 0, 0 out to
+始于原点
+
+257
+00:09:34,560 --> 00:09:36,210
+the location of this training example.
+终点位置在这个训练样本点的向量
+
+258
+00:09:37,830 --> 00:09:39,460
+And now let's say we have
+现在 我们有
+
+259
+00:09:39,980 --> 00:09:41,850
+a parameter vector and
+一个参数向量
+
+260
+00:09:42,080 --> 00:09:43,620
+I'm going to plot
+我会将它也
+
+261
+00:09:43,800 --> 00:09:45,720
+that as vector, as well.
+画成向量
+
+262
+00:09:46,390 --> 00:09:48,410
+What I mean by that is if I plot theta 1
+我将 θ1 画在这里
+
+263
+00:09:49,100 --> 00:09:53,530
+here and theta 2 there
+将 θ2 画在这里
+
+264
+00:09:56,230 --> 00:09:57,050
+so what is the inner
+那么内积
+
+265
+00:09:57,290 --> 00:09:58,940
+product theta transpose X(i).
+θ 转置乘以 x(i) 将会是什么呢
+
+266
+00:09:59,220 --> 00:10:01,240
+While using our earlier method,
+使用我们之前的方法
+
+267
+00:10:01,990 --> 00:10:03,360
+the way we compute that is we
+我们计算的方式就是
+
+268
+00:10:04,310 --> 00:10:06,170
+take my example and
+我将训练样本
+
+269
+00:10:06,320 --> 00:10:08,710
+project it onto my parameter vector theta.
+投影到参数向量 θ
+
+270
+00:10:09,830 --> 00:10:10,700
+And then I'm going to look
+然后我来看一看
+
+271
+00:10:10,950 --> 00:10:13,070
+at the length of this segment
+这个线段的长度
+
+272
+00:10:13,680 --> 00:10:14,660
+that I'm coloring in, in red.
+我将它画成红色
+
+273
+00:10:15,090 --> 00:10:16,500
+And I'm going to
+我将它称为
+
+274
+00:10:16,670 --> 00:10:19,480
+call that P superscript I
+p 上标 (i)
+
+275
+00:10:20,330 --> 00:10:21,330
+to denote that this is a
+用来表示这是
+
+276
+00:10:21,610 --> 00:10:22,920
+projection of the i-th training example
+第 i 个训练样本
+
+277
+00:10:24,860 --> 00:10:25,540
+onto the parameter vector theta.
+在参数向量 θ 上的投影
+
+278
+00:10:26,900 --> 00:10:28,140
+And so what we have is
+根据我们之前幻灯片的内容
+
+279
+00:10:28,350 --> 00:10:30,790
+that theta transpose X(i) is
+我们知道的是
+
+280
+00:10:30,920 --> 00:10:32,830
+equal to following what
+θ 转置乘以 x(i)
+
+281
+00:10:32,960 --> 00:10:34,210
+we have on the previous slide, this
+等于
+
+282
+00:10:34,430 --> 00:10:35,350
+is going to be equal to
+将会等于
+
+283
+00:10:36,560 --> 00:10:40,000
+P times the
+p 乘以
+
+284
+00:10:40,090 --> 00:10:42,090
+length or the norm of the vector theta.
+向量 θ 的长度 或 范数
+
+285
+00:10:43,410 --> 00:10:44,690
+And this is of course also equal to
+这就等于
+
+286
+00:10:44,750 --> 00:10:46,660
+theta 1 x1
+θ1 乘以 x1
+
+287
+00:10:47,920 --> 00:10:50,610
+plus theta 2 x2. So each
+加上 θ2 x2
+
+288
+00:10:50,810 --> 00:10:52,360
+of these is, you know, an equally
+这两种方式是等价的
+
+289
+00:10:52,680 --> 00:10:54,080
+valid way of computing the
+都可以用来计算
+
+290
+00:10:54,180 --> 00:10:56,160
+inner product between theta and X(i).
+θ 和 x(i) 之间的内积
+
+291
+00:10:57,780 --> 00:10:57,780
+Okay.
+好
+
+292
+00:10:58,140 --> 00:10:59,040
+So where does this leave us?
+这告诉了我们什么呢
+
+293
+00:10:59,280 --> 00:11:00,770
+What this means is that, this
+这里表达的意思是
+
+294
+00:11:01,020 --> 00:11:02,890
+constraint, that theta transpose X(i)
+这个 θ 转置乘以 x(i)
+
+295
+00:11:03,130 --> 00:11:05,330
+be greater than or equal to one or less than or equal to minus one.
+大于等于1 或者小于-1的
+
+296
+00:11:06,110 --> 00:11:06,860
+What this means is that it
+约束是可以被
+
+297
+00:11:06,970 --> 00:11:07,830
+can be replaced by the constraint
+p(i) 乘以 θ 的范数大于等于1
+
+298
+00:11:08,610 --> 00:11:12,000
+that P(i) times the norm of theta
+这个约束
+
+299
+00:11:12,320 --> 00:11:13,500
+be greater than or equal to one.
+所代替的
+
+300
+00:11:13,680 --> 00:11:16,280
+Because theta transpose X(i) is
+因为 θ 转置乘以 x(i)
+
+301
+00:11:16,400 --> 00:11:19,470
+equal to P(i) times the norm of theta.
+等于 p(i) 乘以 θ 的范数
+
+302
+00:11:21,250 --> 00:11:23,080
+So writing that into our optimization objective.
+将其写入我们的优化目标
+
+303
+00:11:23,910 --> 00:11:24,870
+This is what we get
+我们将会得到
+
+304
+00:11:25,130 --> 00:11:26,290
+where I have, instead of
+没有了约束
+
+305
+00:11:27,090 --> 00:11:28,400
+theta transpose X(i), I now
+θ 转置乘以x(i)
+
+306
+00:11:28,620 --> 00:11:30,920
+have this P(i) times the norm of theta.
+而变成了 p(i) 乘以 θ 的范数
+
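+Putting the last few steps together (with theta_0 = 0 and n = 2, as assumed in this video), the problem being analyzed can be stated compactly as:
+
+```latex
+\min_\theta \;\; \tfrac{1}{2}\left(\theta_1^2 + \theta_2^2\right) \;=\; \tfrac{1}{2}\,\|\theta\|^2
+\quad \text{subject to} \quad
+p^{(i)} \, \|\theta\| \;\ge\; 1   \;\; \text{if } y^{(i)} = 1,
+\qquad
+p^{(i)} \, \|\theta\| \;\le\; -1  \;\; \text{if } y^{(i)} = 0
+```
+
+where p^(i) is the signed projection of x^(i) onto the parameter vector theta, as defined above.
+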
+307
+00:11:31,970 --> 00:11:32,970
+And just to remind you we
+需要提醒一点
+
+308
+00:11:33,090 --> 00:11:34,240
+worked out earlier too that
+我们之前曾讲过
+
+309
+00:11:34,460 --> 00:11:36,310
+this optimization objective can be
+这个优化目标函数可以
+
+310
+00:11:36,510 --> 00:11:38,130
+written as one half times
+被写成二分之一乘以
+
+311
+00:11:38,500 --> 00:11:39,910
+the norm of theta squared.
+θ 平方的范数
+
+312
+00:11:41,730 --> 00:11:43,490
+So, now let's consider
+现在让我们考虑
+
+313
+00:11:44,210 --> 00:11:45,550
+the training example that we
+下面这里的
+
+314
+00:11:45,700 --> 00:11:47,100
+have at the bottom and
+训练样本
+
+315
+00:11:47,450 --> 00:11:49,620
+for now, continuing to use the simplification that
+现在 继续使用之前的简化 即
+
+316
+00:11:50,180 --> 00:11:51,340
+theta 0 is equal to 0.
+θ0 等于0
+
+317
+00:11:52,030 --> 00:11:54,810
+Let's see what decision boundary the support vector machine will choose.
+我们来看一下支持向量机会选择什么样的决策界
+
+318
+00:11:55,860 --> 00:11:57,710
+Here's one option, let's say
+这是一种选择
+
+319
+00:11:57,870 --> 00:11:59,190
+the support vector machine were to
+我们假设支持向量机会
+
+320
+00:11:59,340 --> 00:12:01,750
+choose this decision boundary.
+选择这个决策边界
+
+321
+00:12:02,690 --> 00:12:05,110
+This is not a very good choice because it has very small margins.
+这不是一个非常好的选择 因为它的间距很小
+
+322
+00:12:05,530 --> 00:12:08,210
+This decision boundary comes very close to the training examples.
+这个决策界离训练样本的距离很近
+
+323
+00:12:09,810 --> 00:12:12,360
+Let's see why the support vector machine will not do this.
+我们来看一下为什么支持向量机不会选择它
+
+324
+00:12:14,130 --> 00:12:15,420
+For this choice of parameters
+对于这样选择的参数 θ
+
+325
+00:12:16,410 --> 00:12:18,280
+it's possible to show that the
+可以看到
+
+326
+00:12:19,030 --> 00:12:21,250
+parameter vector theta is actually
+参数向量 θ 事实上
+
+327
+00:12:21,760 --> 00:12:23,350
+at 90 degrees to the decision boundary.
+是和决策界是90度正交的
+
+328
+00:12:24,060 --> 00:12:25,440
+And so, that green decision boundary
+因此这个绿色的决策界
+
+329
+00:12:26,250 --> 00:12:27,550
+corresponds to a parameter vector
+对应着一个参数向量 θ
+
+330
+00:12:27,920 --> 00:12:29,650
+theta that points in that direction.
+指向这个方向
+
+331
+00:12:30,730 --> 00:12:32,280
+And by the way, the simplification that
+顺便提一句 θ0 等于0
+
+332
+00:12:32,510 --> 00:12:34,120
+theta 0 equals 0 that
+的简化仅仅意味着
+
+333
+00:12:34,300 --> 00:12:35,410
+just means that the decision boundary
+决策界必须
+
+334
+00:12:35,910 --> 00:12:37,960
+must pass through the origin, (0,0) over there.
+通过原点 (0,0)
+
+335
+00:12:38,330 --> 00:12:40,350
+So now, let's
+现在让我们看一下
+
+336
+00:12:40,690 --> 00:12:41,780
+look at what this implies
+这对于优化目标函数
+
+337
+00:12:41,840 --> 00:12:43,590
+for the optimization objective.
+意味着什么
+
+338
+00:12:45,260 --> 00:12:46,420
+Let's say that this example here.
+比如这个样本
+
+339
+00:12:47,460 --> 00:12:48,560
+Let's say that's my first example, you know,
+我们假设它是我的第一个样本
+
+340
+00:12:50,480 --> 00:12:50,650
+X1.
+x(1)
+
+341
+00:12:51,690 --> 00:12:52,630
+If we look at the projection
+如果我考察这个样本
+
+342
+00:12:53,320 --> 00:12:54,870
+of this example onto my parameters theta.
+到参数 θ 的投影
+
+343
+00:12:56,180 --> 00:12:56,520
+That's the projection.
+这就是投影
+
+344
+00:12:57,660 --> 00:12:59,230
+And so that little red line segment.
+这个短的红线段
+
+345
+00:13:00,450 --> 00:13:01,720
+That is equal to P1.
+就等于p(1)
+
+346
+00:13:02,380 --> 00:13:04,650
+And that is going to be pretty small, right.
+它非常短 对么
+
+347
+00:13:05,860 --> 00:13:08,590
+And similarly, if this
+类似地 这个样本
+
+348
+00:13:09,610 --> 00:13:10,710
+example here, if this happens
+如果它恰好是
+
+349
+00:13:11,170 --> 00:13:12,620
+to be X2, that's my second example.
+x(2) 是我的第二个训练样本
+
+350
+00:13:13,880 --> 00:13:16,620
+Then, if I look at the projection of this this example onto theta.
+则它到 θ 的投影在这里
+
+351
+00:13:18,080 --> 00:13:18,170
+You know.
+你知道的
+
+352
+00:13:18,440 --> 00:13:20,460
+Then, let me draw this one in magenta.
+我将它画成粉色
+
+353
+00:13:21,600 --> 00:13:23,690
+This little magenta line segment, that's
+这个短的粉色线段
+
+354
+00:13:24,000 --> 00:13:25,820
+going to be P2. That's
+它是 p(2)
+
+355
+00:13:26,070 --> 00:13:27,370
+the projection of the second example
+第二个样本到
+
+356
+00:13:27,770 --> 00:13:28,870
+onto my, onto the direction
+我的参数向量 θ
+
+357
+00:13:30,100 --> 00:13:32,650
+of my parameter vector theta which goes like this.
+的投影
+
+358
+00:13:33,870 --> 00:13:34,250
+And so, this little
+因此
+
+359
+00:13:35,270 --> 00:13:35,270
+projection line segment is getting pretty small.
+这个投影非常短
+
+360
+00:13:36,850 --> 00:13:38,420
+P2 will actually be a negative number, right so P2 is
+p(2) 事实上是一个负值
+
+361
+00:13:38,560 --> 00:13:42,490
+in the opposite direction.
+p(2) 是在相反的方向
+
+362
+00:13:43,710 --> 00:13:45,250
+This vector has greater
+这个向量
+
+363
+00:13:45,560 --> 00:13:47,130
+than 90 degree angle with my
+和参数向量 θ 的夹角
+
+364
+00:13:47,270 --> 00:13:48,980
+parameter vector theta, it's going to be less than 0.
+大于90度 p(2) 的值小于0
+
+365
+00:13:50,280 --> 00:13:51,580
+And so what we're finding is that
+我们会发现
+
+366
+00:13:51,850 --> 00:13:54,880
+these terms P(i) are
+这些 p(i)
+
+367
+00:13:55,200 --> 00:13:57,230
+going to be pretty small numbers.
+将会是非常小的数
+
+368
+00:13:58,210 --> 00:13:59,080
+So if we look at
+因此当我们考察
+
+369
+00:13:59,110 --> 00:14:01,650
+the optimization objective and see, well, for positive examples
+优化目标函数的时候 对于正样本而言
+
+370
+00:14:02,490 --> 00:14:04,860
+we need P(i) times
+我们需要 p(i) 乘以
+
+371
+00:14:05,220 --> 00:14:07,590
+the norm of theta to be greater than or equal to one.
+θ 的范数大于等于1
+
+372
+00:14:08,670 --> 00:14:10,640
+But if P(i) over
+但是如果 p(i) 在这里
+
+373
+00:14:10,860 --> 00:14:12,140
+here, if P1 over here
+如果 p(1) 在这里
+
+374
+00:14:12,770 --> 00:14:14,160
+is pretty small, that means
+非常小 那就意味着
+
+375
+00:14:14,410 --> 00:14:15,580
+that we need the norm of
+我们需要 θ 的范数
+
+376
+00:14:15,650 --> 00:14:18,420
+theta to be pretty large, right? If
+非常大 对么
+
+377
+00:14:19,830 --> 00:14:20,840
+P1 is small
+因为如果 p(1) 很小
+
+378
+00:14:21,790 --> 00:14:23,110
+and we want P1 you
+而我们希望 p(1)
+
+379
+00:14:23,410 --> 00:14:24,600
+know, times the norm of theta
+乘以 θ
+
+380
+00:14:24,920 --> 00:14:25,890
+to be greater than or equal to one, well
+大于等于1
+
+381
+00:14:26,400 --> 00:14:27,300
+the only way for that
+令其实现的
+
+382
+00:14:27,510 --> 00:14:28,440
+to be true, for the product of
+唯一的办法就是
+
+383
+00:14:28,650 --> 00:14:29,750
+these two numbers to be large
+这两个数较大
+
+384
+00:14:30,020 --> 00:14:31,120
+if P1 is small, is, as we
+如果 p(1) 小
+
+385
+00:14:31,240 --> 00:14:32,980
+said, for the norm of theta to be large.
+我们就希望 θ 的范数大
+
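+A small numerical sketch of this point (numpy assumed; the vectors below are made-up illustration values, not from the lecture):
+
+import numpy as np
+
+theta = np.array([1.0, 2.0])               # parameter vector, theta_0 = 0 simplification
+x1 = np.array([0.4, -0.1])                 # a positive example lying close to the boundary
+p1 = x1 @ theta / np.linalg.norm(theta)    # signed projection of x1 onto theta
+
+# The constraint for a positive example is p1 * ||theta|| >= 1,
+# so the smallest norm that can satisfy it is 1 / p1:
+print(p1, 1.0 / p1)                        # a small p1 forces ||theta|| to be large
+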
+386
+00:14:34,150 --> 00:14:36,450
+And similarly for our
+类似地
+
+387
+00:14:36,650 --> 00:14:38,560
+negative example, we need P2
+对于负样本而言 我们需要 p(2)
+
+388
+00:14:39,750 --> 00:14:41,070
+times the norm of
+乘以
+
+389
+00:14:41,350 --> 00:14:44,990
+theta to be
+θ 的范数
+
+390
+00:14:45,160 --> 00:14:46,910
+less than or equal to minus one.
+小于等于-1
+
+391
+00:14:47,760 --> 00:14:48,540
+And we saw in this
+我们已经在这个样本中
+
+392
+00:14:48,710 --> 00:14:50,200
+example already that P2
+看到 p(2)
+
+393
+00:14:50,260 --> 00:14:51,520
+is going pretty small negative number,
+会是一个非常小的数
+
+394
+00:14:52,040 --> 00:14:53,290
+and so the only way for
+因此唯一的办法
+
+395
+00:14:53,420 --> 00:14:54,490
+that to happen as well is
+就是
+
+396
+00:14:54,530 --> 00:14:56,730
+for the norm of theta to be
+θ 的范数变大
+
+397
+00:14:57,010 --> 00:14:59,630
+large, but what
+但是我们
+
+398
+00:14:59,790 --> 00:15:00,900
+we are doing in the optimization
+的目标函数是
+
+399
+00:15:01,290 --> 00:15:02,400
+objective is we are
+希望
+
+400
+00:15:02,540 --> 00:15:03,840
+trying to find a setting
+找到一个参数 θ
+
+401
+00:15:04,170 --> 00:15:05,320
+of parameters where the norm
+它的范数
+
+402
+00:15:05,550 --> 00:15:07,100
+of theta is small, and so
+是小的
+
+403
+00:15:07,830 --> 00:15:09,040
+you know, so this doesn't
+因此 这看起来
+
+404
+00:15:09,330 --> 00:15:10,070
+seem like such a good direction
+不像是一个好的
+
+405
+00:15:10,610 --> 00:15:14,160
+for the parameter vector and theta.
+参数向量 θ 的选择
+
+406
+00:15:14,450 --> 00:15:15,510
+In contrast, just look at a different decision boundary.
+相反的 来看一个不同的决策边界
+
+407
+00:15:17,040 --> 00:15:19,500
+Here, let's say, this SVM chooses
+比如说 支持向量机选择了
+
+408
+00:15:20,510 --> 00:15:21,280
+that decision boundary.
+这个决策界
+
+409
+00:15:22,870 --> 00:15:23,980
+Now the picture is going to be very different.
+现在状况会有很大不同
+
+410
+00:15:24,420 --> 00:15:25,890
+If that is the decision boundary,
+如果这是决策界
+
+411
+00:15:26,190 --> 00:15:27,380
+here is the
+这就是
+
+412
+00:15:27,450 --> 00:15:28,770
+corresponding direction for theta.
+相对应的参数 θ 的方向
+
+413
+00:15:29,210 --> 00:15:30,920
+So, with this decision
+因此 在这个
+
+414
+00:15:31,000 --> 00:15:32,110
+boundary you know, that
+决策界之下
+
+415
+00:15:32,300 --> 00:15:33,560
+vertical line that corresponds
+垂直线是决策界
+
+416
+00:15:34,470 --> 00:15:35,960
+to it, it is possible to
+使用线性代数的知识
+
+417
+00:15:36,190 --> 00:15:37,880
+show using linear algebra that
+可以说明
+
+418
+00:15:38,070 --> 00:15:39,140
+the way to get that green decision
+这个绿色的决策界
+
+419
+00:15:39,460 --> 00:15:41,190
+boundary is to have the vector theta be
+有一个垂直于它的
+
+420
+00:15:41,390 --> 00:15:42,620
+at 90 degrees to it,
+向量 θ
+
+421
+00:15:43,610 --> 00:15:44,470
+and now if you look
+现在如果你考察
+
+422
+00:15:44,560 --> 00:15:45,570
+at the projection of your
+你的数据在横轴 x
+
+423
+00:15:45,710 --> 00:15:47,540
+data onto the vector
+上的投影
+
+424
+00:15:48,800 --> 00:15:50,010
+x, let's say as before
+比如 这个我之前提到的样本
+
+425
+00:15:50,010 --> 00:15:52,620
+this example is my example of x1. So when
+我的样本 x(1)
+
+426
+00:15:52,890 --> 00:15:54,600
+I project this on to x,
+当我将它投影到横轴x上
+
+427
+00:15:55,410 --> 00:15:59,110
+or onto theta, what I find is that this is P1.
+或说投影到θ上 就会得到这样的p(1)
+
+428
+00:16:00,650 --> 00:16:02,410
+That length there is P1.
+它的长度是 p(1)
+
+429
+00:16:03,750 --> 00:16:05,820
+The other example, that
+另一个样本
+
+430
+00:16:06,260 --> 00:16:08,620
+example is x(2), and I
+那个样本是x(2)
+
+431
+00:16:08,840 --> 00:16:11,300
+do the same projection and
+我做同样的投影
+
+432
+00:16:11,410 --> 00:16:12,580
+what I find is that this
+我会发现
+
+433
+00:16:12,780 --> 00:16:14,680
+length here is
+这是 p(2) 的长度
+
+434
+00:16:15,610 --> 00:16:17,880
+P2, and really that is going to be less than 0.
+它是负值
+
+435
+00:16:18,830 --> 00:16:19,940
+And you notice that now
+你会注意到
+
+436
+00:16:20,480 --> 00:16:22,490
+P1 and P2, these lengths
+现在 p(1) 和 p(2)
+
+437
+00:16:23,810 --> 00:16:24,740
+of the projections are going to
+这些投影长度
+
+438
+00:16:24,780 --> 00:16:26,800
+be much bigger, and so
+是长多了
+
+439
+00:16:27,440 --> 00:16:28,460
+if we still need to enforce
+如果我们仍然要满足这些约束
+
+440
+00:16:28,890 --> 00:16:30,700
+these constraints that P1 times
+p(1) 乘以 θ 的范数
+
+441
+00:16:30,800 --> 00:16:33,040
+the norm of theta be greater than or equal to
+是比1大的
+
+442
+00:16:33,230 --> 00:16:35,670
+one, because P1 is so much bigger now,
+则因为 p(1) 变大了
+
+443
+00:16:36,580 --> 00:16:39,110
+the norm of theta can be smaller.
+θ 的范数就可以变小了
+
+444
+00:16:41,960 --> 00:16:43,090
+And so, what this means is
+因此这意味着
+
+445
+00:16:43,210 --> 00:16:44,320
+that by choosing the decision
+通过选择
+
+446
+00:16:44,730 --> 00:16:45,760
+boundary shown on the right
+右边的决策界
+
+447
+00:16:46,010 --> 00:16:47,610
+instead of on the left, the
+而不是左边的那个
+
+448
+00:16:47,850 --> 00:16:49,000
+SVM can make the
+支持向量机可以
+
+449
+00:16:49,080 --> 00:16:50,560
+norm of the parameters theta
+使参数 θ 的范数
+
+450
+00:16:50,840 --> 00:16:52,420
+much smaller. So, if we can
+变小很多 因此如果我们想
+
+451
+00:16:52,550 --> 00:16:54,080
+make the norm of theta smaller and
+令 θ 的范数变小
+
+452
+00:16:54,260 --> 00:16:55,140
+therefore make the squared norm of
+从而令 θ 范数的平方
+
+453
+00:16:55,590 --> 00:16:57,080
+theta smaller, which is
+变小 就能让
+
+454
+00:16:57,210 --> 00:16:58,130
+why the SVM
+支持向量机
+
+455
+00:16:58,710 --> 00:17:00,920
+would choose this hypothesis on the right instead.
+选择右边的决策界
+
+456
+00:17:02,800 --> 00:17:04,260
+And this is how
+这就是
+
+457
+00:17:05,580 --> 00:17:07,160
+the SVM gives rise
+支持向量机如何能
+
+458
+00:17:07,500 --> 00:17:09,550
+to this large margin classification effect.
+有效地产生大间距分类的原因
+
+459
+00:17:10,700 --> 00:17:11,620
+Mainly, if you look
+看这条绿线
+
+460
+00:17:11,820 --> 00:17:13,250
+at this green line, if you look at this green
+这个绿色的决策界
+
+461
+00:17:13,490 --> 00:17:14,990
+hypothesis we want the
+我们希望
+
+462
+00:17:15,070 --> 00:17:16,250
+projections of my positive
+正样本和负样本投影到
+
+463
+00:17:17,190 --> 00:17:18,780
+and negative examples onto theta to be large, and
+θ 的值大
+
+464
+00:17:19,200 --> 00:17:20,360
+the only way for that to
+要做到这一点
+
+465
+00:17:20,710 --> 00:17:23,490
+hold true is if, surrounding the green line,
+的唯一方式就是选择这条绿线做决策界
+
+466
+00:17:24,950 --> 00:17:27,710
+There's this large margin, there's
+这是大间距决策界
+
+467
+00:17:27,880 --> 00:17:31,460
+this large gap that separates
+来区分开
+
+468
+00:17:33,970 --> 00:17:37,240
+positive and negative examples. This margin is
+正样本和负样本
+
+469
+00:17:38,020 --> 00:17:40,740
+really the magnitude of this gap.
+这个间距的值
+
+470
+00:17:41,080 --> 00:17:42,050
+The magnitude of this margin
+这个间距的值
+
+471
+00:17:43,040 --> 00:17:44,900
+is exactly the values of
+就是p(1) p(2) p(3)
+
+472
+00:17:45,060 --> 00:17:47,730
+P1, P2, P3 and so on.
+等等的值
+
+473
+00:17:47,890 --> 00:17:48,970
+And so by making the margin
+通过让间距变大
+
+474
+00:17:49,480 --> 00:17:51,270
+large, by these terms P1,
+通过这些p(1) p(2) p(3) 等等
+
+475
+00:17:51,470 --> 00:17:53,650
+P2, P3 and so on, that's how
+的值 支持向量机
+
+476
+00:17:53,830 --> 00:17:55,520
+the SVM can end up with
+最终可以找到
+
+477
+00:17:55,670 --> 00:17:56,860
+a smaller value for the
+一个较小的 θ 范数
+
+478
+00:17:56,960 --> 00:17:59,450
+norm of theta which is what it is trying to do in the objective.
+这正是支持向量机中最小化目标函数的目的
+
+479
+00:18:00,250 --> 00:18:01,290
+And this is why this
+以上就是为什么
+
+480
+00:18:01,960 --> 00:18:03,300
+machine ends up with large
+支持向量机最终会找到
+
+481
+00:18:03,790 --> 00:18:05,510
+margin classifiers, because it's
+大间距分类器的原因
+
+482
+00:18:05,640 --> 00:18:07,570
+trying to maximize the norms
+因为它试图极大化这些
+
+483
+00:18:07,720 --> 00:18:08,910
+of these P(i), which are the distances from
+p(i) 的范数 它们是
+
+484
+00:18:09,060 --> 00:18:10,450
+the training examples to the decision boundary.
+训练样本到决策边界的距离
+
+485
+00:18:14,360 --> 00:18:16,450
+Finally, we did this whole derivation
+最后一点 我们的推导
+
+486
+00:18:17,200 --> 00:18:18,590
+using this simplification that the
+自始至终使用了这个简化假设
+
+487
+00:18:18,750 --> 00:18:21,150
+parameter theta 0 must be equal to 0.
+就是参数 θ0 等于0
+
+488
+00:18:21,860 --> 00:18:23,440
+The effect of that as
+就像我之前提到的
+
+489
+00:18:23,560 --> 00:18:25,380
+I mentioned briefly, is that if
+这个的作用是
+
+490
+00:18:25,540 --> 00:18:26,560
+theta 0 is equal to
+θ0 等于 0
+
+491
+00:18:26,830 --> 00:18:28,280
+0 what that means
+的意思是
+
+492
+00:18:28,460 --> 00:18:29,770
+is that we are entertaining decision
+我们让决策界
+
+493
+00:18:30,200 --> 00:18:31,510
+boundaries that pass through the
+通过原点
+
+494
+00:18:31,750 --> 00:18:33,640
+origin, decision boundaries that pass through
+让决策界通过原点
+
+495
+00:18:33,800 --> 00:18:35,510
+the origin like that, if you
+就像这样
+
+496
+00:18:35,710 --> 00:18:37,980
+allow theta zero to
+如果你令
+
+497
+00:18:38,080 --> 00:18:39,540
+be non 0 then what
+θ0 不是0的话
+
+498
+00:18:39,870 --> 00:18:41,190
+that means is that you entertain the
+含义就是你希望
+
+499
+00:18:41,380 --> 00:18:43,120
+decision boundaries that did not
+决策界不通过原点
+
+500
+00:18:43,390 --> 00:18:45,730
+cross through the origin, like that one I just drew.
+比如这样
+
+501
+00:18:46,380 --> 00:18:47,940
+And I'm not going to do
+我将不会做全部的推导
+
+502
+00:18:48,010 --> 00:18:49,540
+the full derivation of that. It
+实际上
+
+503
+00:18:49,650 --> 00:18:50,600
+turns out that this same
+支持向量机产生大间距分类器的结论
+
+504
+00:18:51,060 --> 00:18:52,720
+large margin proof works in
+会被证明同样成立
+
+505
+00:18:52,780 --> 00:18:54,240
+pretty much exactly the same way.
+证明方式是非常类似的
+
+506
+00:18:54,390 --> 00:18:56,100
+And there's a generalization of this
+是我们刚刚做的
+
+507
+00:18:56,850 --> 00:18:57,830
+argument that we just went through
+证明的推广
+
+508
+00:18:58,030 --> 00:18:59,400
+that shows
+之前视频中说过
+
+509
+00:18:59,870 --> 00:19:01,540
+that even when theta
+即便 θ0
+
+510
+00:19:01,840 --> 00:19:03,690
+0 is non 0, what
+不等于0
+
+511
+00:19:03,960 --> 00:19:06,940
+the SVM is trying to do when you have this optimization objective.
+支持向量机要做的事情都是优化这个目标函数
+
+512
+00:19:08,200 --> 00:19:09,620
+Which again corresponds to the
+对应着
+
+513
+00:19:09,720 --> 00:19:11,570
+case of when C is very large.
+C值非常大的情况
+
+514
+00:19:14,010 --> 00:19:15,110
+But it is possible to
+但是可以说明的是
+
+515
+00:19:15,290 --> 00:19:16,510
+show that, you know, when theta
+即便 θ0 不等于 0
+
+516
+00:19:16,810 --> 00:19:18,420
+0 is not equal to 0, this
+支持向量机
+
+517
+00:19:18,620 --> 00:19:19,750
+support vector machine is still
+仍然会
+
+518
+00:19:20,100 --> 00:19:21,360
+really trying
+找到
+
+519
+00:19:21,640 --> 00:19:22,650
+to find the large margin
+正样本和负样本之间的
+
+520
+00:19:23,040 --> 00:19:24,060
+separator between the positive and negative
+大间距分隔
+
+521
+00:19:24,630 --> 00:19:28,200
+examples. So that
+总之
+
+522
+00:19:28,420 --> 00:19:31,060
+explains how this support vector machine is a large margin classifier.
+我们解释了为什么支持向量机是一个大间距分类器
+
+523
+00:19:32,920 --> 00:19:34,020
+In the next video we
+在下一节我们
+
+524
+00:19:34,190 --> 00:19:35,080
+will start to talk about how
+将开始讨论
+
+525
+00:19:35,400 --> 00:19:36,480
+to take some of these
+如何利用支持向量机的原理
+
+526
+00:19:36,710 --> 00:19:37,980
+SVM ideas and start to
+应用它们
+
+527
+00:19:38,190 --> 00:19:39,200
+apply them to build a complex
+建立一个复杂的
+
+528
+00:19:39,900 --> 00:19:41,370
+nonlinear classifiers.
+非线性分类器
+
diff --git a/srt/12 - 4 - Kernels I (16 min).srt b/srt/12 - 4 - Kernels I (16 min).srt
new file mode 100644
index 00000000..dca172e2
--- /dev/null
+++ b/srt/12 - 4 - Kernels I (16 min).srt
@@ -0,0 +1,2140 @@
+1
+00:00:00,080 --> 00:00:01,140
+In this video, I'd like
+在这次的课程中(字幕整理:中国海洋大学,黄海广,haiguang2000@qq.com)
+
+2
+00:00:01,370 --> 00:00:03,120
+to start adapting support vector
+我将对支持向量机算法做一些改变
+
+3
+00:00:03,390 --> 00:00:06,280
+machines in order to develop complex nonlinear classifiers.
+以构造复杂的非线性分类器
+
+4
+00:00:07,630 --> 00:00:10,410
+The main technique for doing that is something called kernels.
+我们用"Kernels(核函数)"来达到此目的
+
+5
+00:00:11,730 --> 00:00:13,690
+Let's see what this kernels are and how to use them.
+我们来看看核函数是什么 以及如何使用
+
+6
+00:00:15,860 --> 00:00:16,930
+If you have a training set that
+如果
+
+7
+00:00:17,030 --> 00:00:18,270
+looks like this, and you
+你有一个像这样的训练集
+
+8
+00:00:18,400 --> 00:00:20,000
+want to find a
+然后你希望拟合一个
+
+9
+00:00:20,150 --> 00:00:21,670
+nonlinear decision boundary to distinguish
+非线性的判别边界
+
+10
+00:00:22,270 --> 00:00:23,950
+the positive and negative examples, maybe
+来区别正负实例
+
+11
+00:00:24,350 --> 00:00:25,900
+a decision boundary that looks like that.
+可能是这样的一个判别边界
+
+12
+00:00:27,040 --> 00:00:27,950
+One way to do so is
+一种办法
+
+13
+00:00:28,230 --> 00:00:29,760
+to come up with a set
+是构造
+
+14
+00:00:29,970 --> 00:00:32,180
+of complex polynomial features, right? So, set of
+多项式特征变量 是吧
+
+15
+00:00:32,340 --> 00:00:33,420
+features that looks like this,
+也就是像这样的特征变量集合
+
+16
+00:00:34,140 --> 00:00:34,990
+so that you end up
+这样
+
+17
+00:00:35,140 --> 00:00:37,120
+with a hypothesis X that
+你就能得到一个假设X
+
+18
+00:00:38,050 --> 00:00:40,380
+predicts 1 if you know
+如果θ0加上θ1*X1
+
+19
+00:00:40,570 --> 00:00:41,790
+that theta 0 and plus theta 1 X1
+加上其他的多项式特征变量
+
+20
+00:00:41,860 --> 00:00:45,000
+plus dot dot dot all those polynomial features is
+之和大于0
+
+21
+00:00:45,180 --> 00:00:47,410
+greater than 0, and
+那么就预测为1
+
+22
+00:00:47,540 --> 00:00:49,170
+predict 0, otherwise.
+反之 则预测为0
+
+23
+00:00:51,070 --> 00:00:52,760
+And another way
+这种方法
+
+24
+00:00:52,980 --> 00:00:54,330
+of writing this, to introduce
+的另一种写法
+
+25
+00:00:54,840 --> 00:00:56,240
+a level of new notation that
+这里介绍一个新的概念
+
+26
+00:00:56,500 --> 00:00:57,860
+I'll use later, is that
+之后将会用到
+
+27
+00:00:58,200 --> 00:00:59,370
+we can think of a hypothesis
+我们可以把假设函数
+
+28
+00:00:59,730 --> 00:01:01,610
+as computing a decision boundary
+看成是用这个
+
+29
+00:01:02,120 --> 00:01:03,380
+using this. So, theta
+来计算判别边界 那么
+
+30
+00:01:03,820 --> 00:01:04,870
+0 plus theta 1 f1 plus
+θ0+θ1*f1+
+
+31
+00:01:05,070 --> 00:01:06,130
+theta 2, f2 plus theta
+θ2*f2+θ3*f3
+
+32
+00:01:06,610 --> 00:01:08,730
+3, f3 plus and so on.
+加上其他项
+
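+Written out, the decision rule being described is, schematically: predict
+
+\[
+h_{\theta}(x) = 1 \ \text{ if } \ \theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 + \cdots \ge 0,
+\qquad h_{\theta}(x) = 0 \ \text{ otherwise.}
+\]
+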
+33
+00:01:09,590 --> 00:01:12,790
+Where I'm going to
+在这里
+
+34
+00:01:13,050 --> 00:01:14,070
+use this new notation
+我将用这几个新的符号
+
+35
+00:01:14,730 --> 00:01:15,930
+f1, f2, f3 and so
+f1 f2 f3等等
+
+36
+00:01:16,270 --> 00:01:17,610
+on to denote these new sort of features
+来表示一系列我将要计算的
+
+37
+00:01:19,350 --> 00:01:20,630
+that I'm computing, so f1 is
+新的特征变量
+
+38
+00:01:21,370 --> 00:01:24,250
+just X1, f2 is equal
+因此 f1就等于X1
+
+39
+00:01:24,600 --> 00:01:27,060
+to X2, f3 is
+f2就等于X2
+
+40
+00:01:27,140 --> 00:01:28,560
+equal to this one
+f3等于这个
+
+41
+00:01:28,770 --> 00:01:29,790
+here. So, X1X2. So,
+X1X2
+
+42
+00:01:29,900 --> 00:01:32,200
+f4 is equal to
+f4等于X1的平方
+
+43
+00:01:33,840 --> 00:01:35,590
+X1 squared where f5 is
+f5等于X2的平方
+
+44
+00:01:35,680 --> 00:01:37,740
+going to be x2 squared and so
+等等
+
+45
+00:01:38,520 --> 00:01:39,780
+on and we seen previously that
+我们之前看到
+
+46
+00:01:40,350 --> 00:01:41,190
+coming up with these high
+通过加入这些
+
+47
+00:01:41,370 --> 00:01:42,870
+order polynomials is one
+高阶项
+
+48
+00:01:43,110 --> 00:01:44,390
+way to come up with lots more features,
+我们可以得到更多特征变量
+
+49
+00:01:45,470 --> 00:01:47,070
+the question is, is
+问题是
+
+50
+00:01:47,250 --> 00:01:48,600
+there a different choice of
+能不能选择别的特征变量
+
+51
+00:01:48,670 --> 00:01:51,350
+features or is there better sort of features than this high order
+或者有没有比这些高阶项更好的特征变量
+
+52
+00:01:51,690 --> 00:01:53,510
+polynomials because you know
+因为
+
+53
+00:01:53,830 --> 00:01:54,820
+it's not clear that this high
+我们并不知道
+
+54
+00:01:55,120 --> 00:01:56,350
+order polynomial is what we want,
+这些高阶项是不是我们真正需要的
+
+55
+00:01:56,860 --> 00:01:57,920
+and what we talked about
+我们之前谈到
+
+56
+00:01:58,170 --> 00:01:59,560
+computer vision talk about when
+计算机视觉的时候
+
+57
+00:01:59,780 --> 00:02:01,940
+the input is an image with lots of pixels.
+提到过这时的输入是一个有很多像素的图像
+
+58
+00:02:02,540 --> 00:02:04,670
+We also saw how using high order polynomials
+我们看到如果用高阶项作为特征变量
+
+59
+00:02:05,140 --> 00:02:06,360
+becomes very computationally
+运算量将是非常大的
+
+60
+00:02:07,320 --> 00:02:08,270
+expensive because there are
+因为
+
+61
+00:02:08,280 --> 00:02:09,830
+a lot of these higher order polynomial terms.
+有太多的高阶项需要被计算
+
+62
+00:02:11,240 --> 00:02:12,280
+So, is there a different or
+因此 我们是否有不同的选择
+
+63
+00:02:12,430 --> 00:02:13,160
+a better choice of the features
+或者是更好的选择来构造特征变量
+
+64
+00:02:14,110 --> 00:02:15,100
+that we can use to plug
+以用来
+
+65
+00:02:15,410 --> 00:02:16,770
+into this sort of
+嵌入到
+
+66
+00:02:17,500 --> 00:02:19,200
+hypothesis form.
+假设函数中
+
+67
+00:02:19,420 --> 00:02:20,470
+So, here is one idea for how to
+事实上 这里有一个可以构造
+
+68
+00:02:20,580 --> 00:02:23,580
+define new features f1, f2, f3.
+新特征f1 f2 f3的想法
+
+69
+00:02:24,970 --> 00:02:25,930
+On this line I am
+在这一行中
+
+70
+00:02:26,100 --> 00:02:27,600
+going to define only three new
+我只定义三个
+
+71
+00:02:27,890 --> 00:02:28,770
+features, but for real problems
+特征变量 但是对于实际问题而言
+
+72
+00:02:29,500 --> 00:02:30,650
+we can get to define a much larger number.
+我们可以定义非常多的特征变量
+
+73
+00:02:31,060 --> 00:02:32,060
+But here's what I'm going to do
+但是在这里
+
+74
+00:02:32,260 --> 00:02:33,400
+in this phase
+对于这里的
+
+75
+00:02:33,640 --> 00:02:34,980
+of features X1, X2, and
+特征X1 X2
+
+76
+00:02:35,400 --> 00:02:36,520
+I'm going to leave X0
+我不打算
+
+77
+00:02:36,720 --> 00:02:37,800
+out of this, the
+把X0放在这里
+
+78
+00:02:38,060 --> 00:02:39,230
+intercept term X0, but
+截距X0
+
+79
+00:02:39,330 --> 00:02:40,320
+in this phase X1 X2, I'm going to just,
+但是这里的X1 X2
+
+80
+00:02:42,550 --> 00:02:43,560
+you know, manually pick a few points, and then
+我打算手动选取一些点
+
+81
+00:02:43,750 --> 00:02:45,210
+call these points l1, we
+然后将这些点定义为l1
+
+82
+00:02:45,450 --> 00:02:46,720
+are going to pick
+再选一个
+
+83
+00:02:46,820 --> 00:02:49,560
+a different point, let's call
+不同的点
+
+84
+00:02:50,080 --> 00:02:51,390
+that l2 and let's pick
+把它定为l2
+
+85
+00:02:51,710 --> 00:02:52,880
+the third one and call
+再选第三个点
+
+86
+00:02:53,170 --> 00:02:55,800
+this one l3, and for
+定为l3
+
+87
+00:02:55,900 --> 00:02:56,830
+now let's just say that I'm
+现在 假设我打算
+
+88
+00:02:56,930 --> 00:02:59,220
+going to choose these three points manually.
+只手动选取三个点
+
+89
+00:02:59,870 --> 00:03:02,860
+I'm going to call these three points landmarks, so landmark one, two, three.
+将这三个点作为标记,标记1,标记2,标记3
+
+90
+00:03:03,720 --> 00:03:04,630
+What I'm going to do is
+我将要做的是
+
+91
+00:03:04,790 --> 00:03:07,190
+define my new features as follows, given
+这样定义新的特征变量
+
+92
+00:03:07,510 --> 00:03:10,070
+an example X, let me
+给出一个实例X
+
+93
+00:03:10,170 --> 00:03:13,130
+define my first feature f1
+将第一个特征变量f1
+
+94
+00:03:13,330 --> 00:03:16,010
+to be some
+定义为
+
+95
+00:03:16,260 --> 00:03:18,960
+measure of the similarity between
+一种相似度的度量
+
+96
+00:03:19,330 --> 00:03:21,460
+my training example X and
+度量实例X与
+
+97
+00:03:21,680 --> 00:03:26,270
+my first landmark and
+第一个标记的相似度
+
+98
+00:03:26,520 --> 00:03:27,840
+this specific formula that I'm
+我将要用来度量相似度的
+
+99
+00:03:27,950 --> 00:03:29,600
+going to use to measure similarity is
+这个公式
+
+100
+00:03:30,160 --> 00:03:31,830
+going to be this is E to
+是这样的 对括号的内容取exp
+
+101
+00:03:31,940 --> 00:03:34,220
+the minus the length of
+负号 X-l1的长度
+
+102
+00:03:34,470 --> 00:03:37,880
+X minus l1, squared, divided
+平方
+
+103
+00:03:38,320 --> 00:03:39,610
+by two sigma squared.
+除以2倍的σ平方
+
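+As an equation, the similarity measure just read out (this is the Gaussian kernel, named a few lines further down) is:
+
+\[
+f_1 = \mathrm{sim}\big(x, l^{(1)}\big) = \exp\!\left(-\frac{\lVert x - l^{(1)}\rVert^{2}}{2\sigma^{2}}\right).
+\]
+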
+104
+00:03:40,730 --> 00:03:41,640
+So, depending on whether or not
+取决于你之前是否看了
+
+105
+00:03:41,780 --> 00:03:43,420
+you watched the previous optional video,
+上一个可选视频
+
+106
+00:03:44,390 --> 00:03:48,140
+this notation, you know, this is
+这个记号表示
+
+107
+00:03:48,460 --> 00:03:49,340
+the length of the vector
+向量W的长度
+
+108
+00:03:49,680 --> 00:03:51,260
+W. And so, this thing
+因此
+
+109
+00:03:51,460 --> 00:03:53,760
+here, this X
+这里的
+
+110
+00:03:54,020 --> 00:03:55,990
+minus l1, this is
+X-l1
+
+111
+00:03:56,100 --> 00:03:57,440
+actually just the euclidean distance
+就是欧式距离取平方
+
+112
+00:03:58,610 --> 00:03:59,950
+squared, is the euclidean
+是点x与l1之间的
+
+113
+00:04:00,410 --> 00:04:03,240
+distance between the point x and the landmark l1.
+欧式距离
+
+114
+00:04:03,530 --> 00:04:04,610
+We will see more about this later.
+我们之后会更多地谈到这个
+
+115
+00:04:06,440 --> 00:04:07,990
+But that's my first feature, and
+这是我的第一个特征向量
+
+116
+00:04:08,120 --> 00:04:09,610
+my second feature f2 is
+然后是f2
+
+117
+00:04:09,750 --> 00:04:11,750
+going to be, you know,
+它等于
+
+118
+00:04:12,370 --> 00:04:14,040
+similarity function that measures
+对x和l2使用相似度函数
+
+119
+00:04:14,400 --> 00:04:17,310
+how similar X is to l2 and the game is going to be defined as
+度量x与l2的相似度
+
+120
+00:04:17,370 --> 00:04:19,360
+the following function.
+这个相似度函数同上
+
+121
+00:04:25,970 --> 00:04:27,320
+This is E to the minus of the square of the euclidean distance
+对如下值取exp
+
+122
+00:04:28,150 --> 00:04:29,050
+between X and the second
+X到第二个标记之间的欧式距离
+
+123
+00:04:29,820 --> 00:04:31,310
+landmark, that is what the numerator is, and
+这是分子
+
+124
+00:04:31,510 --> 00:04:32,660
+then divided by 2 sigma squared
+再除以2倍的σ平方
+
+125
+00:04:33,520 --> 00:04:35,280
+and similarly f3 is, you know,
+类似的 f3
+
+126
+00:04:35,850 --> 00:04:39,480
+similarity between X
+等于X与l3之间的
+
+127
+00:04:39,840 --> 00:04:41,860
+and l3, which is
+相似度
+
+128
+00:04:41,980 --> 00:04:44,510
+equal to, again, similar formula.
+公式同上
+
+129
+00:04:46,550 --> 00:04:48,070
+And what this similarity
+这个相似度函数是
+
+130
+00:04:48,830 --> 00:04:50,440
+function is, the mathematical term
+用数学术语来说
+
+131
+00:04:50,730 --> 00:04:52,030
+for this, is that this is
+它就是
+
+132
+00:04:52,160 --> 00:04:54,390
+going to be a kernel function.
+核函数
+
+133
+00:04:55,340 --> 00:04:56,810
+And the specific kernel I'm using
+这里我所说的核函数
+
+134
+00:04:57,140 --> 00:04:59,570
+here, this is actually called a Gaussian kernel.
+实际上是高斯核函数
+
+135
+00:05:00,630 --> 00:05:01,920
+And so this formula, this particular
+因此这个公式
+
+136
+00:05:02,500 --> 00:05:04,990
+choice of similarity function is called a Gaussian kernel.
+我们选择的这个相似度公式是高斯核函数
+
+137
+00:05:05,770 --> 00:05:07,220
+But the way the terminology goes is that, you know, in
+但是这个术语
+
+138
+00:05:07,360 --> 00:05:09,110
+the abstract these different
+其实概括了
+
+139
+00:05:09,600 --> 00:05:11,270
+similarity functions are called kernels and
+许多不同的相似度函数
+
+140
+00:05:11,600 --> 00:05:12,670
+we can have different similarity functions
+它们都称作核函数
+
+141
+00:05:13,750 --> 00:05:16,410
+and the specific example I'm giving here is called the Gaussian kernel.
+而我用的这个特定例子是高斯核函数
+
+142
+00:05:17,110 --> 00:05:18,400
+We'll see other examples of other kernels.
+之后我们会见到别的核函数
+
+143
+00:05:18,840 --> 00:05:21,100
+But for now just think of these as similarity functions.
+但是现在就把这个当做相似度函数
+
+144
+00:05:22,470 --> 00:05:24,100
+And so, instead of writing similarity between
+我们通常不需要写
+
+145
+00:05:24,500 --> 00:05:26,270
+X and l, sometimes we
+X和L的相似度
+
+146
+00:05:26,480 --> 00:05:28,380
+also write this a kernel denoted
+有时我们就直接这样写
+
+147
+00:05:29,070 --> 00:05:32,360
+you know, lower case k between x and one of my landmarks all right.
+小写的k 括号里是x和标记l
+
+148
+00:05:34,120 --> 00:05:36,120
+So let's see what these
+现在
+
+149
+00:05:36,650 --> 00:05:38,480
+kernels actually do and
+我们来看看核函数到底可以做什么
+
+150
+00:05:38,810 --> 00:05:40,640
+why these sorts of similarity
+为什么这些相似度函数
+
+151
+00:05:41,280 --> 00:05:44,540
+functions, why these expressions might make sense.
+这些表达式是正确的
+
+152
+00:05:46,690 --> 00:05:48,020
+So let's take my first landmark. My
+先来看看我们的第一个标记
+
+153
+00:05:48,330 --> 00:05:49,230
+landmark l1, which is
+标记l1
+
+154
+00:05:49,350 --> 00:05:51,370
+one of those points I chose on my figure just now.
+l1是我之前在图中选取的几个点中的其中一个
+
+155
+00:05:53,000 --> 00:05:54,160
+So the similarity of the kernel between x and l1 is given by this expression.
+因此x和l1之间的核函数相似度是这样表达的
+
+156
+00:05:57,530 --> 00:05:58,600
+Just to make sure, you know, we
+为了保证
+
+157
+00:05:58,690 --> 00:05:59,600
+are on the same page about what
+你知道
+
+158
+00:05:59,780 --> 00:06:01,860
+the numerator term is, the
+这个分子项是什么
+
+159
+00:06:01,960 --> 00:06:03,140
+numerator can also be
+这个分子也可以
+
+160
+00:06:03,330 --> 00:06:04,620
+written as a sum from
+写为
+
+161
+00:06:04,880 --> 00:06:06,470
+J equals 1 through N of (Xj minus lj) squared.
+对这个距离求和 j从1到n
+
+162
+00:06:07,000 --> 00:06:08,700
+So this is the component wise distance
+这是向量X和l
+
+163
+00:06:09,270 --> 00:06:10,900
+between the vector X and
+各分量之间的距离
+
+164
+00:06:11,070 --> 00:06:12,050
+the vector l. And again
+再次地
+
+165
+00:06:12,380 --> 00:06:14,460
+for the purpose of these
+在这几张幻灯片中
+
+166
+00:06:14,720 --> 00:06:16,180
+slides I'm ignoring X0.
+我忽略了X0
+
+167
+00:06:16,680 --> 00:06:17,910
+So just ignoring the intercept
+因此我们暂时先不管截距项X0
+
+168
+00:06:18,220 --> 00:06:19,960
+term X0, which is always equal to 1.
+X0总是等于1
+
+169
+00:06:21,430 --> 00:06:22,470
+So, you know, this is
+那么 你现在明白
+
+170
+00:06:22,630 --> 00:06:25,780
+how you compute the kernel with similarity between X and a landmark.
+这就是你通过计算X和标记之间的相似度得到的核函数
+
+171
+00:06:27,270 --> 00:06:28,200
+So let's see what this function does.
+让我们来看看这个函数计算的是什么
+
+172
+00:06:29,110 --> 00:06:31,870
+Suppose X is close to one of the landmarks.
+假设X与其中一个标记点非常接近
+
+173
+00:06:33,320 --> 00:06:34,910
+Then this euclidean distance
+那么这个欧式距离
+
+174
+00:06:35,360 --> 00:06:36,690
+formula and the numerator will
+以及这个分子
+
+175
+00:06:36,990 --> 00:06:38,770
+be close to 0, right.
+就会接近于0 对吧
+
+176
+00:06:38,890 --> 00:06:40,070
+So, that is this term
+这是因为
+
+177
+00:06:40,580 --> 00:06:41,880
+here, the distance squared;
+这里的这个项 是距离的平方
+
+178
+00:06:42,170 --> 00:06:43,130
+the distance between X and l1
+X到l的距离
+
+179
+00:06:43,240 --> 00:06:45,130
+will be close to zero, and so
+接近于0
+
+180
+00:06:46,390 --> 00:06:47,440
+f1, this is a simple
+因此f1
+
+181
+00:06:47,710 --> 00:06:50,100
+feature, will be approximately E
+这个特征变量约等于
+
+182
+00:06:50,290 --> 00:06:52,760
+to the minus 0 squared
+对-0取exp
+
+183
+00:06:52,800 --> 00:06:54,650
+over 2 sigma squared,
+然后除以2倍的σ平方
+
+184
+00:06:55,650 --> 00:06:56,670
+so that E to the
+因此对0取exp
+
+185
+00:06:56,770 --> 00:06:58,070
+0, E to minus 0,
+对-0取exp
+
+186
+00:06:58,370 --> 00:06:59,810
+E to 0 is going to be close to one.
+约等于1
+
+187
+00:07:01,640 --> 00:07:03,480
+And I'll put the approximation symbol here
+我把约等号放在这里
+
+188
+00:07:03,700 --> 00:07:05,430
+because the distance may
+是因为这个距离
+
+189
+00:07:05,530 --> 00:07:06,930
+not be exactly 0, but
+不是严格地等于0
+
+190
+00:07:07,120 --> 00:07:08,040
+if X is closer to landmark
+但是X越接近于L
+
+191
+00:07:08,340 --> 00:07:09,190
+this term will be close
+那么这个项就会越接近于0
+
+192
+00:07:09,440 --> 00:07:12,070
+to 0 and so f1 would be close 1.
+因此f1越接近于1
+
+193
+00:07:13,400 --> 00:07:15,220
+Conversely, if X is
+相反地
+
+194
+00:07:15,520 --> 00:07:17,350
+far from l1, then this
+如果X离L1越远
+
+195
+00:07:17,550 --> 00:07:18,940
+first feature f1 will
+那么f1
+
+196
+00:07:19,820 --> 00:07:21,190
+be E to the minus
+就等于对一个非常大的数字
+
+197
+00:07:21,540 --> 00:07:24,040
+of some large number squared,
+的平方除以2倍σ平方
+
+198
+00:07:24,960 --> 00:07:25,980
+divided by two sigma
+再取exp
+
+199
+00:07:26,260 --> 00:07:27,690
+squared and E to
+然后
+
+200
+00:07:27,810 --> 00:07:28,800
+the minus of a large number
+对一个负的大数字取exp
+
+201
+00:07:29,630 --> 00:07:31,450
+is going to be close to 0.
+接近于0
+
+202
+00:07:33,320 --> 00:07:34,610
+So what these
+因此
+
+203
+00:07:34,750 --> 00:07:36,080
+features do is they measure how
+这些特征变量的作用是度量
+
+204
+00:07:36,290 --> 00:07:37,500
+similar X is from one
+X到标记L的相似度
+
+205
+00:07:37,670 --> 00:07:39,160
+of your landmarks and the feature
+并且
+
+206
+00:07:39,530 --> 00:07:40,290
+f is going to be close
+如果X离L非常相近
+
+207
+00:07:40,540 --> 00:07:42,360
+to one when X is
+那么特征变量f
+
+208
+00:07:42,540 --> 00:07:43,810
+close to your landmark and is
+就接近于1
+
+209
+00:07:44,020 --> 00:07:45,310
+going to be 0 or close
+如果X
+
+210
+00:07:45,380 --> 00:07:46,520
+to zero when X is
+离标记L非常远
+
+211
+00:07:46,790 --> 00:07:48,850
+far from your landmark.
+那么f会约等于0
+
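+A runnable sketch of this behavior (the function name and the test points are ours, not from the course; numpy assumed; the landmark (3, 5) is the one used in the example below):
+
+import numpy as np
+
+def gaussian_similarity(x, l, sigma_squared=1.0):
+    # f = exp(-||x - l||^2 / (2 * sigma^2)): about 1 near the landmark, about 0 far away
+    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma_squared))
+
+l1 = np.array([3.0, 5.0])
+print(gaussian_similarity(np.array([3.1, 4.9]), l1))    # close to l1 -> roughly 1
+print(gaussian_similarity(np.array([9.0, 0.0]), l1))    # far from l1 -> roughly 0
+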
+212
+00:07:49,320 --> 00:07:49,980
+Each of these landmarks.
+之前我所画的
+
+213
+00:07:50,590 --> 00:07:51,620
+On the previous slide, I drew
+那几个标记点
+
+214
+00:07:52,250 --> 00:07:54,260
+three landmarks, l1, l2, l3.
+l1 l2 l3
+
+215
+00:07:56,190 --> 00:08:00,030
+Each of these landmarks, defines a new feature
+每一个标记点会定义一个新的特征变量
+
+216
+00:08:00,660 --> 00:08:02,270
+f1, f2 and f3.
+f1 f2 f3
+
+217
+00:08:02,680 --> 00:08:03,660
+That is, given the the
+也就是说
+
+218
+00:08:03,710 --> 00:08:05,160
+training example X, we can
+给出一个训练实例X
+
+219
+00:08:05,380 --> 00:08:06,750
+now compute three new
+我们就能计算三个新的特征变量
+
+220
+00:08:06,930 --> 00:08:08,720
+features: f1, f2, and
+f1 f2和f3
+
+221
+00:08:09,520 --> 00:08:11,010
+f3, given, you know, the three
+基于我之前给的
+
+222
+00:08:11,340 --> 00:08:13,530
+landmarks that I wrote just now.
+三个标记点
+
+223
+00:08:13,760 --> 00:08:15,030
+But first, let's look
+但是首先
+
+224
+00:08:15,240 --> 00:08:16,450
+at this exponentiation function, let's look
+我们来看看这个指数函数
+
+225
+00:08:16,710 --> 00:08:18,190
+at this similarity function and plot
+我们来看看这个相似度函数
+
+226
+00:08:18,570 --> 00:08:20,790
+in some figures and just, you know, understand
+我们画一些图
+
+227
+00:08:21,230 --> 00:08:22,460
+better what this really looks like.
+来更好地理解这些函数是什么样的
+
+228
+00:08:23,510 --> 00:08:26,320
+For this example, let's say I have two features X1 and X2.
+比如 假设我们有两个特征变量X1和X2
+
+229
+00:08:26,570 --> 00:08:27,430
+And let's say my first
+假设我们的第一个标记点
+
+230
+00:08:27,820 --> 00:08:29,290
+landmark, l1 is at
+l1
+
+231
+00:08:29,520 --> 00:08:32,550
+the location (3, 5). So
+位于(3,5)
+
+232
+00:08:33,650 --> 00:08:35,750
+and let's say I set sigma squared equals one for now.
+假设σ的平方等于1
+
+233
+00:08:36,500 --> 00:08:37,550
+If I plot what this feature
+如果我画出图
+
+234
+00:08:37,890 --> 00:08:40,420
+looks like, what I get is this figure.
+就是这样的
+
+235
+00:08:41,210 --> 00:08:42,510
+So the vertical axis, the height
+这个纵轴
+
+236
+00:08:42,760 --> 00:08:44,030
+of the surface is the value
+这个曲面的高度是
+
+237
+00:08:45,240 --> 00:08:46,280
+of f1 and down here
+f1的值
+
+238
+00:08:46,630 --> 00:08:48,490
+on the horizontal axis are, if
+再看看水平的坐标
+
+239
+00:08:48,710 --> 00:08:50,580
+I have some training example, and there
+如果我把训练实例画在这里
+
+240
+00:08:51,660 --> 00:08:53,050
+is x1 and there is x2.
+这是X1 这是X2
+
+241
+00:08:53,320 --> 00:08:54,940
+Given a certain training example, the
+给出一个特定的训练实例
+
+242
+00:08:55,120 --> 00:08:56,890
+training example here which shows
+选这里的一个实例
+
+243
+00:08:56,980 --> 00:08:58,140
+the value of x1 and x2
+可以看到X1和X2的值
+
+244
+00:08:58,140 --> 00:08:59,390
+at a height above the surface,
+这个高度
+
+245
+00:08:59,950 --> 00:09:02,220
+shows the corresponding value of
+可以看到这个f1相应的值
+
+246
+00:09:02,410 --> 00:09:03,830
+f1 and down below this is
+下面的这个图
+
+247
+00:09:03,960 --> 00:09:04,890
+the same figure I had showed,
+内容是一样的
+
+248
+00:09:05,040 --> 00:09:06,600
+using a contour plot, with
+但我用的是一个等值线图
+
+249
+00:09:06,810 --> 00:09:08,320
+x1 on horizontal
+x1为水平轴
+
+250
+00:09:09,090 --> 00:09:10,340
+axis, x2 on the vertical
+x2为竖直轴
+
+251
+00:09:10,820 --> 00:09:12,500
+axis and so, this
+那么
+
+252
+00:09:12,820 --> 00:09:13,700
+figure on the bottom is just
+底下的这个图
+
+253
+00:09:13,940 --> 00:09:15,440
+a contour plot of the 3D surface.
+就是这个3D曲面的等值线图
+
+254
+00:09:16,540 --> 00:09:17,800
+You notice that when
+你会发现
+
+255
+00:09:18,030 --> 00:09:19,540
+X is equal to
+当X等于(3,5)的时候
+
+256
+00:09:19,820 --> 00:09:24,140
+(3, 5) exactly, then
+这个时候
+
+257
+00:09:24,380 --> 00:09:25,680
+f1 takes on the
+f1就等于1
+
+258
+00:09:25,760 --> 00:09:26,990
+value 1, because that's at
+因为
+
+259
+00:09:27,170 --> 00:09:29,400
+the maximum, and as X
+它在最大值上
+
+260
+00:09:29,860 --> 00:09:31,150
+moves away, as X goes
+所以如果X往旁边移动
+
+261
+00:09:31,680 --> 00:09:33,650
+further away then this
+离这个点越远
+
+262
+00:09:33,860 --> 00:09:35,270
+feature takes on values
+那么从图中可以看到
+
+263
+00:09:36,460 --> 00:09:37,160
+that are close to 0.
+f1的值就越接近0
+
+264
+00:09:38,750 --> 00:09:40,120
+And so, this is really a feature,
+这就是特征变量f1
+
+265
+00:09:40,400 --> 00:09:42,100
+f1 measures, you know, how
+计算的内容
+
+266
+00:09:42,400 --> 00:09:43,680
+close X is to the first
+也就是X与第一个标记点
+
+267
+00:09:44,040 --> 00:09:46,050
+landmark, and it
+的远近程度
+
+268
+00:09:46,520 --> 00:09:47,610
+varies between 0 and one
+这个值在0到1之间
+
+269
+00:09:47,790 --> 00:09:48,940
+depending on how close X
+具体取决于X
+
+270
+00:09:49,160 --> 00:09:50,650
+is to the first landmark l1.
+距离标记点l1到底有多近
+
+271
+00:09:52,360 --> 00:09:53,710
+Now the other thing I want to do on
+我在这张幻灯片上要讲的另一项内容是
+
+272
+00:09:53,920 --> 00:09:55,530
+this slide is show the effects
+我们可以看到改变σ平方的值
+
+273
+00:09:56,090 --> 00:09:59,740
+of varying this parameter sigma squared.
+能产生多大影响
+
+274
+00:10:00,040 --> 00:10:01,770
+So, sigma squared is the parameter of the
+σ平方是高斯核函数的参数
+
+275
+00:10:02,530 --> 00:10:04,120
+Gaussian kernel and as you vary it, you get slightly different effects.
+当你改变它的值的时 你会得到略微不同的结果
+
+276
+00:10:05,150 --> 00:10:06,380
+Let's set sigma squared to be
+假设我们让σ平方
+
+277
+00:10:06,650 --> 00:10:07,570
+equal to 0.5 and see
+等于0.5
+
+278
+00:10:07,710 --> 00:10:09,850
+what we get. We set sigma square to 0.5,
+看看我们能得到什么 将σ平方设为0.5
+
+279
+00:10:10,090 --> 00:10:11,170
+what you find is that the
+你会发现
+
+280
+00:10:11,430 --> 00:10:12,670
+kernel looks similar, except for the
+核函数看起来还是相似的
+
+281
+00:10:12,730 --> 00:10:14,200
+width of the bump becomes narrower.
+只是这个突起的宽度变窄了
+
+282
+00:10:14,790 --> 00:10:16,400
+The contours shrink a bit too.
+等值线图也收缩了一些
+
+283
+00:10:17,120 --> 00:10:18,360
+So if sigma squared equals to 0.5
+所以如果我们将σ平方设为0.5
+
+284
+00:10:18,740 --> 00:10:19,820
+then as you start
+我们从X=(3 5)
+
+285
+00:10:20,250 --> 00:10:21,650
+from X equals 3
+开始
+
+286
+00:10:21,910 --> 00:10:23,140
+5 and as you move away,
+往旁边移动
+
+287
+00:10:24,750 --> 00:10:26,370
+then the feature f1
+那么特征变量f1
+
+288
+00:10:27,050 --> 00:10:28,520
+falls to zero much more
+降到0的速度
+
+289
+00:10:28,730 --> 00:10:30,830
+rapidly and conversely,
+会变得很快 与此相反地
+
+290
+00:10:32,090 --> 00:10:33,930
+if you increase sigma squared,
+如果你增大了σ平方的值
+
+291
+00:10:34,670 --> 00:10:36,280
+let's say to three, in that
+我们假设σ平方等于3
+
+292
+00:10:36,510 --> 00:10:37,700
+case, then as I
+在这个例子中
+
+293
+00:10:37,800 --> 00:10:39,090
+move away from, you know l. So
+如果我从点L往旁边移动
+
+294
+00:10:39,630 --> 00:10:40,770
+this point here is really
+这里的这个点
+
+295
+00:10:41,110 --> 00:10:42,410
+l, right, that's l1 is at
+就是L
+
+296
+00:10:42,610 --> 00:10:45,210
+location (3, 5), right. So it's shown up here.
+l1所在的坐标为(3,5) 从这里可以看到
+
+297
+00:10:48,190 --> 00:10:49,480
+And if sigma squared is
+如果σ平方很大
+
+298
+00:10:49,660 --> 00:10:50,460
+large, then as you move
+那么
+
+299
+00:10:50,690 --> 00:10:54,040
+away from l1, the
+当你从点l1移走的时候
+
+300
+00:10:54,320 --> 00:10:56,170
+value of the feature falls
+特征变量的值减小的速度
+
+301
+00:10:56,740 --> 00:10:57,670
+away much more slowly.
+会变得比较慢
+
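+A quick worked check of this effect, holding the distance fixed at ||x - l(1)|| = 2 (a value chosen only for illustration):
+
+\[
+\sigma^{2}=0.5:\ f_1 = e^{-4} \approx 0.018,\qquad
+\sigma^{2}=1:\ f_1 = e^{-2} \approx 0.135,\qquad
+\sigma^{2}=3:\ f_1 = e^{-2/3} \approx 0.513,
+\]
+
+so a larger sigma squared makes the feature fall off more slowly, as described above.
+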
+302
+00:11:03,590 --> 00:11:05,200
+So, given this definition of
+因此 讲完了特征变量的定义
+
+303
+00:11:05,290 --> 00:11:06,730
+the features, let's see what
+我们来看看
+
+304
+00:11:06,960 --> 00:11:08,420
+source of hypothesis we can learn.
+我们能得到什么样的预测函数
+
+305
+00:11:09,550 --> 00:11:11,360
+Given the training example X, we
+给定一个训练实例X
+
+306
+00:11:11,480 --> 00:11:12,930
+are going to compute these features
+我们准备计算出三个特征变量
+
+307
+00:11:14,670 --> 00:11:16,360
+f1, f2, f3 and a
+f1 f2 f3
+
+308
+00:11:17,550 --> 00:11:18,980
+hypothesis is going to
+预测函数的预测值
+
+309
+00:11:19,040 --> 00:11:20,510
+predict one when theta 0
+会等于1 如果θ0加上
+
+310
+00:11:20,760 --> 00:11:22,050
+plus theta 1 f1 plus theta 2 f2,
+θ1*f1 加上 θ2*f2
+
+311
+00:11:22,330 --> 00:11:26,210
+and so on is greater than or equal to 0.
+等等的结果是大于或者等于0的
+
+312
+00:11:26,250 --> 00:11:27,100
+For this particular example, let's say
+对于这个特定的例子而言
+
+313
+00:11:27,290 --> 00:11:28,460
+that I've already found a learning
+假设我们已经找到了一个学习算法
+
+314
+00:11:28,620 --> 00:11:29,520
+algorithm and let's say that, you know,
+并且假设
+
+315
+00:11:30,190 --> 00:11:31,220
+somehow I ended up with
+我已经得到了
+
+316
+00:11:31,900 --> 00:11:32,880
+these values of the parameter.
+这些参数的值
+
+317
+00:11:33,510 --> 00:11:34,600
+So if theta 0 equals
+因此如果θ0等于-0.5
+
+318
+00:11:34,830 --> 00:11:36,010
+minus 0.5, theta 1 equals
+θ1等于1
+
+319
+00:11:36,390 --> 00:11:37,780
+1, theta 2 equals
+θ2等于1
+
+320
+00:11:38,180 --> 00:11:39,570
+1, and theta 3
+θ3等于0
+
+321
+00:11:40,370 --> 00:11:42,480
+equals 0 And what
+我想要做的是
+
+322
+00:11:42,720 --> 00:11:44,530
+I want to do is consider what
+我想要知道会发生什么
+
+323
+00:11:44,670 --> 00:11:46,100
+happens if we have a
+如果
+
+324
+00:11:46,200 --> 00:11:48,060
+training example that
+我们有一个训练实例
+
+325
+00:11:49,260 --> 00:11:51,710
+has location at this
+它的坐标在这里
+
+326
+00:11:52,510 --> 00:11:55,050
+magenta dot, right where I just drew this dot over here.
+这个红点 我画的这个点
+
+327
+00:11:55,380 --> 00:11:56,180
+So let's say I have a training
+假设我们有一个训练实例X
+
+328
+00:11:56,290 --> 00:11:58,690
+example X, what would my hypothesis predict?
+我想知道我的预测函数会给出怎样的预测结果
+
+329
+00:11:59,000 --> 00:12:01,430
+Well, If I look at this formula.
+看看这个公式
+
+330
+00:12:04,580 --> 00:12:05,890
+Because my training example X
+因为我的训练实例X
+
+331
+00:12:06,050 --> 00:12:07,820
+is close to l1, we have
+接近于L1
+
+332
+00:12:08,230 --> 00:12:10,190
+that f1 is going
+那么f1
+
+333
+00:12:10,250 --> 00:12:11,830
+to be close to 1, because
+就接近于1
+
+334
+00:12:12,250 --> 00:12:13,200
+my training example X is
+又因为训练实例X
+
+335
+00:12:13,360 --> 00:12:15,050
+far from l2 and l3 I
+离L2 L3都很远
+
+336
+00:12:15,360 --> 00:12:16,880
+have that, you know, f2 would be close to
+所以 f2就接近于0
+
+337
+00:12:17,590 --> 00:12:20,500
+0 and f3 will be close to 0.
+f3也接近于0
+
+338
+00:12:21,550 --> 00:12:22,700
+So, if I look at
+所以
+
+339
+00:12:22,880 --> 00:12:23,970
+that formula, I have theta
+如果我们看看这个公式
+
+340
+00:12:24,230 --> 00:12:25,670
+0 plus theta 1
+θ0加上θ1
+
+341
+00:12:26,600 --> 00:12:29,970
+times 1 plus theta 2 times some value.
+乘以1加上θ2乘以某个值
+
+342
+00:12:30,510 --> 00:12:32,390
+Not exactly 0, but let's say close to 0.
+不是严格意义上等于0 但是接近于0
+
+343
+00:12:33,140 --> 00:12:36,400
+Then plus theta 3 times something close to 0.
+接着加上θ3乘以一个接近于0的值
+
+344
+00:12:37,480 --> 00:12:39,810
+And this is going to be equal to plugging in these values now.
+这个等于... 再把这些值代入进去
+
+345
+00:12:41,050 --> 00:12:43,470
+So, that gives minus 0.5
+这个是-0.5
+
+346
+00:12:44,160 --> 00:12:46,820
+plus 1 times 1 which is 1, and so on.
+加上1乘以1等于1 等等
+
+347
+00:12:46,960 --> 00:12:47,740
+Which is equal to 0.5 which is greater than or equal to 0.
+最后等于0.5 这个值大于等于0
+
+348
+00:12:48,000 --> 00:12:50,820
+So, at this point,
+因此 这个点
+
+349
+00:12:51,160 --> 00:12:54,280
+we're going to predict Y equals
+我们预测出的Y值是1
+
+350
+00:12:54,740 --> 00:12:57,320
+1, because that's greater than or equal to zero.
+因为大于等于0
+
+351
+00:12:58,910 --> 00:12:59,950
+Now let's take a different point.
+现在我们选择另一个不同的点
+
+352
+00:13:00,800 --> 00:13:02,100
+Now lets' say I take a
+假设
+
+353
+00:13:07,250 --> 00:13:08,470
+a point out there, if that
+这个点
+
+354
+00:13:08,710 --> 00:13:10,580
+were my training example X, then
+如果它是训练实例X
+
+355
+00:13:11,270 --> 00:13:12,190
+if you make a similar computation,
+如果你进行和之前相同的计算
+
+356
+00:13:12,950 --> 00:13:14,390
+you find that f1, f2,
+你发现f1 f2
+
+357
+00:13:15,420 --> 00:13:16,850
+f3 are all going to be close to 0.
+f3都接近于0
+
+358
+00:13:18,160 --> 00:13:19,910
+And so, we have theta
+因此 我们得到
+
+359
+00:13:20,240 --> 00:13:23,940
+0 plus theta 1, f1,
+θ0加上θ1*f1
+
+360
+00:13:24,230 --> 00:13:26,010
+plus so on and this
+加上其他项
+
+361
+00:13:26,200 --> 00:13:27,830
+will be about equal to
+最后的结果
+
+362
+00:13:28,020 --> 00:13:30,810
+minus 0.5, because theta
+会等于-0.5
+
+363
+00:13:31,170 --> 00:13:32,110
+0 is minus 0.5 and
+因为θ0等于-0.5
+
+364
+00:13:32,190 --> 00:13:33,920
+f1, f2, f3 are all zero.
+并且f1 f2 f3都为0
+
+365
+00:13:34,910 --> 00:13:37,510
+So this will be minus 0.5, this is less than zero.
+因此最后结果是-0.5 小于0
+
+366
+00:13:37,860 --> 00:13:38,910
+And so, at this
+因此
+
+367
+00:13:39,090 --> 00:13:40,220
+point out there, we're going to
+这个点
+
+368
+00:13:40,470 --> 00:13:42,010
+predict Y equals zero.
+我们预测的Y值是0
+
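+A small sketch of these two predictions (theta and the landmark l1 = (3, 5) follow the lecture; l2, l3 and the test points are placeholder values, since the lecture only sketches them on the figure; numpy assumed):
+
+import numpy as np
+
+def gaussian_similarity(x, l, sigma_squared=1.0):
+    # f = exp(-||x - l||^2 / (2 * sigma^2))
+    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma_squared))
+
+theta = np.array([-0.5, 1.0, 1.0, 0.0])          # [theta0, theta1, theta2, theta3]
+landmarks = [np.array([3.0, 5.0]),               # l1 from the slide
+             np.array([0.0, 0.0]),               # l2 (placeholder position)
+             np.array([6.0, 1.0])]               # l3 (placeholder position)
+
+def predict(x):
+    f = np.array([1.0] + [gaussian_similarity(x, l) for l in landmarks])
+    return 1 if theta @ f >= 0 else 0
+
+print(predict(np.array([3.2, 4.8])))   # near l1: f1 is about 1, so -0.5 + 1 >= 0 -> predicts 1
+print(predict(np.array([9.0, 9.0])))   # far from every landmark: about -0.5 < 0 -> predicts 0
+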
+369
+00:13:44,190 --> 00:13:45,100
+And if you do
+如果
+
+370
+00:13:45,270 --> 00:13:46,230
+this yourself for a range
+你自己来对大量的点
+
+371
+00:13:46,380 --> 00:13:47,460
+of different points, be sure to
+进行这样相应的处理
+
+372
+00:13:47,670 --> 00:13:48,660
+convince yourself that if you
+你应该可以确定
+
+373
+00:13:48,730 --> 00:13:50,340
+have a training example that's
+如果你有一个训练实例
+
+374
+00:13:50,890 --> 00:13:52,390
+close to L2, say,
+它非常接近于L2
+
+375
+00:13:52,970 --> 00:13:55,730
+then at this point we'll also predict Y equals one.
+那么通过这个点预测的Y值就是1
+
+376
+00:13:56,800 --> 00:13:58,110
+And in fact, what you end
+实际上
+
+377
+00:13:58,240 --> 00:13:59,300
+up doing is, you know,
+你最后得到的结果是
+
+378
+00:13:59,350 --> 00:14:00,920
+if you look around this boundary, this
+如果你看看这个边界线
+
+379
+00:14:01,140 --> 00:14:02,300
+space, what we'll find is that
+这个区域 我们会发现
+
+380
+00:14:02,820 --> 00:14:03,900
+for points near l1
+对于接近L1和L2的点
+
+381
+00:14:04,090 --> 00:14:05,560
+and l2 we end up predicting positive.
+我们的预测值是1
+
+382
+00:14:06,550 --> 00:14:07,780
+And for points far away from
+对于远离
+
+383
+00:14:08,050 --> 00:14:09,260
+l1 and l2, that's for
+L1和L2的点
+
+384
+00:14:09,470 --> 00:14:12,220
+points far away from these two
+对于离这两个标记点非常远的点
+
+385
+00:14:12,480 --> 00:14:13,780
+landmarks, we end up predicting
+我们最后预测的结果
+
+386
+00:14:14,390 --> 00:14:15,560
+that the class is equal to 0.
+是等于0的
+
+387
+00:14:16,510 --> 00:14:17,380
+And so, what we end up doing is
+我们最后会得到
+
+388
+00:14:17,890 --> 00:14:20,270
+that the decision boundary of
+这个预测函数的
+
+389
+00:14:20,400 --> 00:14:22,110
+this hypothesis would end
+判别边界
+
+390
+00:14:22,280 --> 00:14:24,210
+up looking something like this where
+会像这样
+
+391
+00:14:24,370 --> 00:14:25,630
+inside this red decision boundary
+在这个红色的判别边界里面
+
+392
+00:14:26,580 --> 00:14:28,240
+would predict Y equals
+预测的Y值等于1
+
+393
+00:14:28,630 --> 00:14:30,250
+1 and outside we predict
+在这外面预测的Y值
+
+394
+00:14:32,570 --> 00:14:32,570
+Y equals 0.
+等于0
+
+395
+00:14:33,020 --> 00:14:34,770
+And so this is
+这就是
+
+396
+00:14:34,850 --> 00:14:36,010
+how with this definition
+我们如何通过标记点
+
+397
+00:14:36,870 --> 00:14:38,560
+of the landmarks and of the kernel function.
+以及核函数
+
+398
+00:14:39,370 --> 00:14:40,940
+We can learn pretty complex non-linear
+来训练出非常复杂的非线性
+
+399
+00:14:41,420 --> 00:14:42,800
+decision boundary, like what I
+判别边界的方法
+
+400
+00:14:42,930 --> 00:14:44,150
+just drew where we predict
+就像我刚才画的那个判别边界
+
+401
+00:14:44,560 --> 00:14:46,990
+positive when we're close to either one of the two landmarks.
+当我们接近两个标记点中任意一个时 预测值就会等于1
+
+402
+00:14:47,570 --> 00:14:48,880
+And we predict negative when we're
+否则预测值等于0
+
+403
+00:14:49,260 --> 00:14:50,680
+very far away from any
+如果这些点离标记点
+
+404
+00:14:50,950 --> 00:14:52,990
+of the landmarks.
+非常远
+
+405
+00:14:53,440 --> 00:14:55,000
+And so this is part of
+这就是核函数这部分
+
+406
+00:14:55,050 --> 00:14:57,300
+the idea of kernels, and
+的概念
+
+407
+00:14:57,600 --> 00:14:58,620
+how we use them with the
+以及我们如何
+
+408
+00:14:58,770 --> 00:14:59,810
+support vector machine, which is that
+在支持向量机中使用它们
+
+409
+00:14:59,990 --> 00:15:01,720
+we define these extra features using
+我们通过标记点和相似性函数
+
+410
+00:15:02,040 --> 00:15:03,900
+landmarks and similarity functions
+来定义新的特征变量
+
+411
+00:15:04,770 --> 00:15:06,730
+to learn more complex nonlinear classifiers.
+从而训练复杂的非线性边界
+
+412
+00:15:08,210 --> 00:15:09,290
+So hopefully that gives you
+我希望刚才讲的内容能够
+
+413
+00:15:09,390 --> 00:15:10,410
+a sense of the idea of
+帮助你更好的理解核函数的概念
+
+414
+00:15:10,590 --> 00:15:11,680
+kernels and how we could
+以及我们如何使用它
+
+415
+00:15:11,890 --> 00:15:14,110
+use it to define new features for the Support Vector Machine.
+在支持向量机中定义新的特征变量
+
+416
+00:15:15,510 --> 00:15:17,670
+But there are a couple of questions that we haven't answered yet.
+但是还有一些问题我们并没有做出回答
+
+417
+00:15:18,010 --> 00:15:19,550
+One is, how do we get these landmarks?
+其中一个是 我们如何得到这些标记点
+
+418
+00:15:20,120 --> 00:15:20,930
+How do we choose these landmarks?
+我们怎么来选择这些标记点
+
+419
+00:15:21,050 --> 00:15:22,910
+And another is, what
+另一个是
+
+420
+00:15:23,090 --> 00:15:24,500
+other similarity functions, if any,
+其他的相似度方程是什么样的 如果有其他的话
+
+421
+00:15:24,750 --> 00:15:25,680
+can we use other than the
+我们能够用其他的相似度方程
+
+422
+00:15:25,780 --> 00:15:29,000
+one we talked about, which is called the Gaussian kernel.
+来代替我们所讲的这个高斯核函数吗
+
+423
+00:15:29,190 --> 00:15:29,970
+In the next video we give
+在下一个视频中
+
+424
+00:15:29,990 --> 00:15:31,290
+answers to these questions and put
+我们会回答这些问题
+
+425
+00:15:31,490 --> 00:15:33,150
+everything together to show how
+然后把所有东西都整合到一起
+
+426
+00:15:33,740 --> 00:15:35,060
+support vector machines with kernels
+来看看支持向量机如何通过核函数的定义
+
+427
+00:15:35,720 --> 00:15:36,960
+can be a powerful way
+有效的学习
+
+428
+00:15:37,200 --> 00:15:38,610
+to learn complex nonlinear functions.
+复杂非线性函数
+
diff --git a/srt/12 - 5 - Kernels II (16 min).srt b/srt/12 - 5 - Kernels II (16 min).srt
new file mode 100644
index 00000000..288e5165
--- /dev/null
+++ b/srt/12 - 5 - Kernels II (16 min).srt
@@ -0,0 +1,2201 @@
+1
+00:00:00,530 --> 00:00:01,550
+In the last video, we started
+在上一节视频里 我们讨论了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,950 --> 00:00:03,230
+to talk about the kernels idea
+核函数这个想法
+
+3
+00:00:03,710 --> 00:00:04,590
+and how it can be used to
+以及怎样利用它去
+
+4
+00:00:04,860 --> 00:00:07,900
+define new features for the support vector machine.
+实现支持向量机的一些新特性
+
+5
+00:00:08,100 --> 00:00:08,910
+In this video, I'd like to throw
+在这一节视频中 我将
+
+6
+00:00:09,230 --> 00:00:10,670
+in some of the missing details and,
+补充一些缺失的细节
+
+7
+00:00:11,020 --> 00:00:12,070
+also, say a few words about
+并简单的介绍一下
+
+8
+00:00:12,270 --> 00:00:14,100
+how to use these ideas in practice.
+怎么在实际中使用应用这些想法
+
+9
+00:00:14,650 --> 00:00:15,850
+Such as, how they pertain
+例如 怎么处理
+
+10
+00:00:16,340 --> 00:00:20,120
+to, for example, the bias variance trade-off in support vector machines.
+支持向量机中的偏差方差折中
+
+11
+00:00:22,690 --> 00:00:23,680
+In the last video, I talked
+在上一节课中
+
+12
+00:00:24,000 --> 00:00:25,970
+about the process of picking a few landmarks.
+我谈到过选择标记点
+
+13
+00:00:26,660 --> 00:00:28,890
+You know, l1, l2, l3 and that
+例如 l1 l2 l3
+
+14
+00:00:29,150 --> 00:00:30,220
+allowed us to define the
+这些点使我们能够定义
+
+15
+00:00:30,300 --> 00:00:31,900
+similarity function also called
+相似度函数
+
+16
+00:00:32,200 --> 00:00:33,500
+the kernel or in this
+也称之为核函数
+
+17
+00:00:33,690 --> 00:00:34,830
+example if you have
+在这个例子里
+
+18
+00:00:35,070 --> 00:00:37,410
+this similarity function this is a Gaussian kernel.
+我们的相似度函数为高斯核函数
+
+19
+00:00:38,610 --> 00:00:40,370
+And that allowed us to build
+这使我们能够
+
+20
+00:00:40,660 --> 00:00:42,070
+this form of a hypothesis function.
+构造一个模型
+
+21
+00:00:43,180 --> 00:00:44,880
+But where do we get these landmarks from?
+但是 我们从哪里得到这些标记点?
+
+22
+00:00:45,150 --> 00:00:45,670
+Where do we get l1, l2, l3 from?
+我们从哪里得到l1 l2 l3?
+
+23
+00:00:45,690 --> 00:00:49,080
+And it seems, also, that for complex learning
+而且 在一些复杂的学习问题中
+
+24
+00:00:49,610 --> 00:00:50,830
+problems, maybe we want a
+也许我们需要
+
+25
+00:00:50,920 --> 00:00:53,060
+lot more landmarks than just three of them that we might choose by hand.
+更多的标记点 而不是我们手选的这三个
+
+26
+00:00:55,160 --> 00:00:56,450
+So in practice this is
+因此 在实际应用时
+
+27
+00:00:56,580 --> 00:00:57,730
+how the landmarks are chosen
+怎么选取标记点
+
+28
+00:00:57,830 --> 00:00:59,910
+which is that given the
+是机器学习中必须解决的问题
+
+29
+00:01:00,150 --> 00:01:01,110
+machine learning problem. We have some
+这是我们的数据集
+
+30
+00:01:01,370 --> 00:01:02,230
+data set of some some positive
+有一些正样本和一些负样本
+
+31
+00:01:02,710 --> 00:01:04,460
+and negative examples. So, this is the idea here
+我们的想法是
+
+32
+00:01:05,310 --> 00:01:06,270
+which is that we're gonna take the
+我们将选取样本点
+
+33
+00:01:06,630 --> 00:01:08,200
+examples and for every
+我们拥有的
+
+34
+00:01:08,470 --> 00:01:09,780
+training example that we have,
+每一个样本点
+
+35
+00:01:10,490 --> 00:01:11,430
+we are just going to call
+我们只需要直接使用它们
+
+36
+00:01:11,980 --> 00:01:13,270
+it. We're just going
+我们直接
+
+37
+00:01:13,440 --> 00:01:14,850
+to put landmarks at exactly
+将训练样本
+
+38
+00:01:15,490 --> 00:01:17,600
+the same locations as the training examples.
+作为标记点
+
+39
+00:01:18,930 --> 00:01:20,360
+So if I have one training
+如果我有一个
+
+40
+00:01:20,680 --> 00:01:21,880
+example if that is x1,
+训练样本x1
+
+41
+00:01:22,120 --> 00:01:23,460
+well then I'm going
+那么
+
+42
+00:01:23,670 --> 00:01:24,550
+to choose this is my first landmark
+我将在这个样本点
+
+43
+00:01:25,100 --> 00:01:26,470
+to be at xactly the same location
+精确一致的位置上
+
+44
+00:01:27,250 --> 00:01:28,170
+as my first training example.
+选作我的第一个标记点
+
+45
+00:01:29,260 --> 00:01:30,180
+And if I have a different training
+如果我有另一个
+
+46
+00:01:30,470 --> 00:01:32,340
+example x2. Well we're
+训练样本x2
+
+47
+00:01:32,500 --> 00:01:33,980
+going to set the second landmark
+那么 我将把第二个标记点选在
+
+48
+00:01:35,060 --> 00:01:37,300
+to be the location of my second training example.
+与第二个样本点一致的位置上
+
+49
+00:01:38,480 --> 00:01:39,320
+On the figure on the right, I
+在右边的这幅图上
+
+50
+00:01:39,480 --> 00:01:40,480
+used red and blue dots
+我用红点和蓝点
+
+51
+00:01:40,820 --> 00:01:41,930
+just as illustration, the color
+来阐述
+
+52
+00:01:42,420 --> 00:01:44,320
+of this figure, the color of
+这幅图以及这些点的颜色
+
+53
+00:01:44,370 --> 00:01:46,030
+the dots on the figure on the right is not significant.
+可能并不显眼
+
+54
+00:01:47,120 --> 00:01:47,930
+But what I'm going to end up
+但是利用
+
+55
+00:01:48,110 --> 00:01:49,660
+with using this method is I'm
+这个方法
+
+56
+00:01:49,790 --> 00:01:51,450
+going to end up with m
+最终能得到
+
+57
+00:01:52,160 --> 00:01:53,690
+landmarks of l1, l2
+m个标记点 l1 l2
+
+58
+00:01:54,950 --> 00:01:56,320
+down to l(m) if I
+直到lm
+
+59
+00:01:56,380 --> 00:01:58,180
+have m training examples with
+即每一个标记点
+
+60
+00:01:58,420 --> 00:02:00,500
+one landmark per location of
+的位置都与
+
+61
+00:02:00,810 --> 00:02:02,680
+my per location of each
+每一个样本点
+
+62
+00:02:02,860 --> 00:02:04,810
+of my training examples. And this is
+的位置精确对应
+
+63
+00:02:04,950 --> 00:02:05,920
+nice because it is saying that
+这个过程很棒
+
+64
+00:02:06,120 --> 00:02:07,630
+my features are basically going
+这说明特征函数基本上
+
+65
+00:02:07,700 --> 00:02:09,300
+to measure how close an
+是在描述
+
+66
+00:02:09,380 --> 00:02:10,800
+example is to one
+每一个样本距离
+
+67
+00:02:10,970 --> 00:02:13,150
+of the things I saw in my training set.
+样本集中其他样本的距离
+
+68
+00:02:13,440 --> 00:02:14,180
+So, just to write this outline a
+我们具体的列出
+
+69
+00:02:14,350 --> 00:02:16,270
+little more concretely, given m
+这个过程的大纲
+
+70
+00:02:16,470 --> 00:02:17,870
+training examples, I'm going
+给定m个训练样本
+
+71
+00:02:18,050 --> 00:02:19,100
+to choose the the location
+我将选取与
+
+72
+00:02:19,310 --> 00:02:20,430
+of my landmarks to be exactly
+m个训练样本精确一致
+
+73
+00:02:21,190 --> 00:02:23,920
+the locations of my m training examples.
+的位置作为我的标记点
+
+74
+00:02:25,430 --> 00:02:26,600
+When you are given example x,
+当输入样本x
+
+75
+00:02:26,920 --> 00:02:28,090
+and in this example x can be
+样本x可以
+
+76
+00:02:28,230 --> 00:02:29,260
+something in the training set,
+属于训练集
+
+77
+00:02:29,570 --> 00:02:30,800
+it can be something in the cross validation
+也可以属于交叉验证集
+
+78
+00:02:31,490 --> 00:02:32,470
+set, or it can be something in the test set.
+也可以属于测试集
+
+79
+00:02:33,320 --> 00:02:34,090
+Given an example x we are
+给定样本x
+
+80
+00:02:34,320 --> 00:02:35,470
+going to compute, you know,
+我们可以计算
+
+81
+00:02:35,750 --> 00:02:37,220
+these features as so f1,
+这些特征 即f1
+
+82
+00:02:37,560 --> 00:02:39,180
+f2, and so on.
+f2 以此类推
+
+83
+00:02:39,580 --> 00:02:41,120
+Where l1 is actually equal
+这里l1等于x1
+
+84
+00:02:41,490 --> 00:02:42,850
+to x1 and so on.
+剩下标记点的以此类推
+
+85
+00:02:43,570 --> 00:02:46,080
+And these then give me a feature vector.
+最终我们能得到一个特征向量
+
+86
+00:02:46,840 --> 00:02:49,540
+So let me write f as the feature vector.
+我将特征向量记为f
+
+87
+00:02:50,270 --> 00:02:52,090
+I'm going to take these f1, f2 and
+我将f1 f2等等
+
+88
+00:02:52,290 --> 00:02:53,370
+so on, and just group
+构造为
+
+89
+00:02:53,580 --> 00:02:55,330
+them into feature vector.
+特征向量
+
+90
+00:02:56,330 --> 00:02:58,000
+Take those down to fm.
+一直写到fm
+
+91
+00:02:59,320 --> 00:03:01,080
+And, you know, just by convention.
+此外 按照惯例
+
+92
+00:03:01,610 --> 00:03:02,870
+If we want, we can add an
+如果我们需要的话
+
+93
+00:03:02,990 --> 00:03:06,250
+extra feature f0, which is always equal to 1.
+可以添加额外的特征f0 f0的值始终为1
+
+94
+00:03:06,450 --> 00:03:08,530
+So this plays a role similar to what we had previously.
+它与我们之前讨论过的
+
+95
+00:03:09,480 --> 00:03:11,200
+For x0, which was our intercept term.
+截距x0的作用相似
+
+96
+00:03:13,200 --> 00:03:14,450
+So, for example, if we
+举个例子
+
+97
+00:03:14,580 --> 00:03:16,550
+have a training example x(i), y(i),
+假设我们有训练样本(xi, yi)
+
+98
+00:03:18,270 --> 00:03:19,300
+the features we would compute for
+这个样本对应的
+
+99
+00:03:20,080 --> 00:03:21,330
+this training example will be
+特征向量可以
+
+100
+00:03:21,440 --> 00:03:23,440
+as follows: given x(i), we
+这样计算 给定xi
+
+101
+00:03:23,640 --> 00:03:26,560
+will then map it to, you know, f1(i).
+我们可以通过相似度函数
+
+102
+00:03:27,980 --> 00:03:29,670
+Which is the similarity. I'm going to
+将其映射到f1(i)
+
+103
+00:03:29,960 --> 00:03:31,980
+abbreviate as SIM instead of writing out the whole
+在这里 我将整个单词similarity
+
+104
+00:03:32,090 --> 00:03:33,380
+word
+简记为
+
+105
+00:03:35,540 --> 00:03:35,540
+similarity, right?
+SIM
+
+106
+00:03:37,050 --> 00:03:39,180
+And f2(i) equals the similarity
+f2(i)等于x(i)与l2
+
+107
+00:03:40,090 --> 00:03:42,780
+between x(i) and l2,
+之间的相似度
+
+108
+00:03:43,140 --> 00:03:45,050
+and so on,
+以此类推
+
+109
+00:03:45,230 --> 00:03:48,370
+down to fm(i) equals
+最后有fm(i)
+
+110
+00:03:49,600 --> 00:03:54,480
+the similarity between x(i) and l(m).
+等于x(i)与lm之间的相似度
+
+111
+00:03:55,700 --> 00:03:58,700
+And somewhere in the middle.
+在这一列中间的
+
+112
+00:03:59,160 --> 00:04:01,320
+Somewhere in this list, you know, at
+某个位置
+
+113
+00:04:01,480 --> 00:04:03,930
+the i-th component, I will
+即第i个元素
+
+114
+00:04:04,230 --> 00:04:05,740
+actually have one feature
+有一个特征
+
+115
+00:04:06,150 --> 00:04:07,590
+component which is f subscript
+为fi(i)
+
+116
+00:04:08,170 --> 00:04:09,930
+i(i), which is
+为fi(i)
+
+117
+00:04:10,050 --> 00:04:11,180
+going to be the similarity
+这是xi和li之间的
+
+118
+00:04:13,080 --> 00:04:14,550
+between x and l(i).
+相似度
+
+119
+00:04:15,680 --> 00:04:16,990
+Where l(i) is equal to
+这里l(i)就等于
+
+120
+00:04:17,190 --> 00:04:18,560
+x(i), and so you know
+x(i) 所以
+
+121
+00:04:19,140 --> 00:04:20,320
+fi(i) is just going to
+fi(i)衡量的是
+
+122
+00:04:20,410 --> 00:04:22,250
+be the similarity between x and itself.
+x(i)与其自身的相似度
+
+123
+00:04:23,960 --> 00:04:25,380
+And if you're using the Gaussian kernel this is
+如果你使用高斯核函数的话
+
+124
+00:04:25,620 --> 00:04:26,720
+actually e to the minus 0
+这一项等于
+
+125
+00:04:27,170 --> 00:04:29,440
+over 2 sigma squared and so, this will be equal to 1 and that's okay.
+exp(-0/(2*sigma^2)) 等于1
+
+126
+00:04:29,790 --> 00:04:31,060
+So one of my features for this
+所以 对于这个样本来说
+
+127
+00:04:31,370 --> 00:04:32,940
+training example is going to be equal to 1.
+其中的某一个特征等于1
+
+128
+00:04:34,290 --> 00:04:35,570
+And then similar to what I have above.
+接下来 类似于我们之前的过程
+
+129
+00:04:35,990 --> 00:04:36,940
+I can take all of these
+我将这m个特征
+
+130
+00:04:37,870 --> 00:04:39,910
+m features and group them into a feature vector.
+合并为一个特征向量
+
+131
+00:04:40,340 --> 00:04:41,730
+So instead of representing my example,
+于是 相比之前用x(i)来描述样本
+
+132
+00:04:42,710 --> 00:04:44,200
+using, you know, x(i), which is this
+x(i)为n维或者n+1维空间
+
+133
+00:04:44,430 --> 00:04:46,970
+R(n) or R(n) plus 1 dimensional vector.
+的向量
+
+134
+00:04:48,290 --> 00:04:49,590
+Depending on whether you include the intercept
+取决于你的具体项数
+
+135
+00:04:49,990 --> 00:04:51,120
+term, this is either R(n)
+可能为n维向量空间
+
+136
+00:04:52,070 --> 00:04:52,750
+or R(n) plus 1.
+也可能为n+1维向量空间
+
+137
+00:04:53,440 --> 00:04:55,140
+We can now instead represent my
+我们现在可以用
+
+138
+00:04:55,300 --> 00:04:56,700
+training example using this feature
+这个特征向量f
+
+139
+00:04:56,980 --> 00:04:58,810
+vector f. I am
+来描述我的训练样本
+
+140
+00:04:58,920 --> 00:05:01,240
+going to write this f superscript i. Which
+我将把它写作f(i)
+
+141
+00:05:01,400 --> 00:05:03,060
+is going to be taking all
+将所有这些项
+
+142
+00:05:03,300 --> 00:05:06,010
+of these things and stacking them into a vector.
+合并为一个向量
+
+143
+00:05:06,540 --> 00:05:09,180
+So, f1(i) down
+即从f1(i)
+
+144
+00:05:09,430 --> 00:05:12,740
+to fm(i) and if you want and
+到fm(i) 如果有需要的话
+
+145
+00:05:13,030 --> 00:05:15,160
+well, usually we'll also add this
+我们通常也会加上
+
+146
+00:05:15,420 --> 00:05:16,990
+f0(i), where
+f0(i)这一项
+
+147
+00:05:17,130 --> 00:05:19,370
+f0(i) is equal to 1.
+f0(i)等于1
+
+148
+00:05:19,370 --> 00:05:20,970
+And so this vector
+那么 这个向量
+
+149
+00:05:21,300 --> 00:05:23,260
+here gives me my
+就是
+
+150
+00:05:23,430 --> 00:05:25,180
+new feature vector with which
+我们用于描述训练样本的
+
+151
+00:05:25,480 --> 00:05:28,310
+to represent my training example.
+特征向量
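+
+A rough sketch of this feature construction, written here in Python/NumPy rather than
+the Octave/MATLAB the course uses (the function and variable names are illustrative only):
+
+import numpy as np
+
+def gaussian_features(x, landmarks, sigma):
+    # f_i = exp(-||x - l(i)||^2 / (2 * sigma^2)) for each landmark l(i)
+    diffs = landmarks - x                    # landmarks has shape (m, n), x has shape (n,)
+    sq_dists = np.sum(diffs ** 2, axis=1)    # ||x - l(i)||^2 for i = 1..m
+    f = np.exp(-sq_dists / (2.0 * sigma ** 2))
+    return np.concatenate(([1.0], f))        # prepend f0 = 1, giving an (m+1)-vector
+
+# The landmarks are simply the training examples themselves, e.g. landmarks = X_train.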
+
+152
+00:05:29,040 --> 00:05:30,980
+So given these kernels
+当给定核函数
+
+153
+00:05:31,530 --> 00:05:33,160
+and similarity functions, here's how
+和相似度函数后
+
+154
+00:05:33,400 --> 00:05:35,030
+we use a support vector machine.
+我们按照这个方法来使用支持向量机
+
+155
+00:05:35,720 --> 00:05:37,100
+If you already have a learned
+如果你已经得到参数theta
+
+156
+00:05:37,300 --> 00:05:39,040
+set of parameters theta, then if you're given a value of x and you want to make a prediction.
+并且想对样本x做出预测
+
+157
+00:05:41,680 --> 00:05:42,850
+What we do is we compute the
+我们先要计算
+
+158
+00:05:43,060 --> 00:05:44,170
+features f, which is now
+特征向量f
+
+159
+00:05:44,450 --> 00:05:46,920
+an R(m) plus 1 dimensional feature vector.
+f是m+1维特征向量
+
+160
+00:05:49,040 --> 00:05:50,640
+And we have m here because we have
+这里之所以有m
+
+161
+00:05:51,610 --> 00:05:53,190
+m training examples and thus
+是因为我们有m个训练样本
+
+162
+00:05:53,570 --> 00:05:56,370
+m landmarks and what
+于是就有m个标记点
+
+163
+00:05:57,330 --> 00:05:58,310
+we do is we predict
+我们在theta的转置乘以f
+
+164
+00:05:58,600 --> 00:06:00,180
+1 if theta transpose f
+大于或等于0时
+
+165
+00:06:00,780 --> 00:06:01,860
+is greater than or equal to 0.
+预测y=1
+
+166
+00:06:02,230 --> 00:06:02,430
+Right.
+具体一点
+
+167
+00:06:02,640 --> 00:06:03,770
+So, if theta transpose f, of course,
+theta的转置乘以f
+
+168
+00:06:04,090 --> 00:06:07,200
+that's just equal to theta 0, f0 plus theta 1,
+等于theta_0*f_0加上theta_1*f_1
+
+169
+00:06:07,900 --> 00:06:08,990
+f1 plus dot dot
+加上点点点
+
+170
+00:06:09,120 --> 00:06:11,200
+dot, plus theta m
+直到theta_m*f_m
+
+171
+00:06:12,170 --> 00:06:13,900
+f(m). And so my
+所以
+
+172
+00:06:14,050 --> 00:06:15,720
+parameter vector theta is also now
+参数向量theta
+
+173
+00:06:16,170 --> 00:06:17,730
+going to be an m
+在这里为
+
+174
+00:06:17,990 --> 00:06:21,260
+plus 1 dimensional vector.
+m+1维向量
+
+175
+00:06:21,780 --> 00:06:23,100
+And we have m here because where
+这里有m是因为
+
+176
+00:06:23,260 --> 00:06:25,030
+the number of landmarks is equal
+标记点的个数等于
+
+177
+00:06:25,450 --> 00:06:26,600
+to the training set size.
+训练点的个数
+
+178
+00:06:26,910 --> 00:06:28,190
+So m was the training set size and now, the
+m就是训练集的大小
+
+179
+00:06:29,100 --> 00:06:31,950
+parameter vector theta is going to be m plus one dimensional.
+所以 参数向量theta为m+1维
+
+180
+00:06:32,990 --> 00:06:33,990
+So that's how you make a prediction
+以上就是当已知参数theta时
+
+181
+00:06:34,360 --> 00:06:36,870
+if you already have a setting for the parameter's theta.
+怎么做出预测的过程
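+
+Continuing the illustrative Python sketch from earlier (hypothetical names, not the course's
+own code), the prediction rule just described is:
+
+def predict(x, theta, landmarks, sigma):
+    f = gaussian_features(x, landmarks, sigma)   # (m+1)-dimensional feature vector, f0 = 1
+    return 1 if theta @ f >= 0 else 0            # theta is the learned (m+1)-vector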
+
+182
+00:06:37,840 --> 00:06:39,160
+How do you get the parameter's theta?
+但是怎么得到参数theta?
+
+183
+00:06:39,680 --> 00:06:40,650
+Well you do that using the
+你在使用
+
+184
+00:06:40,920 --> 00:06:43,040
+SVM learning algorithm, and specifically
+SVM学习算法时
+
+185
+00:06:43,850 --> 00:06:46,460
+what you do is you would solve this minimization problem.
+具体来说就是要求解这个最小化问题
+
+186
+00:06:46,690 --> 00:06:48,170
+You minimize over the parameters
+你需要求出能使这个式子取最小值的参数theta
+
+187
+00:06:48,540 --> 00:06:51,630
+theta of C times this cost function which we had before.
+式子为C乘以这个我们之前见过的代价函数
+
+188
+00:06:52,430 --> 00:06:54,770
+Only now, instead of
+只是在这里
+
+189
+00:06:55,040 --> 00:06:56,650
+making
+相比之前使用
+
+190
+00:06:56,970 --> 00:06:59,300
+predictions using theta transpose
+theta的转置乘以x^(i) 即我们的原始特征
+
+191
+00:07:00,020 --> 00:07:01,410
+x(i) using our original
+做出预测
+
+192
+00:07:01,720 --> 00:07:03,320
+features, x(i). Instead we've
+我们将替换
+
+193
+00:07:03,520 --> 00:07:04,840
+taken the features x(i)
+特征向量x^(i)
+
+194
+00:07:05,090 --> 00:07:06,260
+and replaced them with the new features
+并使用这个新的特征向量
+
+195
+00:07:07,270 --> 00:07:09,080
+so we are using theta transpose
+我们使用theta的转置
+
+196
+00:07:09,380 --> 00:07:10,840
+f(i) to make a
+乘以f^(i)来对第i个训练样本
+
+197
+00:07:11,130 --> 00:07:12,480
+prediction on the i-th training
+做出预测
+
+198
+00:07:12,860 --> 00:07:13,860
+examples and we see that, you know,
+我们可以看到
+
+199
+00:07:14,230 --> 00:07:16,580
+in both places here and
+这两个地方(都要做出替换)
+
+200
+00:07:16,700 --> 00:07:18,270
+it's by solving this minimization problem
+通过解决这个最小化问题
+
+201
+00:07:18,760 --> 00:07:22,130
+that you get the parameters for your Support Vector Machine.
+我们就能得到支持向量机的参数
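+
+Written out in plain-text notation, the minimization just described is (with f(i) replacing
+the original features x(i), and cost1/cost0 as defined in the earlier optimization-objective video):
+
+min over theta of
+    C * sum_{i=1..m} [ y(i)*cost1(theta' * f(i)) + (1 - y(i))*cost0(theta' * f(i)) ]
+    + (1/2) * sum_{j=1..m} theta_j^2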
+
+202
+00:07:23,240 --> 00:07:24,640
+And one last detail is
+最后一个细节是
+
+203
+00:07:24,870 --> 00:07:26,880
+because this optimization
+对于这个优化问题
+
+204
+00:07:27,510 --> 00:07:29,580
+problem we really have
+我们有
+
+205
+00:07:30,570 --> 00:07:32,300
+n equals m features.
+n=m个特征
+
+206
+00:07:32,860 --> 00:07:33,650
+That is here.
+就在这里
+
+207
+00:07:34,520 --> 00:07:36,010
+The number of features we have.
+我们拥有的特征个数
+
+208
+00:07:37,100 --> 00:07:38,240
+Really, the effective number of
+显然 有效的特征个数
+
+209
+00:07:38,410 --> 00:07:39,390
+features we have is dimension
+应该等于f的维数
+
+210
+00:07:39,670 --> 00:07:41,020
+of f. So that n
+所以
+
+211
+00:07:41,730 --> 00:07:42,690
+is actually going to be equal
+n其实就等于m
+
+212
+00:07:42,900 --> 00:07:44,470
+to m. So, if you want to, you can
+如果愿意的话
+
+213
+00:07:44,610 --> 00:07:45,530
+think of this as a sum,
+你也可以认为这是一个求和
+
+214
+00:07:46,340 --> 00:07:47,280
+this really is a sum
+这确实就是
+
+215
+00:07:47,590 --> 00:07:48,680
+from j equals 1 through
+j从1到m的累和
+
+216
+00:07:49,490 --> 00:07:50,390
+m. And then one way to think
+可以这么来看这个问题
+
+217
+00:07:50,470 --> 00:07:51,500
+about this, is you can
+你可以想象
+
+218
+00:07:51,620 --> 00:07:53,250
+think of it as n being
+n就等于m
+
+219
+00:07:53,550 --> 00:07:55,060
+equal to m, because if
+因为如果f
+
+220
+00:07:55,570 --> 00:07:57,320
+f is our new feature vector, then
+就是我们新的特征向量
+
+221
+00:07:57,970 --> 00:07:59,650
+we have m plus 1
+那么我们有m+1个特征
+
+222
+00:08:00,120 --> 00:08:02,920
+features, with the plus 1 coming from the intercept term.
+额外的1是因为截距的关系
+
+223
+00:08:05,090 --> 00:08:06,760
+And here, we still do sum
+因此这里
+
+224
+00:08:06,990 --> 00:08:08,110
+from j equal 1 through n,
+我们仍要j从1累加到n
+
+225
+00:08:08,440 --> 00:08:10,070
+because similar to our
+与我们之前
+
+226
+00:08:10,380 --> 00:08:11,700
+earlier videos on regularization,
+视频中讲过的正则化类似
+
+227
+00:08:12,580 --> 00:08:14,110
+we still do not regularize the
+我们仍然不对theta_0
+
+228
+00:08:14,180 --> 00:08:15,650
+parameter theta zero, which is
+做正则化处理
+
+229
+00:08:15,780 --> 00:08:16,560
+why this is a sum for
+这就是
+
+230
+00:08:16,740 --> 00:08:17,930
+j equals 1 through m
+j从1累加到m
+
+231
+00:08:18,880 --> 00:08:19,840
+instead of j equals zero through
+而不是从0累加到m的原因
+
+232
+00:08:20,000 --> 00:08:22,200
+m. So that's
+以上
+
+233
+00:08:22,580 --> 00:08:23,760
+the support vector machine learning algorithm.
+就是支持向量机的学习算法
+
+234
+00:08:24,660 --> 00:08:26,260
+That's one sort of, mathematical
+我在这里
+
+235
+00:08:27,160 --> 00:08:28,310
+detail aside that I
+还要讲到
+
+236
+00:08:28,440 --> 00:08:29,840
+should mention, which is
+一个数学细节
+
+237
+00:08:29,930 --> 00:08:30,780
+that in the way the support
+在支持向量机
+
+238
+00:08:31,310 --> 00:08:33,020
+vector machine is implemented, this last
+实现的过程中
+
+239
+00:08:33,320 --> 00:08:34,750
+term is actually done a little bit differently.
+这最后一项与这里写的有细微差别
+
+240
+00:08:35,680 --> 00:08:36,730
+So you don't really need to
+其实在实现支持向量机时
+
+241
+00:08:36,770 --> 00:08:38,080
+know about this last detail in
+你并不需要知道
+
+242
+00:08:38,190 --> 00:08:39,190
+order to use support vector machines,
+这个细节
+
+243
+00:08:39,700 --> 00:08:41,330
+and in fact the equations that
+事实上这写下的这个式子
+
+244
+00:08:41,450 --> 00:08:42,500
+are written down here should give
+已经给你提供了
+
+245
+00:08:42,620 --> 00:08:45,160
+you all the intuitions that you should need.
+全部需要的原理
+
+246
+00:08:45,310 --> 00:08:46,190
+But in the way the support vector machine
+但是在支持向量机实现的过程中
+
+247
+00:08:46,450 --> 00:08:48,450
+is implemented, you know, that term, the
+这一项
+
+248
+00:08:48,570 --> 00:08:50,960
+sum of j of theta j squared right?
+theta_j从1到m的平方和
+
+249
+00:08:53,110 --> 00:08:54,780
+Another way to write this is this can
+这一项可以被重写为
+
+250
+00:08:55,580 --> 00:08:57,660
+be written as theta transpose
+theta的转置
+
+251
+00:08:58,500 --> 00:08:59,530
+theta if we ignore
+乘以theta
+
+252
+00:09:00,120 --> 00:09:02,730
+the parameter theta 0.
+如果我们忽略theta_0的话
+
+253
+00:09:03,570 --> 00:09:05,640
+So theta 1 down to
+考虑theta_1直到theta_m
+
+254
+00:09:05,800 --> 00:09:10,090
+theta m. Ignoring theta 0.
+并忽略theta_0
+
+255
+00:09:11,130 --> 00:09:13,790
+Then this sum of
+那么
+
+256
+00:09:14,510 --> 00:09:15,900
+j of theta j squared that this
+theta_j的平方和
+
+257
+00:09:16,040 --> 00:09:18,870
+can also be written theta transpose theta.
+可以被重写为theta的转置乘以theta
+
+258
+00:09:19,930 --> 00:09:21,520
+And what most support vector
+大多数支持向量机
+
+259
+00:09:21,730 --> 00:09:23,380
+machine implementations do is actually
+在实现的时候
+
+260
+00:09:23,720 --> 00:09:25,520
+replace this theta transpose theta,
+其实是替换掉theta的转置乘以theta
+
+261
+00:09:26,280 --> 00:09:28,270
+will instead, theta transpose times
+用theta的转置乘以
+
+262
+00:09:28,590 --> 00:09:30,140
+some matrix inside, that depends
+某个矩阵 这依赖于你采用的核函数
+
+263
+00:09:30,820 --> 00:09:33,930
+on the kernel you use, times theta.
+再乘以theta
+
+264
+00:09:34,160 --> 00:09:35,500
+And so this gives us a slightly different distance metric.
+这其实是另一种略有区别的距离度量方法
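+
+In plain-text notation, the substitution just described is: instead of the regularization term
+    sum_{j=1..m} theta_j^2 = theta' * theta   (ignoring theta_0),
+implementations minimize theta' * M * theta, where M is just a label used here for the
+kernel-dependent matrix the lecture mentions.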
+
+265
+00:09:36,140 --> 00:09:37,770
+We'll use a slightly different
+我们用一种略有变化的
+
+266
+00:09:38,070 --> 00:09:40,050
+measure. Instead of minimizing exactly
+度量来取代
+
+267
+00:09:41,320 --> 00:09:43,250
+the norm of theta squared, we
+theta的模的平方
+
+268
+00:09:43,790 --> 00:09:45,990
+minimize something slightly similar to it.
+这意味着我们最小化了一种类似的度量
+
+269
+00:09:46,140 --> 00:09:47,610
+That's like a rescaled version of
+这是参数向量theta的变尺度版本
+
+270
+00:09:47,770 --> 00:09:50,150
+the parameter vector theta that depends on the kernel.
+这种变化和核函数相关
+
+271
+00:09:50,950 --> 00:09:52,440
+But this is kind of a mathematical detail.
+这个数学细节
+
+272
+00:09:53,210 --> 00:09:54,360
+That allows the support vector
+使得支持向量机
+
+273
+00:09:54,650 --> 00:09:56,350
+machine software to run much more efficiently.
+能够更有效率的运行
+
+274
+00:09:58,300 --> 00:09:59,410
+And the reason the support vector machine
+支持向量机做这种修改的
+
+275
+00:09:59,700 --> 00:10:01,500
+does this is with this modification.
+理由是
+
+276
+00:10:02,020 --> 00:10:03,250
+It allows it to
+这么做可以适应
+
+277
+00:10:03,300 --> 00:10:05,740
+scale to much bigger training sets.
+超大的训练集
+
+278
+00:10:06,370 --> 00:10:07,800
+Because for example, if you have
+例如
+
+279
+00:10:07,970 --> 00:10:11,530
+a training set with 10,000 training examples.
+当你的训练集有10000个样本时
+
+280
+00:10:12,590 --> 00:10:13,560
+Then, you know, the way we define
+根据我们之前定义标记点的方法
+
+281
+00:10:13,950 --> 00:10:15,750
+landmarks, we end up with 10,000 landmarks.
+我们最终有10000个标记点
+
+282
+00:10:16,780 --> 00:10:18,060
+And so theta becomes 10,000 dimensional.
+theta也随之是10000维的向量
+
+283
+00:10:18,490 --> 00:10:20,450
+And maybe that works, but when m
+或许这时这么做还可行
+
+284
+00:10:20,450 --> 00:10:21,710
+becomes really, really big
+但是 当m变得非常非常大时
+
+285
+00:10:22,470 --> 00:10:24,020
+then solving for all
+那么求解
+
+286
+00:10:24,150 --> 00:10:25,480
+of these parameters, you know, if m were
+这么多参数
+
+287
+00:10:25,590 --> 00:10:26,590
+50,000 or a 100,000
+如果m为50,000或者100,000
+
+288
+00:10:26,880 --> 00:10:28,170
+then solving for
+此时
+
+289
+00:10:28,340 --> 00:10:29,660
+all of these parameters can become
+利用支持向量机软件包
+
+290
+00:10:29,890 --> 00:10:31,240
+expensive for the support
+来解决我写在这里的最小化问题
+
+291
+00:10:31,420 --> 00:10:33,690
+vector machine optimization software, thus
+求解这些参数的成本
+
+292
+00:10:33,870 --> 00:10:35,750
+solving the minimization problem that I drew here.
+会非常高
+
+293
+00:10:36,490 --> 00:10:37,570
+So kind of as mathematical
+这些都是数学细节
+
+294
+00:10:37,860 --> 00:10:39,580
+detail, which again you really don't need to know about.
+事实上你没有必要了解这些
+
+295
+00:10:41,000 --> 00:10:43,070
+It actually modifies that last
+它实际上
+
+296
+00:10:43,350 --> 00:10:44,380
+term a little bit to
+细微的修改了最后一项
+
+297
+00:10:44,500 --> 00:10:45,940
+optimize something slightly different than
+使得最终的优化目标
+
+298
+00:10:46,080 --> 00:10:48,560
+just minimizing the squared norm of theta.
+与直接最小化theta的模的平方略有区别
+
+299
+00:10:49,370 --> 00:10:50,600
+But if you want,
+如果愿意的话
+
+300
+00:10:51,080 --> 00:10:52,450
+you can feel free to think
+你可以直接认为
+
+301
+00:10:52,710 --> 00:10:54,880
+of this as kind of an implementational detail
+这个具体的实现细节
+
+302
+00:10:55,340 --> 00:10:56,750
+that does change the objective a
+尽管略微的改变了
+
+303
+00:10:56,880 --> 00:10:58,260
+bit, but is done primarily
+优化目标
+
+304
+00:10:58,930 --> 00:11:01,590
+for reasons of computational efficiency,
+但是它主要是为了计算效率
+
+305
+00:11:02,260 --> 00:11:04,390
+so usually you don't really have to worry about this.
+所以 你不必要对此有太多担心
+
+306
+00:11:07,640 --> 00:11:09,460
+And by the way, in case you're
+顺便说一下
+
+307
+00:11:09,560 --> 00:11:10,730
+wondering why we don't apply
+你可能会想为什么我们不将
+
+308
+00:11:11,100 --> 00:11:12,210
+the kernel's idea to other
+核函数这个想法
+
+309
+00:11:12,570 --> 00:11:13,690
+algorithms as well like logistic
+应用到其他算法 比如逻辑回归上
+
+310
+00:11:14,040 --> 00:11:15,450
+regression, it turns out
+事实证明
+
+311
+00:11:15,670 --> 00:11:16,770
+that if you want, you
+如果愿意的话
+
+312
+00:11:16,900 --> 00:11:18,120
+can actually apply the kernel's
+确实可以将核函数
+
+313
+00:11:18,550 --> 00:11:19,850
+idea and define this sort
+这个想法用于定义特征向量
+
+314
+00:11:19,990 --> 00:11:22,920
+of features using landmarks and so on for logistic regression.
+将标记点之类的技术用于逻辑回归算法
+
+315
+00:11:23,880 --> 00:11:25,860
+But the computational tricks that apply
+但是用于支持向量机
+
+316
+00:11:26,440 --> 00:11:28,110
+for support vector machines don't
+的计算技巧
+
+317
+00:11:28,430 --> 00:11:30,700
+generalize well to other algorithms like logistic regression.
+不能较好的推广到其他算法诸如逻辑回归上
+
+318
+00:11:31,310 --> 00:11:33,110
+And so, using kernels with
+所以 将核函数用于
+
+319
+00:11:33,260 --> 00:11:34,390
+logistic regression is going to
+逻辑回归时
+
+320
+00:11:34,580 --> 00:11:36,330
+be very slow, whereas, because of
+会变得非常的慢
+
+321
+00:11:36,440 --> 00:11:37,940
+computational tricks, like that
+相比之下 这些计算技巧
+
+322
+00:11:38,150 --> 00:11:39,490
+embodied in how it modifies
+比如具体化技术
+
+323
+00:11:39,900 --> 00:11:41,130
+this and the details of how
+对这些细节的修改
+
+324
+00:11:41,320 --> 00:11:43,140
+the support vector machine software is
+以及支持向量软件的实现细节
+
+325
+00:11:43,240 --> 00:11:44,990
+implemented, support vector machines and
+使得支持向量机
+
+326
+00:11:45,300 --> 00:11:47,090
+kernels tend go particularly well together.
+可以和核函数相得益彰
+
+327
+00:11:47,930 --> 00:11:49,450
+Whereas, logistic regression and kernels,
+而逻辑回归和核函数
+
+328
+00:11:50,250 --> 00:11:51,990
+you know, you can do it, but this would run very slowly.
+则运行得十分缓慢
+
+329
+00:11:52,890 --> 00:11:53,670
+And it won't be able to
+更何况它们还不能
+
+330
+00:11:53,750 --> 00:11:55,420
+take advantage of advanced optimization
+使用那些高级优化技巧
+
+331
+00:11:56,040 --> 00:11:57,360
+techniques that people have figured
+因为这些技巧
+
+332
+00:11:57,530 --> 00:11:58,530
+out for the particular case
+是人们专门为
+
+333
+00:11:59,140 --> 00:12:00,950
+of running a support vector machine with a kernel.
+使用核函数的支持向量机开发的
+
+334
+00:12:01,540 --> 00:12:03,340
+But all this pertains only
+但是这些问题只有
+
+335
+00:12:03,710 --> 00:12:04,850
+to how you actually implement
+在你亲自实现最小化函数
+
+336
+00:12:05,230 --> 00:12:06,900
+software to minimize the cost function.
+才会遇到
+
+337
+00:12:07,870 --> 00:12:08,940
+I will say more about that in
+我将在下一节视频中
+
+338
+00:12:09,040 --> 00:12:09,950
+the next video, but you really don't
+进一步讨论这些问题
+
+339
+00:12:10,150 --> 00:12:11,530
+need to know about
+但是 你并不需要知道
+
+340
+00:12:12,200 --> 00:12:13,520
+how to write software to
+怎么去写一个软件
+
+341
+00:12:13,670 --> 00:12:14,890
+minimize this cost function because
+来最小化代价函数
+
+342
+00:12:15,170 --> 00:12:17,560
+you can find very good off the shelf software for doing so.
+你能找到很好的成熟软件来做这些
+
+343
+00:12:18,670 --> 00:12:19,890
+And just as, you know, I wouldn't
+与我
+
+344
+00:12:20,140 --> 00:12:21,340
+recommend writing code to invert
+不建议自己写矩阵求逆函数
+
+345
+00:12:21,850 --> 00:12:22,960
+a matrix or to compute a
+或者平方根函数
+
+346
+00:12:23,150 --> 00:12:24,490
+square root, I actually do
+的道理一样
+
+347
+00:12:24,660 --> 00:12:26,420
+not recommend writing software to
+我也不建议亲自写
+
+348
+00:12:26,560 --> 00:12:27,750
+minimize this cost function yourself,
+最小化代价函数的代码
+
+349
+00:12:28,240 --> 00:12:29,610
+but instead to use off
+而应该使用
+
+350
+00:12:29,780 --> 00:12:31,490
+the shelf software packages that people
+人们开发的
+
+351
+00:12:31,740 --> 00:12:33,240
+have developed and so
+成熟的软件包
+
+352
+00:12:33,540 --> 00:12:35,140
+those software packages already embody
+这些软件包
+
+353
+00:12:35,790 --> 00:12:37,720
+these numerical optimization tricks,
+已经包含了那些数值优化技巧
+
+354
+00:12:39,540 --> 00:12:41,770
+so you don't really have to worry about them.
+所以你不必担心这些东西
+
+355
+00:12:41,950 --> 00:12:42,920
+But one other thing that is
+但是另外一个
+
+356
+00:12:43,180 --> 00:12:45,200
+worth knowing about is when
+值得说明的问题是
+
+357
+00:12:45,350 --> 00:12:46,400
+you're applying a support vector
+在你使用支持向量机时
+
+358
+00:12:46,640 --> 00:12:47,730
+machine, how do you
+怎么选择
+
+359
+00:12:47,820 --> 00:12:50,220
+choose the parameters of the support vector machine?
+支持向量机中的参数?
+
+360
+00:12:51,520 --> 00:12:52,300
+And the last thing I want to
+在本节视频的末尾
+
+361
+00:12:52,400 --> 00:12:53,290
+do in this video is say a
+我想稍微说明一下
+
+362
+00:12:53,450 --> 00:12:54,680
+little word about the bias and
+在使用支持向量机时的
+
+363
+00:12:54,840 --> 00:12:57,070
+variance trade offs when using a support vector machine.
+偏差-方差折中
+
+364
+00:12:57,900 --> 00:12:59,230
+When using an SVM, one of
+在使用支持向量机时
+
+365
+00:12:59,390 --> 00:13:00,670
+the things you need to choose is
+其中一个要选择的事情是
+
+366
+00:13:00,960 --> 00:13:03,850
+the parameter C which
+目标函数中的
+
+367
+00:13:04,090 --> 00:13:05,880
+was in the optimization objective, and
+参数C
+
+368
+00:13:05,980 --> 00:13:07,690
+you recall that C played a
+回忆一下
+
+369
+00:13:07,770 --> 00:13:09,800
+role similar to 1 over
+C的作用与1/lambda相似
+
+370
+00:13:10,050 --> 00:13:11,750
+lambda, where lambda was the regularization
+lambda是逻辑回归算法中
+
+371
+00:13:12,520 --> 00:13:13,970
+parameter we had for logistic regression.
+的正则化参数
+
+372
+00:13:15,360 --> 00:13:16,760
+So, if you have a
+所以
+
+373
+00:13:16,930 --> 00:13:18,760
+large value of C, this corresponds
+大的C对应着
+
+374
+00:13:19,520 --> 00:13:20,560
+to what we have back in logistic
+我们以前在逻辑回归
+
+375
+00:13:21,270 --> 00:13:22,260
+regression, of a small
+问题中的小的lambda
+
+376
+00:13:22,670 --> 00:13:25,080
+value of lambda, meaning not using much regularization.
+这意味着不使用正则化
+
+377
+00:13:25,980 --> 00:13:26,960
+And if you do that, you
+如果你这么做
+
+378
+00:13:27,050 --> 00:13:29,330
+tend to have a hypothesis with lower bias and higher variance.
+就有可能得到一个低偏差但高方差的模型
+
+379
+00:13:30,570 --> 00:13:31,420
+Whereas if you use a smaller
+如果你使用了
+
+380
+00:13:31,630 --> 00:13:33,050
+value of C then this
+较小的C
+
+381
+00:13:33,240 --> 00:13:34,510
+corresponds to when we
+这对应着
+
+382
+00:13:34,660 --> 00:13:36,450
+are using logistic regression with a
+在逻辑回归问题中
+
+383
+00:13:36,620 --> 00:13:38,090
+large value of lambda and that corresponds
+使用较大的lambda
+
+384
+00:13:38,690 --> 00:13:40,180
+to a hypothesis with higher
+对应着一个高偏差
+
+385
+00:13:40,470 --> 00:13:41,760
+bias and lower variance.
+但是低方差的模型
+
+386
+00:13:42,580 --> 00:13:44,520
+And so, hypothesis with large
+所以
+
+387
+00:13:45,000 --> 00:13:46,870
+C has a higher
+使用较大C值的模型
+
+388
+00:13:47,450 --> 00:13:48,380
+variance, and is more prone
+为高方差
+
+389
+00:13:48,580 --> 00:13:50,290
+to overfitting, whereas hypothesis with
+更倾向于过拟合
+
+390
+00:13:50,450 --> 00:13:52,820
+small C has higher bias
+而使用较小C值的模型为高偏差
+
+391
+00:13:52,910 --> 00:13:54,900
+and is thus more prone to underfitting.
+更倾向于欠拟合
+
+392
+00:13:56,710 --> 00:13:59,870
+So this parameter C is one of the parameters we need to choose.
+C只是我们要选择的其中一个参数
+
+393
+00:14:00,210 --> 00:14:01,280
+The other one is the parameter
+另外一个要选择的参数是
+
+394
+00:14:02,280 --> 00:14:04,580
+sigma squared, which appeared in the Gaussian kernel.
+高斯核函数中的sigma^2
+
+395
+00:14:05,760 --> 00:14:07,080
+So if the Gaussian kernel
+当高斯核函数中的
+
+396
+00:14:07,750 --> 00:14:09,370
+sigma squared is large, then
+sigma^2偏大时
+
+397
+00:14:09,640 --> 00:14:11,350
+in the similarity function, which
+那么对应的相似度函数
+
+398
+00:14:11,530 --> 00:14:12,710
+was this you know E to the
+为exp(-||x-l^(i)||^2/(2*sigma^2))
+
+399
+00:14:13,390 --> 00:14:14,710
+minus, norm of x minus the landmark,
+exp(-||x-l^(i)||^2/(2*sigma^2))
+
+400
+00:14:16,280 --> 00:14:17,950
+squared, over 2 sigma squared.
+exp(-||x-l^(i)||^2/(2*sigma^2))
+
+401
+00:14:20,130 --> 00:14:21,290
+In this one example, if I
+在这个例子中
+
+402
+00:14:21,480 --> 00:14:23,330
+have only one feature, x1, if
+如果我们只有一个特征x_1
+
+403
+00:14:23,570 --> 00:14:25,390
+I have a landmark there at
+我们在这个位置
+
+404
+00:14:25,490 --> 00:14:27,710
+that location, if sigma
+有一个标记点
+
+405
+00:14:27,960 --> 00:14:29,230
+squared is large, then, you know, the
+如果sigma^2较大
+
+406
+00:14:29,480 --> 00:14:30,600
+Gaussian kernel would tend to
+那么高斯核函数
+
+407
+00:14:30,690 --> 00:14:32,940
+fall off relatively slowly
+倾向于变得相对平滑
+
+408
+00:14:33,960 --> 00:14:34,740
+and so this would be my feature
+这可能是我的特征f_i
+
+409
+00:14:35,210 --> 00:14:36,690
+f(i), and so this
+所以
+
+410
+00:14:36,880 --> 00:14:38,970
+would be smoother function that varies
+由于函数平滑
+
+411
+00:14:39,060 --> 00:14:40,640
+more smoothly, and so this will
+且变化的比较平缓
+
+412
+00:14:40,760 --> 00:14:42,750
+give you a hypothesis with higher
+这会给你的模型
+
+413
+00:14:43,030 --> 00:14:44,170
+bias and lower variance, because
+带来较高的偏差和较低的方差
+
+414
+00:14:44,550 --> 00:14:46,000
+the Gaussian kernel that falls off smoothly,
+由于高斯核函数变得平缓
+
+415
+00:14:46,840 --> 00:14:48,240
+you tend to get a hypothesis that
+就更倾向于得到一个
+
+416
+00:14:48,520 --> 00:14:50,060
+varies slowly, or varies smoothly
+随着输入x
+
+417
+00:14:50,130 --> 00:14:51,860
+as you change the
+变化得缓慢的模型
+
+418
+00:14:52,050 --> 00:14:53,680
+input x. Whereas in contrast,
+反之
+
+419
+00:14:54,030 --> 00:14:55,330
+if sigma squared was
+如果sigma^2很小
+
+420
+00:14:55,660 --> 00:14:57,430
+small and if that's my
+这是我的标记点
+
+421
+00:14:57,540 --> 00:14:58,830
+landmark given my 1
+利用其给出特征x_1
+
+422
+00:14:58,960 --> 00:15:01,440
+feature x1, you know, my Gaussian
+那么
+
+423
+00:15:01,820 --> 00:15:04,630
+kernel, my similarity function, will vary more abruptly.
+高斯核函数 即相似度函数会变化的很剧烈
+
+424
+00:15:05,310 --> 00:15:07,520
+And in both cases I'd pick
+我们标记出这两种情况下1的位置
+
+425
+00:15:07,580 --> 00:15:08,550
+out 1, and so if sigma squared
+在sigma^2较小的情况下
+
+426
+00:15:08,870 --> 00:15:11,730
+is small, then my features vary less smoothly.
+特征的变化会变得不平滑
+
+427
+00:15:12,190 --> 00:15:13,740
+So it just has higher slopes
+会有较大的斜率
+
+428
+00:15:14,250 --> 00:15:15,300
+or higher derivatives here.
+和较大的导数
+
+429
+00:15:16,020 --> 00:15:17,170
+And using this, you end
+在这种情况下
+
+430
+00:15:17,330 --> 00:15:19,620
+up fitting hypotheses of lower
+最终得到的模型会
+
+431
+00:15:19,840 --> 00:15:21,870
+bias and you can have higher variance.
+是低偏差和高方差
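+
+A small worked example (not from the lecture itself) of how sigma squared controls this: for a
+point at distance ||x - l|| = 1 from a landmark,
+    sigma^2 = 1    gives  f = exp(-1/(2*1))   = exp(-0.5), about 0.61
+    sigma^2 = 0.1  gives  f = exp(-1/(2*0.1)) = exp(-5),   about 0.0067
+so with the smaller sigma^2 the feature falls away from 1 much more abruptly.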
+
+432
+00:15:23,030 --> 00:15:24,460
+And if you look at this
+如果你看了
+
+433
+00:15:24,680 --> 00:15:26,240
+week's programming exercise, you actually get
+本周的编程作业
+
+434
+00:15:26,450 --> 00:15:27,230
+to play around with some
+你就能亲自实现这些想法
+
+435
+00:15:27,330 --> 00:15:29,480
+of these ideas yourself and see these effects yourself.
+并亲眼看到这些效果
+
+436
+00:15:31,590 --> 00:15:34,430
+So, that was the support vector machine with kernels algorithm.
+这就是利用核函数的支持向量机算法
+
+437
+00:15:35,320 --> 00:15:36,450
+And hopefully this discussion of
+希望这些关于
+
+438
+00:15:37,090 --> 00:15:39,170
+bias and variance will give
+偏差和方差的讨论
+
+439
+00:15:39,310 --> 00:15:40,380
+you some sense of how you
+能给你一些
+
+440
+00:15:40,460 --> 00:15:42,600
+can expect this algorithm to behave as well.
+对于算法结果预期的直观印象
+
diff --git a/srt/12 - 6 - Using An SVM (21 min).srt b/srt/12 - 6 - Using An SVM (21 min).srt
new file mode 100644
index 00000000..d4925735
--- /dev/null
+++ b/srt/12 - 6 - Using An SVM (21 min).srt
@@ -0,0 +1,3090 @@
+1
+00:00:00,140 --> 00:00:01,310
+So far we've been talking about
+目前为止 我们已经讨论了
+
+2
+00:00:01,640 --> 00:00:03,290
+SVMs in a fairly abstract level.
+SVM比较抽象的层面
+
+3
+00:00:03,980 --> 00:00:05,030
+In this video I'd like to
+在这个视频中 我将要
+
+4
+00:00:05,200 --> 00:00:06,460
+talk about what you actually need
+讨论到为了运行或者运用SVM
+
+5
+00:00:06,740 --> 00:00:09,410
+to do in order to run or to use an SVM.
+你实际上所需要的一些东西
+
+6
+00:00:11,320 --> 00:00:12,300
+The support vector machine algorithm
+支持向量机算法
+
+7
+00:00:12,850 --> 00:00:14,870
+poses a particular optimization problem.
+提出了一个特别优化的问题
+
+8
+00:00:15,530 --> 00:00:16,940
+But as I briefly mentioned in
+但是就如在之前的
+
+9
+00:00:17,120 --> 00:00:18,150
+an earlier video, I really
+视频中我简单提到的
+
+10
+00:00:18,380 --> 00:00:20,570
+do not recommend writing your
+我真的不建议你自己写
+
+11
+00:00:20,630 --> 00:00:22,810
+own software to solve for the parameter's theta yourself.
+软件来求解参数θ
+
+12
+00:00:23,950 --> 00:00:26,110
+So just as today, very
+因此由于今天
+
+13
+00:00:26,420 --> 00:00:27,730
+few of us, or maybe almost essentially
+我们中的很少人 或者其实
+
+14
+00:00:28,090 --> 00:00:29,400
+none of us would think of
+没有人考虑过
+
+15
+00:00:29,530 --> 00:00:31,680
+writing code ourselves to invert a matrix
+自己写代码来对矩阵求逆
+
+16
+00:00:31,950 --> 00:00:33,940
+or take a square root of a number, and so on.
+或求一个数的平方根等
+
+17
+00:00:34,190 --> 00:00:36,570
+We just, you know, call some library function to do that.
+我们只是知道如何去调用库函数来实现这些功能
+
+18
+00:00:36,700 --> 00:00:38,090
+In the same way, the
+同样的
+
+19
+00:00:38,850 --> 00:00:40,310
+software for solving the SVM
+用以解决SVM
+
+20
+00:00:40,620 --> 00:00:42,200
+optimization problem is very
+最优化问题的软件很
+
+21
+00:00:42,440 --> 00:00:43,880
+complex, and there have
+复杂 且已经有
+
+22
+00:00:43,990 --> 00:00:44,960
+been researchers that have been
+研究者做了
+
+23
+00:00:45,110 --> 00:00:47,560
+doing essentially numerical optimization research for many years.
+很多年数值优化了
+
+24
+00:00:47,850 --> 00:00:48,960
+So people have come up with good
+因此人们已经开发出了好的
+
+25
+00:00:49,150 --> 00:00:50,550
+software libraries and good software
+软件库和好的软件
+
+26
+00:00:50,930 --> 00:00:52,270
+packages to do this.
+包来做这样一些事儿
+
+27
+00:00:52,470 --> 00:00:53,480
+And I strongly recommend just using
+然后强烈建议使用
+
+28
+00:00:53,860 --> 00:00:55,260
+one of the highly optimized software
+高优化软件库中的一个
+
+29
+00:00:55,710 --> 00:00:57,780
+libraries rather than trying to implement something yourself.
+而不是尝试自己去实现
+
+30
+00:00:58,730 --> 00:01:00,680
+And there are lots of good software libraries out there.
+有许多好的软件库
+
+31
+00:01:00,970 --> 00:01:02,060
+The two that I happen to
+我正好用得最多的
+
+32
+00:01:02,210 --> 00:01:03,220
+use the most often are the
+两个是
+
+33
+00:01:03,400 --> 00:01:05,000
+liblinear and libsvm, but there are really
+liblinear和libsvm 但是真的有
+
+34
+00:01:05,410 --> 00:01:06,860
+lots of good software libraries for
+很多软件库可以用来
+
+35
+00:01:07,030 --> 00:01:08,430
+doing this that you know, you can
+做这件事儿 你可以
+
+36
+00:01:08,600 --> 00:01:10,190
+link to many of the
+连接许多
+
+37
+00:01:10,450 --> 00:01:11,860
+major programming languages that you
+你可能会用来编写学习算法的
+
+38
+00:01:11,950 --> 00:01:14,410
+may be using to code up learning algorithm.
+主要编程语言
+
+39
+00:01:15,280 --> 00:01:16,460
+Even though you shouldn't be writing
+尽管你不去写
+
+40
+00:01:16,730 --> 00:01:18,330
+your own SVM optimization software,
+你自己的SVM(支持向量机)的优化软件
+
+41
+00:01:19,120 --> 00:01:20,680
+there are a few things you need to do, though.
+但是你也需要做几件事儿
+
+42
+00:01:21,420 --> 00:01:23,130
+First is to come up
+首先是提出
+
+43
+00:01:23,130 --> 00:01:24,230
+with some choice of the
+参数C的选择
+
+44
+00:01:24,320 --> 00:01:25,640
+parameter C. We talked a
+我们在之前的视频中
+
+45
+00:01:25,940 --> 00:01:26,930
+little bit of the bias/variance properties of
+讨论过误差/方差在
+
+46
+00:01:27,040 --> 00:01:28,850
+this in the earlier video.
+这方面的性质
+
+47
+00:01:30,290 --> 00:01:31,480
+Second, you also need to
+第二 你也需要
+
+48
+00:01:31,630 --> 00:01:33,040
+choose the kernel or the
+选择内核参数或
+
+49
+00:01:33,410 --> 00:01:34,880
+similarity function that you want to use.
+你想要使用的相似函数
+
+50
+00:01:35,730 --> 00:01:37,080
+So one choice might
+其中一个选择是
+
+51
+00:01:37,280 --> 00:01:38,980
+be if we decide not to use any kernel.
+我们选择不需要任何内核参数
+
+52
+00:01:40,560 --> 00:01:41,510
+And the idea of no kernel
+没有内核参数的理念
+
+53
+00:01:41,910 --> 00:01:43,600
+is also called a linear kernel.
+也叫线性核函数
+
+54
+00:01:44,130 --> 00:01:45,320
+So if someone says, I use
+因此 如果有人说他使用
+
+55
+00:01:45,530 --> 00:01:46,760
+an SVM with a linear kernel,
+了线性核的SVM(支持向量机)
+
+56
+00:01:47,180 --> 00:01:48,330
+what that means is you know, they use
+这就意味这 他使用了
+
+57
+00:01:48,490 --> 00:01:50,690
+an SVM without
+不带有
+
+58
+00:01:51,020 --> 00:01:52,250
+using a kernel and it
+核函数的SVM(支持向量机)
+
+59
+00:01:52,360 --> 00:01:53,410
+was a version of the SVM
+这是一个
+
+60
+00:01:54,120 --> 00:01:55,870
+that just uses theta transpose X, right,
+只是用了θTX
+
+61
+00:01:56,140 --> 00:01:57,620
+that predicts 1 if theta 0
+预测1 θ0
+
+62
+00:01:57,850 --> 00:01:59,420
+plus theta 1 X1
++θ1X1
+
+63
+00:01:59,740 --> 00:02:01,000
+plus so on plus theta
++...+θnXn
+
+64
+00:02:01,690 --> 00:02:04,160
+N X N, is greater than or equal to 0.
+这个式子大于等于0
+
+65
+00:02:05,520 --> 00:02:06,830
+This term linear kernel, you
+这个内核线性参数
+
+66
+00:02:06,950 --> 00:02:08,250
+can think of this as you know this
+你可以把它想象成
+
+67
+00:02:08,480 --> 00:02:09,290
+is the version of the SVM
+SVM的一个版本
+
+68
+00:02:10,340 --> 00:02:12,320
+that just gives you a standard linear classifier.
+它只是给你一个标准的线性分类器
+
+69
+00:02:13,940 --> 00:02:14,700
+So that would be one
+因此它可以成为一个
+
+70
+00:02:15,040 --> 00:02:16,160
+reasonable choice for some problems,
+解决一些问题的合理选择
+
+71
+00:02:17,130 --> 00:02:18,080
+and you know, there would be many software
+且你知道的 有许多软件
+
+72
+00:02:18,470 --> 00:02:20,900
+libraries, like liblinear, was
+库 比如liblinear就是
+
+73
+00:02:21,210 --> 00:02:22,320
+one example, out of many,
+其中的一个例子
+
+74
+00:02:22,840 --> 00:02:23,880
+one example of a software library
+一个软件库的例子
+
+75
+00:02:24,560 --> 00:02:25,620
+that can train an SVM
+可以用来训练不带
+
+76
+00:02:25,980 --> 00:02:27,410
+without using a kernel, also
+内核参数的SVM 也
+
+77
+00:02:27,760 --> 00:02:29,470
+called a linear kernel.
+叫线性内核函数
+
+78
+00:02:29,850 --> 00:02:31,340
+So, why would you want to do this?
+那么你为什么想要做这样一件事儿呢?
+
+79
+00:02:31,410 --> 00:02:32,820
+If you have a large number of
+如果你有大量的
+
+80
+00:02:33,150 --> 00:02:34,280
+features, if N is
+特征值 如果N
+
+81
+00:02:34,430 --> 00:02:37,800
+large, and M the
+很大 且M
+
+82
+00:02:37,990 --> 00:02:39,590
+number of training examples is
+训练的样本数
+
+83
+00:02:39,670 --> 00:02:41,050
+small, then you know
+很小 那么
+
+84
+00:02:41,230 --> 00:02:42,300
+you have a huge number of
+你有大量的
+
+85
+00:02:42,360 --> 00:02:43,630
+features; that is, if X is
+特征值 如果X是
+
+86
+00:02:43,710 --> 00:02:45,850
+in Rn or Rn plus 1.
+X属于Rn+1
+
+87
+00:02:46,010 --> 00:02:46,940
+So if you have a
+那么如果你已经有
+
+88
+00:02:47,080 --> 00:02:48,700
+huge number of features already, with
+大量的特征值 而只有
+
+89
+00:02:48,800 --> 00:02:50,540
+a small training set, you know, maybe you
+很小的训练数据集 也许你
+
+90
+00:02:50,610 --> 00:02:51,430
+want to just fit a linear
+就只想拟合一个线性
+
+91
+00:02:51,710 --> 00:02:52,890
+decision boundary and not try
+的判定边界 而不会去
+
+92
+00:02:53,060 --> 00:02:54,420
+to fit a very complicated nonlinear
+拟合一个非常复杂的非线性
+
+93
+00:02:54,860 --> 00:02:56,980
+function, because you might not have enough data.
+函数 因为没有足够的数据
+
+94
+00:02:57,560 --> 00:02:59,330
+And you might risk overfitting, if
+你可能会过度拟合 如果
+
+95
+00:02:59,470 --> 00:03:00,530
+you're trying to fit a very complicated function
+你试着拟合非常复杂的函数的话
+
+96
+00:03:01,540 --> 00:03:03,220
+in a very high dimensional feature space,
+在一个非常高维的特征空间中
+
+97
+00:03:03,980 --> 00:03:04,990
+but if your training set sample
+但是如果你的训练集样本
+
+98
+00:03:05,040 --> 00:03:07,120
+is small. So this
+很小的话 因此
+
+99
+00:03:07,340 --> 00:03:08,600
+would be one reasonable setting where
+这将是一个合理的设置 在此
+
+100
+00:03:08,740 --> 00:03:09,950
+you might decide to just
+你可以决定
+
+101
+00:03:10,700 --> 00:03:11,960
+not use a kernel, or
+不使用内核参数 或
+
+102
+00:03:12,250 --> 00:03:15,580
+equivalents to use what's called a linear kernel.
+一些被叫做线性内核函数的等价物
+
+103
+00:03:15,740 --> 00:03:16,740
+A second choice for the kernel that
+对于内核函数的第二个选择是
+
+104
+00:03:16,820 --> 00:03:18,010
+you might make, is this Gaussian
+你可以构建 这是一个高斯
+
+105
+00:03:18,370 --> 00:03:19,920
+kernel, and this is what we had previously.
+内核函数 这个是我们之前有的
+
+106
+00:03:21,270 --> 00:03:22,350
+And if you do this, then the
+如果你选择这个 那么
+
+107
+00:03:22,440 --> 00:03:23,130
+other choice you need to make
+你需要做的另外一个选择是
+
+108
+00:03:23,420 --> 00:03:25,980
+is to choose this parameter sigma squared
+选择一个参数σ的平方
+
+109
+00:03:26,850 --> 00:03:29,800
+when we also talk a little bit about the bias variance tradeoffs
+当我们开始讨论一些如何权衡偏差方差的时候
+
+110
+00:03:30,820 --> 00:03:32,360
+of how, if sigma squared is
+如果σ2
+
+111
+00:03:32,600 --> 00:03:33,890
+large, then you tend
+很大 那么你就很有可能
+
+112
+00:03:34,160 --> 00:03:35,580
+to have a higher bias, lower
+会有一个较大的误差 较低
+
+113
+00:03:35,770 --> 00:03:37,650
+variance classifier, but if
+方差的分类器 但是如果
+
+114
+00:03:37,800 --> 00:03:39,700
+sigma squared is small, then you
+σ2很小 那么你
+
+115
+00:03:40,060 --> 00:03:42,360
+have a higher variance, lower bias classifier.
+就会有较大的方差 较低误差的分类器
+
+116
+00:03:43,940 --> 00:03:45,350
+So when would you choose a Gaussian kernel?
+那么什么时候选择高斯内核函数呢?
+
+117
+00:03:46,210 --> 00:03:48,050
+Well, if you have a set
+如果你忽略了
+
+118
+00:03:48,310 --> 00:03:49,540
+of features X, I mean
+特征值X 我的意思是
+
+119
+00:03:49,820 --> 00:03:51,370
+Rn, and if N
+Rn 如果N
+
+120
+00:03:51,570 --> 00:03:53,890
+is small, and, ideally, you know,
+值很小 很理想地
+
+121
+00:03:55,660 --> 00:03:57,110
+if m is large, right,
+如果m值很大
+
+122
+00:03:58,470 --> 00:04:00,170
+so that's if, you know, we have
+那么如果我们有
+
+123
+00:04:00,550 --> 00:04:02,340
+say, a two-dimensional training set,
+如一个二维的训练集
+
+124
+00:04:03,130 --> 00:04:04,880
+like the example I drew earlier.
+就像我前面讲到的例子一样
+
+125
+00:04:05,470 --> 00:04:08,320
+So n is equal to 2, but we have a pretty large training set.
+那么n等于2 但是我们有相当大的训练集
+
+126
+00:04:08,680 --> 00:04:09,770
+So, you know, I've drawn in a
+我已经有了一个
+
+127
+00:04:09,950 --> 00:04:10,890
+fairly large number of training examples,
+相当大的训练样本了
+
+128
+00:04:11,650 --> 00:04:12,410
+then maybe you want to use
+那么可能你想用
+
+129
+00:04:12,540 --> 00:04:14,400
+a kernel to fit a more
+一个内核函数去拟合一个更加
+
+130
+00:04:14,910 --> 00:04:16,260
+complex nonlinear decision boundary,
+复杂的非线性的判定边界
+
+131
+00:04:16,650 --> 00:04:18,750
+and the Gaussian kernel would be a fine way to do this.
+那么高斯内核函数是一个不错的选择
+
+132
+00:04:19,480 --> 00:04:20,610
+I'll say more towards the end
+我会在这个视频的后面
+
+133
+00:04:20,720 --> 00:04:22,570
+of the video, a little bit
+部分讲到更多 一些关于
+
+134
+00:04:22,660 --> 00:04:23,760
+more about when you might choose a
+什么时候你可以选择
+
+135
+00:04:23,970 --> 00:04:26,310
+linear kernel, a Gaussian kernel and so on.
+线性内核函数 高斯内核函数等
+
+136
+00:04:27,860 --> 00:04:29,740
+But if concretely, if you
+但是如果具体地你
+
+137
+00:04:30,040 --> 00:04:31,210
+decide to use a Gaussian
+决定用高斯
+
+138
+00:04:31,720 --> 00:04:33,910
+kernel, then here's what you need to do.
+内核函数的话 那么这里就是你需要做的
+
+139
+00:04:35,380 --> 00:04:36,550
+Depending on what support vector machine
+根据你所要用的支持向量机
+
+140
+00:04:37,280 --> 00:04:38,990
+software package you use, it
+软件包 这
+
+141
+00:04:39,100 --> 00:04:40,960
+may ask you to implement a
+可能需要你实现一个
+
+142
+00:04:41,070 --> 00:04:42,200
+kernel function, or to implement
+核函数 或者实现
+
+143
+00:04:43,060 --> 00:04:43,880
+the similarity function.
+相似的函数
+
+144
+00:04:45,020 --> 00:04:46,750
+So if you're using an
+因此 如果你用
+
+145
+00:04:47,010 --> 00:04:49,820
+octave or MATLAB implementation of
+octave或者Matlab来实现
+
+146
+00:04:50,000 --> 00:04:50,720
+an SVM, it may ask you
+支持向量机的话 那么就需要你
+
+147
+00:04:50,810 --> 00:04:52,560
+to provide a function to
+提供一个函数来
+
+148
+00:04:52,690 --> 00:04:54,680
+compute a particular feature of the kernel.
+计算核函数的特征值
+
+149
+00:04:55,110 --> 00:04:56,480
+So this is really computing f
+因此这个是在一个特定值i
+
+150
+00:04:56,770 --> 00:04:57,890
+subscript i for one
+的情况下来
+
+151
+00:04:58,220 --> 00:04:59,560
+particular value of i, where
+计算fi
+
+152
+00:05:00,570 --> 00:05:02,310
+f here is just a
+这里的f只是一个
+
+153
+00:05:02,330 --> 00:05:03,570
+single real number, so maybe
+简单的实数
+
+154
+00:05:03,840 --> 00:05:05,060
+I should write this as
+也许最好是写成
+
+155
+00:05:05,250 --> 00:05:07,230
+f(i), but what you
+f(i) 但是你所
+
+156
+00:05:07,510 --> 00:05:08,130
+need to do is to write a kernel
+需要做的是写一个核
+
+157
+00:05:08,480 --> 00:05:09,530
+function that takes this input, you know,
+函数 让它把这个作为输入 你知道的
+
+158
+00:05:10,610 --> 00:05:11,910
+a training example or a
+一个训练样本 或者一个
+
+159
+00:05:12,020 --> 00:05:13,140
+test example whatever it takes
+测试样本 不管是什么它
+
+160
+00:05:13,280 --> 00:05:14,640
+in some vector X and takes
+把向量X作为输入
+
+161
+00:05:14,990 --> 00:05:16,220
+as input one of the
+把输入作为一种
+
+162
+00:05:16,370 --> 00:05:18,270
+landmarks and but
+标识 不过
+
+163
+00:05:18,880 --> 00:05:20,750
+only I've written down X1 and
+在这里我只写了X1和
+
+164
+00:05:20,950 --> 00:05:21,810
+X2 here, because the
+X2 因为这些
+
+165
+00:05:21,900 --> 00:05:23,750
+landmarks are really training examples as well.
+标识也是训练样本
+
+166
+00:05:24,470 --> 00:05:26,160
+But what you
+但是你所
+
+167
+00:05:26,400 --> 00:05:27,490
+need to do is write software that
+需要做的是写一个
+
+168
+00:05:27,670 --> 00:05:28,960
+takes this input, you know, X1, X2
+可以将这些X1,X2作为输入的软件
+
+169
+00:05:29,150 --> 00:05:30,320
+and computes this sort
+并用它们来计算
+
+170
+00:05:30,580 --> 00:05:31,950
+of similarity function between them
+这个相似函数
+
+171
+00:05:32,530 --> 00:05:33,470
+and return a real number.
+之后返回一个实数
+
+172
+00:05:36,180 --> 00:05:37,430
+And so what some support vector machine
+因此一些支持向量机的
+
+173
+00:05:37,580 --> 00:05:39,040
+packages do is expect
+包所做的是期望
+
+174
+00:05:39,510 --> 00:05:40,860
+you to provide this kernel function
+你能提供一个核函数
+
+175
+00:05:41,410 --> 00:05:44,580
+that take this input you know, X1, X2 and returns a real number.
+能够输入X1, X2 并返回一个实数
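+
+A minimal sketch of such a similarity function, in Python/NumPy rather than the Octave/MATLAB
+interface the lecture has in mind (the name and signature below are illustrative only):
+
+import numpy as np
+
+def gaussian_kernel(x1, x2, sigma):
+    # returns one real number: exp(-||x1 - x2||^2 / (2 * sigma^2))
+    return float(np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2)))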
+
+176
+00:05:45,580 --> 00:05:46,460
+And then it will take it from there
+从这里开始
+
+177
+00:05:46,850 --> 00:05:49,070
+and it will automatically generate all the features, and
+它将自动地生成所有特征变量
+
+178
+00:05:49,410 --> 00:05:51,480
+so automatically take X and
+自动利用特征变量X
+
+179
+00:05:51,600 --> 00:05:53,370
+map it to f1,
+并用你写的函数对应到f1
+
+180
+00:05:53,420 --> 00:05:54,420
+f2, down to f(m) using
+f2 一直到f(m)
+
+181
+00:05:54,750 --> 00:05:56,200
+this function that you write, and
+并且
+
+182
+00:05:56,310 --> 00:05:57,190
+generate all the features and
+生成所有特征变量
+
+183
+00:05:57,650 --> 00:05:59,080
+train the support vector machine from there.
+并从这儿开始训练支持向量机
+
+184
+00:05:59,870 --> 00:06:00,800
+But sometimes you do need to
+但是有些时候你却一定要
+
+185
+00:06:00,880 --> 00:06:04,710
+provide this function yourself.
+自己提供这个函数
+
+186
+00:06:05,680 --> 00:06:06,770
+Also, if you are using the Gaussian kernel, some SVM implementations will already include the Gaussian kernel
+如果你使用高斯核函数 一些SVM的函数实现也会包括高斯核函数
+
+187
+00:06:06,980 --> 00:06:09,950
+and a
+和一
+
+188
+00:06:10,040 --> 00:06:10,990
+few other kernels as well, since
+些其他的核函数 这是因为
+
+189
+00:06:11,230 --> 00:06:13,580
+the Gaussian kernel is probably the most common kernel.
+高斯核函数可能是最常见的核函数
+
+190
+00:06:14,880 --> 00:06:16,290
+Gaussian and linear kernels are
+到目前为止高斯核函数和线性核函数是
+
+191
+00:06:16,380 --> 00:06:18,210
+really the two most popular kernels by far.
+最普遍的核函数
+
+192
+00:06:19,130 --> 00:06:20,230
+Just one implementational note.
+一个实现函数的注意事项
+
+193
+00:06:20,750 --> 00:06:21,820
+If you have features of very
+如果你有大小很不一样
+
+194
+00:06:22,080 --> 00:06:23,620
+different scales, it is important
+的特征变量
+
+195
+00:06:24,700 --> 00:06:26,270
+to perform feature scaling before
+在使用高斯函数之前
+
+196
+00:06:26,600 --> 00:06:27,780
+using the Gaussian kernel.
+将这些特征变量的大小按比例归一化
+
+197
+00:06:28,580 --> 00:06:29,180
+And here's why.
+这就是原因
+
+198
+00:06:30,150 --> 00:06:31,600
+If you imagine the computing
+如果假设你在计算
+
+199
+00:06:32,290 --> 00:06:33,570
+the norm between X and
+X和l之间的范数
+
+200
+00:06:33,790 --> 00:06:34,890
+l, right, so this term here,
+就是这样一个式子
+
+201
+00:06:35,390 --> 00:06:37,150
+and the numerator term over there.
+这里是一个计算的式子
+
+202
+00:06:38,300 --> 00:06:39,780
+What this is doing, the norm
+这个式子所算的就是
+
+203
+00:06:40,070 --> 00:06:40,930
+between X and l, that's really
+X和l之间的范数
+
+204
+00:06:41,130 --> 00:06:42,140
+saying, you know, let's compute the vector
+换句话说,计算一个向量
+
+205
+00:06:42,450 --> 00:06:43,290
+V, which is equal to
+V 这个向量V等于
+
+206
+00:06:43,410 --> 00:06:44,980
+X minus l. And then
+X减l 然后
+
+207
+00:06:45,250 --> 00:06:47,940
+let's compute the norm does
+计算向量V的标量
+
+208
+00:06:48,130 --> 00:06:49,080
+vector V, which is the
+这就是
+
+209
+00:06:49,170 --> 00:06:50,510
+difference between X and l. So the
+与X不同的地方 因此
+
+210
+00:06:50,580 --> 00:06:51,510
+norm of V is really
+V的范数等于
+
+211
+00:06:53,360 --> 00:06:54,140
+equal to V1 squared
+V1的平方
+
+212
+00:06:54,250 --> 00:06:55,610
+plus V2 squared plus
+加v2的平方加
+
+213
+00:06:55,830 --> 00:06:58,290
+dot dot dot, plus Vn squared.
+点点点 加Vn的平方
+
+214
+00:06:58,900 --> 00:07:00,320
+Because here X is in
+因为这里的X属于
+
+215
+00:07:01,060 --> 00:07:02,200
+Rn, or Rn
+Rn 或者
+
+216
+00:07:02,290 --> 00:07:05,180
+plus 1, but I'm going to ignore, you know, X0.
+Rn+1 但是我容易忽略x0
+
+217
+00:07:06,540 --> 00:07:08,420
+So, let's pretend X is
+因此我们假设X是属于
+
+218
+00:07:08,510 --> 00:07:10,800
+an Rn, square on
+Rn的 在
+
+219
+00:07:10,950 --> 00:07:12,320
+the left side is what makes this correct.
+左边方的平方就是正确的了
+
+220
+00:07:12,570 --> 00:07:14,090
+So this is equal
+因此这部分就等于
+
+221
+00:07:14,400 --> 00:07:16,120
+to that, right?
+那个部分
+
+222
+00:07:17,210 --> 00:07:18,710
+And so written differently, this is
+那么另一种不同的写法就是
+
+223
+00:07:18,850 --> 00:07:20,100
+going to be X1 minus l1
+X1减l1
+
+224
+00:07:20,290 --> 00:07:22,600
+squared, plus x2
+的平方加X2
+
+225
+00:07:22,910 --> 00:07:24,590
+minus l2 squared, plus
+减l2的平方 加
+
+226
+00:07:24,910 --> 00:07:26,580
+dot dot dot plus Xn minus
+点点点 加Xn减
+
+227
+00:07:27,130 --> 00:07:28,540
+ln squared.
+ln的平方
+
+228
+00:07:29,720 --> 00:07:30,790
+And now if your features
+现在如果你的特征向量
+
+229
+00:07:31,850 --> 00:07:33,460
+take on very different ranges of value.
+的值的范围很不一样
+
+230
+00:07:33,940 --> 00:07:35,150
+So take a housing
+就拿房价预测来
+
+231
+00:07:35,360 --> 00:07:37,180
+prediction, for example, if
+举例 如果
+
+232
+00:07:38,020 --> 00:07:40,490
+your data is some data about houses.
+你的数据是一些房价数据
+
+233
+00:07:41,420 --> 00:07:43,000
+And if X is in the
+如果X在
+
+234
+00:07:43,140 --> 00:07:44,660
+range of thousands of square
+成千上万平方英尺的范围内
+
+235
+00:07:44,950 --> 00:07:47,190
+feet, for the
+对于
+
+236
+00:07:48,010 --> 00:07:48,840
+first feature, X1.
+第一个特征变量X1
+
+237
+00:07:49,700 --> 00:07:51,630
+But if your second feature, X2 is the number of bedrooms.
+但是如果你的第二个特征向量X2是卧室的数量
+
+238
+00:07:52,540 --> 00:07:53,610
+So if this is in the
+且如果它在
+
+239
+00:07:53,730 --> 00:07:56,720
+range of one to five bedrooms, then
+一到五个卧室范围内 那么
+
+240
+00:07:57,810 --> 00:07:59,320
+X1 minus l1 is going to be huge.
+X1减l1将会很大
+
+241
+00:07:59,780 --> 00:08:00,820
+This could be like a thousand squared,
+这有可能上千数值的平方
+
+242
+00:08:01,000 --> 00:08:02,880
+whereas X2 minus l2
+然而X2减l2
+
+243
+00:08:03,200 --> 00:08:04,620
+is going to be much smaller and if
+将会变得很小 如果
+
+244
+00:08:04,750 --> 00:08:06,800
+that's the case, then in this term,
+是在这样的情况下的话 那么在这个式子中
+
+245
+00:08:08,320 --> 00:08:09,660
+those distances will be almost
+这些间距将几乎
+
+246
+00:08:10,060 --> 00:08:12,060
+essentially dominated by the
+都是由
+
+247
+00:08:12,570 --> 00:08:13,280
+sizes of the houses
+房子的大小来决定的
+
+248
+00:08:14,390 --> 00:08:15,760
+and the number of bedrooms would be largely ignored.
+从而忽略了卧室的数量
+
+249
+00:08:16,950 --> 00:08:18,060
+And so, to avoid this, in
+为了避免这种情况
+
+250
+00:08:18,230 --> 00:08:19,070
+order to make a machine work
+让机器算法得以
+
+251
+00:08:19,360 --> 00:08:21,890
+well, do perform feature scaling.
+很好的实现 就需要先做特征缩放
+
+252
+00:08:23,420 --> 00:08:24,830
+And that will ensure that the SVM
+这将会保证SVM
+
+253
+00:08:25,810 --> 00:08:27,020
+gives, you know, comparable amount of attention
+能考虑到
+
+254
+00:08:27,950 --> 00:08:28,870
+to all of your different features,
+所有不同的特征变量
+
+255
+00:08:29,190 --> 00:08:30,450
+and not just, as in
+而不只是像
+
+256
+00:08:30,600 --> 00:08:31,870
+this example, to the size of
+例子中的那样
+
+257
+00:08:32,150 --> 00:08:33,440
+the houses, which would otherwise dominate the other features.
+房子的大小影响特别大 这就是特征变量
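+
+A rough sketch of that feature scaling step, again in illustrative Python/NumPy rather than the
+course's Octave/MATLAB:
+
+import numpy as np
+
+def scale_features(X):
+    # rescale each column (feature) of X to zero mean and unit standard deviation,
+    # so that no single feature dominates ||x - l||^2 in the Gaussian kernel
+    mu = X.mean(axis=0)
+    sd = X.std(axis=0)
+    sd[sd == 0] = 1.0      # guard against constant features
+    return (X - mu) / sd, mu, sd
+
+# The same mu and sd should be applied to any later example before computing kernel distances.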
+
+258
+00:08:34,700 --> 00:08:35,810
+When you try a support vector
+当你尝试支持向量机
+
+259
+00:08:36,110 --> 00:08:38,760
+machine, chances are by
+时 目前为止你最有
+
+260
+00:08:38,970 --> 00:08:40,000
+far the two most common
+可能用到的两个最常用
+
+261
+00:08:40,460 --> 00:08:41,750
+kernels you use will
+的核函数是
+
+262
+00:08:41,850 --> 00:08:43,120
+be the linear kernel, meaning no
+线性核函数 意思就是
+
+263
+00:08:43,320 --> 00:08:45,600
+kernel, or the Gaussian kernel that we talked about.
+没有核参数的函数 或是我们所讨论到的高斯核函数
+
+264
+00:08:46,520 --> 00:08:47,390
+And just one note of warning
+这里有一个警告
+
+265
+00:08:47,900 --> 00:08:49,070
+which is that not all similarity
+不是所有你可能提出来
+
+266
+00:08:49,580 --> 00:08:50,590
+functions you might come up
+的相似函数
+
+267
+00:08:50,770 --> 00:08:52,520
+with are valid kernels.
+都是有效的核函数
+
+268
+00:08:53,450 --> 00:08:54,840
+And the Gaussian kernel and the linear
+高斯核函数和线性
+
+269
+00:08:55,090 --> 00:08:56,410
+kernel and other kernels that you
+核函数以及其你有时
+
+270
+00:08:56,710 --> 00:08:57,850
+sometimes others will use, all
+可能会用到的核函数 所有的
+
+271
+00:08:58,030 --> 00:08:59,840
+of them need to satisfy a technical condition.
+这些函数都需要满足一个技术条件
+
+272
+00:09:00,380 --> 00:09:02,510
+It's called Mercer's Theorem and
+它叫作默塞尔定理
+
+273
+00:09:02,630 --> 00:09:03,560
+the reason you need to this
+需要满足这个条件的原因是
+
+274
+00:09:03,710 --> 00:09:05,430
+is because support vector machine
+因为支持向量机
+
+275
+00:09:06,380 --> 00:09:08,140
+algorithms or implementations of the
+算法或者
+
+276
+00:09:08,480 --> 00:09:09,560
+SVM have lots of clever
+SVM的实现函数有许多熟练的
+
+277
+00:09:10,050 --> 00:09:11,380
+numerical optimization tricks.
+数值优化技巧
+
+278
+00:09:12,110 --> 00:09:13,270
+In order to solve for the
+为了有效地求解
+
+279
+00:09:13,340 --> 00:09:15,650
+parameter's theta efficiently and
+参数θ
+
+280
+00:09:16,590 --> 00:09:18,840
+in the original design envisaged,
+在最初的设想里
+
+281
+00:09:19,470 --> 00:09:21,010
+the decision was made to restrict
+这些决策都用以
+
+282
+00:09:21,540 --> 00:09:22,900
+our attention only to kernels
+将我们的注意力仅仅限制在
+
+283
+00:09:23,510 --> 00:09:25,860
+that satisfy this technical condition called Mercer's Theorem.
+可以满足默塞尔定理的核函数上
+
+284
+00:09:26,280 --> 00:09:27,360
+And what that does is, that
+这个定理所做的是
+
+285
+00:09:27,570 --> 00:09:28,540
+makes sure that all of these
+确保所有的
+
+286
+00:09:28,820 --> 00:09:30,270
+SVM packages, all of these SVM
+SVM包 所有的SVM
+
+287
+00:09:30,500 --> 00:09:32,210
+software packages can use the
+软件包能够用
+
+288
+00:09:32,310 --> 00:09:34,740
+large class of optimizations and
+大类的优化方法并
+
+289
+00:09:35,280 --> 00:09:37,470
+get the parameter theta very quickly.
+很快得到参数θ
+
+290
+00:09:39,320 --> 00:09:40,340
+So, what most people end up doing
+大多数人最后要做的是
+
+291
+00:09:40,840 --> 00:09:42,470
+is using either the linear
+用线性核函数
+
+292
+00:09:42,610 --> 00:09:44,210
+or Gaussian kernel, but there
+或者高斯核函数 但是
+
+293
+00:09:44,430 --> 00:09:45,610
+are a few other kernels that also
+有几个其他的几个核函数也
+
+294
+00:09:45,940 --> 00:09:47,460
+satisfy Mercer's theorem and
+满足默塞尔定理
+
+295
+00:09:47,560 --> 00:09:48,690
+that you may run across other
+你可能会遇到其他人使用这些核函数
+
+296
+00:09:48,850 --> 00:09:50,050
+people using, although I personally
+然而我个人
+
+297
+00:09:50,880 --> 00:09:53,780
+end up using other kernels you know, very, very rarely, if at all.
+最后是很少很少使用其他核函数
+
+298
+00:09:54,160 --> 00:09:56,990
+Just to mention some of the other kernels that you may run across.
+只是简单提及一下你可能会遇到的其他核函数
+
+299
+00:09:57,990 --> 00:10:00,300
+One is the polynomial kernel.
+一个是多项式核函数
+
+300
+00:10:01,570 --> 00:10:03,350
+And for that the similarity between
+X和l之间的
+
+301
+00:10:03,800 --> 00:10:05,520
+X and l is
+相似值
+
+302
+00:10:05,730 --> 00:10:06,760
+defined as, there are
+定义为 有
+
+303
+00:10:06,830 --> 00:10:07,880
+a lot of options, you can
+很多种选择 你可以
+
+304
+00:10:08,640 --> 00:10:10,370
+take X transpose l squared.
+用(X的转置乘以l)的平方
+
+305
+00:10:10,960 --> 00:10:13,410
+So, here's one measure of how similar X and l are.
+那么这里就有一个估计X和l相似度的估量
+
+306
+00:10:13,610 --> 00:10:14,930
+If X and l are very close with
+如果X和l相互之间很接近
+
+307
+00:10:15,500 --> 00:10:18,260
+each other, then the inner product will tend to be large.
+那么这个内积就会很大
+
+308
+00:10:20,200 --> 00:10:21,870
+And so, you know, this is a slightly
+这是一个有些
+
+309
+00:10:23,080 --> 00:10:23,520
+unusual kernel.
+不寻常的核函数
+
+310
+00:10:24,000 --> 00:10:25,130
+That is not used that often, but
+它并不那么常用 但是
+
+311
+00:10:26,490 --> 00:10:29,190
+you may run across some people using it.
+你可能会见到有些人使用它
+
+312
+00:10:30,050 --> 00:10:31,810
+This is one version of a polynomial kernel.
+这是一个多项式核函数的变体
+
+313
+00:10:32,330 --> 00:10:35,090
+Another is X transpose l cubed.
+另外一个是(X的转置乘以l)的立方
+
+314
+00:10:36,690 --> 00:10:38,780
+These are all examples of the polynomial kernel.
+这些是多项式核函数的所有例子
+
+315
+00:10:39,040 --> 00:10:41,270
+X transpose l plus 1 cubed.
+X的转置l加1的立方
+
+316
+00:10:42,560 --> 00:10:43,620
+X transpose l plus maybe
+X的转置加 可以
+
+317
+00:10:43,910 --> 00:10:44,930
+a number different than 1, say 5,
+是一个不同的数 然后一个5
+
+318
+00:10:44,970 --> 00:10:46,680
+and, you know, to the power of 4 and
+的4次方
+
+319
+00:10:47,700 --> 00:10:49,840
+so the polynomial kernel actually has two parameters.
+多项式内核函数实际上有两个参数
+
+320
+00:10:50,610 --> 00:10:53,020
+One is, what number do you add over here?
+一个是 你需要在这里加上一个什么样的数字
+
+321
+00:10:53,520 --> 00:10:53,920
+It could be 0.
+可能是0
+
+322
+00:10:54,430 --> 00:10:58,660
+This is really plus 0 over there, as well as what's the degree of the polynomial over there.
+这里就是一个0 这些是多项式的次数
+
+323
+00:10:58,680 --> 00:11:01,670
+So the degree power and these numbers.
+这些数字就是多项式的次数
+
+324
+00:11:02,250 --> 00:11:04,140
+And the more general form of the
+多项式核函数更一般的
+
+325
+00:11:04,280 --> 00:11:05,530
+polynomial kernel is X
+形式是X
+
+326
+00:11:05,720 --> 00:11:07,620
+transpose l, plus some
+的转置乘以l 加上一些
+
+327
+00:11:07,940 --> 00:11:11,510
+constant and then
+常数项 然后
+
+328
+00:11:11,800 --> 00:11:14,850
+to some degree power, and
+是指数部分
+
+329
+00:11:15,060 --> 00:11:16,720
+so both
+这两个
+
+330
+00:11:16,940 --> 00:11:19,650
+of these are parameters for the polynomial kernel.
+都是多项式核函数的参数
+
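As a rough sketch of the two kernels just named (in Python with NumPy, not the course's Octave; the function names and default parameter values are illustrative only), the Gaussian similarity is exp(-||x - l||^2 / (2*sigma^2)) and the general polynomial form is (x'l + constant)^degree:

```python
import numpy as np

def gaussian_kernel(x, l, sigma=1.0):
    # similarity = exp(-||x - l||^2 / (2 sigma^2)), large when x is close to the landmark l
    return float(np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2)))

def polynomial_kernel(x, l, constant=1.0, degree=2):
    # general polynomial form described above: (x' * l + constant) ^ degree
    return float((np.dot(x, l) + constant) ** degree)
```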
+331
+00:11:20,510 --> 00:11:22,820
+So the polynomial kernel almost always
+多项式核函数几乎总是
+
+332
+00:11:23,350 --> 00:11:24,440
+or usually performs worse.
+或者经常执行的效果都比较差
+
+333
+00:11:24,820 --> 00:11:25,950
+than the Gaussian kernel, and is not
+比高斯核函数差 而且用得不
+
+334
+00:11:26,270 --> 00:11:28,370
+used that much, but this is just something that you may run across.
+那么多 但是你有可能会碰到
+
+335
+00:11:29,320 --> 00:11:30,480
+Usually it is used only for
+通常它用在
+
+336
+00:11:30,750 --> 00:11:31,710
+data where X and l
+当X和l
+
+337
+00:11:32,000 --> 00:11:33,180
+are all strictly non negative,
+都是严格的非负数时
+
+338
+00:11:33,740 --> 00:11:34,720
+and so that ensures that these
+这样以保证这些
+
+339
+00:11:34,910 --> 00:11:36,710
+inner products are never negative.
+内积值永远不会是负数
+
+340
+00:11:37,850 --> 00:11:40,010
+And this captures the intuition that
+这捕捉到了这样一种直觉
+
+341
+00:11:40,390 --> 00:11:41,340
+if X and l are very similar
+如果X和l非常相似
+
+342
+00:11:41,540 --> 00:11:44,110
+to each other, then maybe the inner product between them will be large.
+也许它们之间的内积会很大
+
+343
+00:11:44,420 --> 00:11:45,590
+They have some other properties as well
+它们也有其他的一些性质
+
+344
+00:11:46,260 --> 00:11:48,080
+but people tend not to use it much.
+但是人们通常用得不多
+
+345
+00:11:49,130 --> 00:11:50,150
+And then, depending on what you're
+那么 根据你所做的
+
+346
+00:11:50,260 --> 00:11:51,210
+doing, there are other, sort of more
+也有其他一些更加
+
+347
+00:11:52,330 --> 00:11:54,950
+esoteric kernels as well, that you may come across.
+难懂的核函数 这你也有可能会碰到
+
+348
+00:11:55,670 --> 00:11:57,180
+You know, there's a string kernel, this
+有字符串核函数 这个
+
+349
+00:11:57,340 --> 00:11:58,430
+is sometimes used if your
+在你的
+
+350
+00:11:58,550 --> 00:12:01,350
+input data is text strings or other types of strings.
+输入数据是文本字符串或者其他类型的字符串时 有时会用到
+
+351
+00:12:02,270 --> 00:12:02,940
+There are things like the
+还有一些函数 如
+
+352
+00:12:03,260 --> 00:12:06,000
+chi-square kernel, the histogram intersection kernel, and so on.
+卡方核函数 直方相交核函数 等等
+
+353
+00:12:06,690 --> 00:12:08,420
+There are sort of more esoteric kernels that
+有一些难懂的核函数
+
+354
+00:12:08,660 --> 00:12:09,840
+you can use to measure similarity
+这样一些函数你可以用来估量
+
+355
+00:12:10,760 --> 00:12:12,030
+between different objects.
+不同对象之间的相似性
+
+356
+00:12:12,660 --> 00:12:13,800
+So for example, if you're trying to
+例如 如果你在尝试
+
+357
+00:12:14,380 --> 00:12:15,840
+do some sort of text classification
+做一些文本分类的
+
+358
+00:12:16,170 --> 00:12:17,060
+problem, where the input
+问题 这个问题中输入
+
+359
+00:12:17,200 --> 00:12:19,300
+x is a string then
+变量X是一个字符串 然后
+
+360
+00:12:19,490 --> 00:12:20,490
+maybe we want to find the
+我们也许想要通过字符串核函数来找到
+
+361
+00:12:20,550 --> 00:12:22,050
+similarity between two strings
+两个字符串间的相似度
+
+362
+00:12:22,430 --> 00:12:24,240
+using the string kernel, but I
+但是我
+
+363
+00:12:24,520 --> 00:12:26,440
+personally you know end up very rarely,
+个人很少用这个
+
+364
+00:12:26,990 --> 00:12:29,340
+if at all, using these more esoteric kernels. I
+甚至几乎不用这些更难懂的核函数 我
+
+365
+00:12:29,880 --> 00:12:30,970
+think I might have used the chi-square
+想我可能用过卡方
+
+366
+00:12:31,170 --> 00:12:32,270
+kernel, may be once in
+核函数 也许是我
+
+367
+00:12:32,340 --> 00:12:33,670
+my life and the histogram kernel,
+人生唯一一次 和直方核函数
+
+368
+00:12:34,240 --> 00:12:35,580
+may be once or twice in my life. I've
+也许是我人生中的一次或者两次用它 我
+
+369
+00:12:35,630 --> 00:12:38,500
+actually never used the string kernel myself. But in
+实际上从来不用字符串核函数 但是
+
+370
+00:12:39,350 --> 00:12:41,560
+case you've run across this in other applications. You know, if
+以防万一你已经在其他应用中碰到了这样的情况 如果
+
+371
+00:12:42,700 --> 00:12:43,640
+you do a quick web
+你快速搜索网页
+
+372
+00:12:43,860 --> 00:12:44,850
+search, or do a quick Google
+或者快速地谷歌
+
+373
+00:12:45,040 --> 00:12:46,000
+search or quick Bing search
+搜索或者快速的Bing搜索
+
+374
+00:12:46,590 --> 00:12:48,240
+you should be able to find definitions of these other kernels as well. So
+你应该就能找到这些其他核函数的定义
+
+375
+00:12:51,480 --> 00:12:55,680
+just two last details I want to talk about in this video. One is multiclass classification. So, you
+我想要在这个视频里讨论的最后两个细节 一个是在多类分类中
+
+376
+00:12:56,370 --> 00:12:59,510
+have four classes or more generally
+你有4个类别或者更一般的是
+
+377
+00:12:59,800 --> 00:13:01,880
+K classes, you want some appropriate
+K个类别 想要得到一个在多个类别之间恰当的
+
+378
+00:13:02,530 --> 00:13:06,860
+decision boundary between your multiple classes. Most SVM, many SVM
+判定边界 大部分的SVM 许多SVM
+
+379
+00:13:07,220 --> 00:13:08,750
+packages already have built-in
+软件包已经内置了
+
+380
+00:13:09,030 --> 00:13:10,430
+multiclass classification functionality. So
+多类分类的功能 因此
+
+381
+00:13:11,100 --> 00:13:12,060
+if you're using a package like
+如果你使用的是这样的软件包
+
+382
+00:13:12,270 --> 00:13:13,320
+that, you just use the
+你只是用了
+
+383
+00:13:13,540 --> 00:13:15,370
+built-in functionality and that
+内置的功能 那
+
+384
+00:13:15,490 --> 00:13:16,940
+should work fine. Otherwise,
+就应该会工作得很好 否则
+
+385
+00:13:17,790 --> 00:13:18,790
+one way to do this
+实现这个的一个方式
+
+386
+00:13:19,000 --> 00:13:19,880
+is to use the one
+是用 one-vs
+
+387
+00:13:20,000 --> 00:13:21,280
+versus all method that we
+-all(一对多)方法 这是我们在
+
+388
+00:13:21,370 --> 00:13:23,690
+talked about when we are developing logistic regression. So
+讲解逻辑回归的时候讨论过
+
+389
+00:13:24,680 --> 00:13:25,410
+what you do is you train
+所以你要做的是训练
+
+390
+00:13:26,160 --> 00:13:27,550
+K SVMs, if you have
+K个SVM 如果你有
+
+391
+00:13:27,700 --> 00:13:29,190
+k classes, one to distinguish
+k个类别 用以将
+
+392
+00:13:29,900 --> 00:13:31,060
+each of the classes from the rest.
+每个类别从其他的类别中区分开来
+
+393
+00:13:31,850 --> 00:13:32,930
+And this would give you k parameter
+这会给你K个参数
+
+394
+00:13:33,520 --> 00:13:34,530
+vectors, so this will
+向量 这个会
+
+395
+00:13:34,680 --> 00:13:36,210
+give you, you know, theta 1, which
+给你θ1 它
+
+396
+00:13:36,530 --> 00:13:38,170
+is trying to distinguish class y equals
+尝试从所有其他
+
+397
+00:13:38,630 --> 00:13:39,980
+one from all of
+类别中识别出y=1的类别
+
+398
+00:13:40,130 --> 00:13:41,340
+the other classes, then you
+之后你
+
+399
+00:13:41,420 --> 00:13:42,910
+get the second parameter, vector
+得到第二个参数 向量
+
+400
+00:13:42,970 --> 00:13:43,910
+theta 2, which is what
+θ2 这个是
+
+401
+00:13:44,020 --> 00:13:45,420
+you get when you, you know, have
+在把
+
+402
+00:13:45,720 --> 00:13:47,080
+y equals 2 as the positive class
+y=2作为正类别
+
+403
+00:13:47,460 --> 00:13:48,680
+and all the others as negative class
+其他的作为负类别时 得到的
+
+404
+00:13:49,260 --> 00:13:50,550
+and so on up to
+以此类推
+
+405
+00:13:50,800 --> 00:13:52,400
+a parameter vector theta k,
+参数向量θk
+
+406
+00:13:52,750 --> 00:13:54,520
+which is the parameter vector for
+是用于
+
+407
+00:13:54,600 --> 00:13:56,770
+distinguishing the final class
+识别出最后一个类别
+
+408
+00:13:57,360 --> 00:13:59,380
+K, from anything else, and
+也就是第K类 最后
+
+409
+00:13:59,490 --> 00:14:00,590
+then lastly, this is exactly
+这就与
+
+410
+00:14:01,270 --> 00:14:02,040
+the same as the one versus
+我们在逻辑回归中用到的
+
+411
+00:14:02,420 --> 00:14:04,230
+all method we have for logistic regression.
+一对多(one-vs-all)的方法一样
+
+412
+00:14:04,760 --> 00:14:05,910
+Where you just predict the class
+在逻辑回归中我们只是
+
+413
+00:14:06,390 --> 00:14:07,690
+i with the largest theta
+用最大的θ转置X来预测类别i
+
+414
+00:14:08,030 --> 00:14:11,840
+transpose X. So that's multiclass classification.
+这就是多类分类
+
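A minimal sketch of the one-versus-all prediction just described, assuming the K parameter vectors theta_1..theta_K are stacked as rows of a NumPy array (the names and shapes here are assumptions for illustration, not course code): pick the class i whose theta_i' * x is largest.

```python
import numpy as np

def one_vs_all_predict(Theta, x):
    # Theta: (K, n+1) array, one row of parameters per class
    # x: (n+1,) feature vector including the bias/intercept term
    scores = Theta @ x                 # theta_i' * x for every class i
    return int(np.argmax(scores)) + 1  # classes numbered 1..K as in the lecture
```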
+415
+00:14:12,440 --> 00:14:13,750
+For the more common cases
+对于更为常见的情况
+
+416
+00:14:14,300 --> 00:14:15,090
+that there is a good
+很有
+
+417
+00:14:15,180 --> 00:14:16,460
+chance that whatever software package
+可能不管你使用什么软件包
+
+418
+00:14:16,780 --> 00:14:18,010
+you use, you know, there will be
+都有可能
+
+419
+00:14:18,340 --> 00:14:19,650
+a reasonable chance that they already
+它们已经内置了
+
+420
+00:14:19,920 --> 00:14:21,740
+have built in multiclass classification functionality,
+多类分类的功能
+
+421
+00:14:21,920 --> 00:14:24,410
+and so you don't need to worry about this yourself.
+因此你不用自己去操心这个问题
+
+422
+00:14:25,280 --> 00:14:27,010
+Finally, we developed support vector
+最后 我们推导支持向量机时
+
+423
+00:14:27,210 --> 00:14:28,650
+machines starting off with logistic
+是从逻辑回归出发
+
+424
+00:14:29,090 --> 00:14:31,500
+regression and then modifying the cost function a little bit.
+然后改造一下代价函数
+
+425
+00:14:31,910 --> 00:14:34,900
+The last thing we want to do in this video is, just say a little bit about.
+最后我想要在这个视频中讨论一点的是
+
+426
+00:14:35,550 --> 00:14:36,570
+when you will use one of
+当你要使用
+
+427
+00:14:36,660 --> 00:14:38,840
+these two algorithms, so let's
+这两个算法中的一个时
+
+428
+00:14:39,080 --> 00:14:40,000
+say n is the number
+假设n是
+
+429
+00:14:40,160 --> 00:14:42,000
+of features and m is the number of training examples.
+特征变量的数量 m是训练样本的数量
+
+430
+00:14:43,190 --> 00:14:45,250
+So, when should we use one algorithm versus the other?
+那么我们什么时候用哪一个呢?
+
+431
+00:14:47,130 --> 00:14:48,430
+Well, if n is larger
+如果n
+
+432
+00:14:48,980 --> 00:14:50,140
+relative to your training set
+相对于你的训练数据集大小较大的时候
+
+433
+00:14:50,360 --> 00:14:51,390
+size, so for example,
+比如
+
+434
+00:14:52,810 --> 00:14:53,990
+if you have a problem
+如果你有一个问题
+
+435
+00:14:54,250 --> 00:14:55,180
+where the number of features is
+它的特征数量
+
+436
+00:14:55,330 --> 00:14:56,870
+much larger than m and this
+比m要大很多
+
+437
+00:14:57,120 --> 00:14:58,210
+might be, for example, if you
+这可以是 比如说 如果你
+
+438
+00:14:58,320 --> 00:15:00,590
+have a text classification problem, where
+有一个文本分类的问题 在这里
+
+439
+00:15:01,550 --> 00:15:02,430
+you know, the dimension of the feature
+特征向量的维数
+
+440
+00:15:02,700 --> 00:15:04,160
+vector is I don't know, maybe, 10 thousand.
+比如说 有可能是1万(10,000)
+
+441
+00:15:05,370 --> 00:15:06,350
+And if your training
+且如果你的训练
+
+442
+00:15:06,720 --> 00:15:08,290
+set size is maybe 10
+集的大小有可能是10
+
+443
+00:15:08,510 --> 00:15:10,250
+you know, maybe, up to 1000.
+到1000范围内
+
+444
+00:15:10,500 --> 00:15:12,140
+So, imagine a spam
+想象一下垃圾邮件
+
+445
+00:15:12,320 --> 00:15:14,250
+classification problem, where email
+的分类问题 在这个问题中
+
+446
+00:15:14,510 --> 00:15:15,840
+spam, where you have 10,000
+你有 (1万)10,000
+
+447
+00:15:16,150 --> 00:15:18,010
+features corresponding to 10,000 words
+个与10,000(1万)个单词对应的特征向量
+
+448
+00:15:18,190 --> 00:15:19,550
+but you have, you know, maybe 10
+但是你可能有10
+
+449
+00:15:19,780 --> 00:15:21,150
+training examples or maybe up to 1,000 examples.
+训练样本 也可能有多达 1000个训练样本
+
+450
+00:15:22,450 --> 00:15:23,750
+So if n is large relative to
+如果n相对于
+
+451
+00:15:23,890 --> 00:15:25,090
+m, then what I
+m足够大的话 那么我
+
+452
+00:15:25,250 --> 00:15:26,480
+would usually do is use logistic
+通常所做的就是使用逻辑
+
+453
+00:15:26,850 --> 00:15:27,990
+regression, or use an SVM
+回归 或者使用
+
+454
+00:15:28,100 --> 00:15:29,030
+without a kernel, or
+不带核函数的SVM 或者说
+
+455
+00:15:29,460 --> 00:15:30,790
+use it with a linear kernel.
+使用线性核函数
+
+456
+00:15:31,620 --> 00:15:32,430
+Because, if you have so many
+因为 如果你有许多
+
+457
+00:15:32,580 --> 00:15:33,830
+features with smaller training sets, you know,
+特征变量 而有相对较小的训练集
+
+458
+00:15:34,530 --> 00:15:35,870
+a linear function will probably
+线性函数可能会
+
+459
+00:15:36,330 --> 00:15:37,380
+do fine, and you don't have
+工作得很好 而且你也没有
+
+460
+00:15:37,640 --> 00:15:38,790
+really enough data to
+足够的数据
+
+461
+00:15:38,910 --> 00:15:40,760
+fit a very complicated nonlinear function.
+来拟合非常复杂的非线性函数
+
+462
+00:15:41,340 --> 00:15:42,410
+Now if n is
+现在如果n
+
+463
+00:15:42,520 --> 00:15:44,020
+small and m is
+较小 m
+
+464
+00:15:44,350 --> 00:15:45,890
+intermediate what I mean
+大小适中 我的意思是
+
+465
+00:15:45,940 --> 00:15:47,450
+by this is n is
+在这里n
+
+466
+00:15:48,040 --> 00:15:50,350
+maybe anywhere from 1 - 1000, 1 would be very small.
+可能是1-1000之间的任何数 1会很小
+
+467
+00:15:50,530 --> 00:15:51,470
+But maybe up to 1000
+也许也会到1000个
+
+468
+00:15:51,700 --> 00:15:54,270
+features and if
+变量 如果
+
+469
+00:15:54,590 --> 00:15:56,180
+the number of training
+训练样本的数量
+
+470
+00:15:56,330 --> 00:15:57,700
+examples is maybe anywhere from
+可能是从
+
+471
+00:15:58,210 --> 00:16:00,750
+10, you know, 10 to maybe up to 10,000 examples.
+10 也许是从10到10,000中的任何一个数值
+
+472
+00:16:01,350 --> 00:16:03,160
+Maybe up to 50,000 examples.
+也许多达(5万)50,000个样本
+
+473
+00:16:03,630 --> 00:16:06,490
+If m is pretty big like maybe 10,000 but not a million.
+如果m相当大 比如可能是1万(10,000) 但还不到一百万
+
+474
+00:16:06,760 --> 00:16:08,100
+Right? So if m is an
+因此如果m是一个
+
+475
+00:16:08,300 --> 00:16:09,950
+intermediate size then often
+大小合适的数值 那么通常
+
+476
+00:16:10,790 --> 00:16:12,980
+an SVM with a linear kernel will work well.
+线性核函数的SVM会工作得很好
+
+477
+00:16:13,530 --> 00:16:14,580
+We talked about this earlier as
+这个我们在这之前也讨论过
+
+478
+00:16:14,710 --> 00:16:15,800
+well, with the one concrete example,
+一个具体的例子
+
+479
+00:16:16,350 --> 00:16:17,100
+this would be if you have
+是这样一个例子 如果你有
+
+480
+00:16:17,520 --> 00:16:19,720
+a two dimensional training set. So, if n
+一个二维的训练集 如果n
+
+481
+00:16:19,900 --> 00:16:21,010
+is equal to 2 where you
+等于2 在这里你
+
+482
+00:16:21,320 --> 00:16:23,710
+have, you know, drawn a pretty large number of training examples.
+可以画相当多的训练样本
+
+483
+00:16:24,710 --> 00:16:25,860
+So Gaussian kernel will do
+高斯核函数可以
+
+484
+00:16:26,130 --> 00:16:28,160
+a pretty good job separating positive and negative classes.
+很好地把正类和负类区分开来
+
+485
+00:16:29,770 --> 00:16:30,890
+One third setting that's of
+第三种有趣的情况是
+
+486
+00:16:30,980 --> 00:16:32,420
+interest is if n is
+如果n很
+
+487
+00:16:32,520 --> 00:16:34,270
+small but m is large.
+小 但是m很大
+
+488
+00:16:34,890 --> 00:16:36,560
+So if n is you know, again maybe
+如果n也是
+
+489
+00:16:37,390 --> 00:16:39,280
+1 to 1000, could be larger.
+1到1000之间的数 可能会更大一点
+
+490
+00:16:40,200 --> 00:16:42,750
+But if m was, maybe
+但是如果m是
+
+491
+00:16:43,320 --> 00:16:46,400
+50,000 and greater to millions.
+(5万)50,000 或者更大 大到上百万
+
+492
+00:16:47,520 --> 00:16:50,270
+So, 50,000, a 100,000, million, trillion.
+(5万) 50,000, (10万)100,000, 百万,万亿
+
+493
+00:16:51,290 --> 00:16:54,020
+You have very very large training set sizes, right.
+你有很大很大的数据训练集
+
+494
+00:16:55,240 --> 00:16:56,160
+So if this is the case,
+如果是这样的情况
+
+495
+00:16:56,380 --> 00:16:57,630
+then an SVM with the
+那么高斯核函数的支持向量机
+
+496
+00:16:57,900 --> 00:16:59,850
+Gaussian Kernel will be somewhat slow to run.
+运行起来就会很慢
+
+497
+00:17:00,160 --> 00:17:02,300
+Today's SVM packages, if you're
+如果你用高斯核函数的话 你就会知道今天的SVM包
+
+498
+00:17:02,410 --> 00:17:04,900
+using a Gaussian Kernel, tend to struggle a bit.
+运行这样的函数会很慢
+
+499
+00:17:05,050 --> 00:17:06,250
+If you have, you know, maybe 50
+如果你有5万(50,000)个样本
+
+500
+00:17:06,590 --> 00:17:07,530
+thousand, it's okay, but if you
+还算可以 但如果你
+
+501
+00:17:07,620 --> 00:17:10,250
+have a million training examples, maybe
+有百万个训练样本 也许
+
+502
+00:17:10,450 --> 00:17:11,950
+or even a 100,000 with a
+甚至是100,000(十万)个
+
+503
+00:17:12,170 --> 00:17:13,730
+massive value of m. Today's
+m值很大的训练样本 今天的
+
+504
+00:17:14,180 --> 00:17:15,590
+SVM packages are very good,
+SVM包会工作得很好
+
+505
+00:17:15,870 --> 00:17:17,100
+but they can still struggle
+但是它们仍然会有
+
+506
+00:17:17,600 --> 00:17:18,400
+a little bit when you have a
+一些慢 当你有
+
+507
+00:17:19,010 --> 00:17:20,940
+massive, massive training set sizes when using a Gaussian Kernel.
+非常非常大的训练集 且使用高斯核函数时
+
+508
+00:17:22,050 --> 00:17:23,150
+So in that case, what I
+在这种情况下 我
+
+509
+00:17:23,350 --> 00:17:24,960
+would usually do is try to just
+经常会做的是尝试
+
+510
+00:17:25,330 --> 00:17:26,660
+manually create more
+手动地创建更多
+
+511
+00:17:26,800 --> 00:17:28,600
+features and then use
+的特征变量 然后用
+
+512
+00:17:28,930 --> 00:17:30,340
+logistic regression or an SVM
+逻辑回归或者
+
+513
+00:17:30,630 --> 00:17:32,060
+without the Kernel.
+不带核函数的支持向量机
+
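The guideline above can be summarized as a small helper function (a sketch only; the thresholds are the illustrative numbers mentioned in the lecture, not hard rules):

```python
def suggest_model(n_features, m_examples):
    if n_features >= m_examples:
        # n large relative to m: not enough data to fit a complex nonlinear boundary
        return "logistic regression, or an SVM without a kernel (linear kernel)"
    if m_examples <= 50000:
        # n small, m intermediate: an SVM with a Gaussian kernel tends to work well
        return "SVM with a Gaussian kernel"
    # n small, m very large: Gaussian-kernel SVMs get slow, so create features first
    return "create more features, then logistic regression or an SVM without a kernel"
```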
+514
+00:17:33,140 --> 00:17:34,030
+And in case you look at this
+如果你看到这张
+
+515
+00:17:34,230 --> 00:17:35,900
+slide and you see logistic regression
+幻灯片 看到了逻辑回归
+
+516
+00:17:36,460 --> 00:17:37,750
+or SVM without a kernel.
+或者不带核函数的支持向量机
+
+517
+00:17:38,510 --> 00:17:39,890
+In both of these places, I
+在这两个地方 我
+
+518
+00:17:39,980 --> 00:17:41,750
+kind of paired them together. There's
+我把它们放在一起
+
+519
+00:17:42,060 --> 00:17:43,050
+a reason for that, is that
+是有原因的 原因是
+
+520
+00:17:43,900 --> 00:17:45,640
+logistic regression and SVM without
+逻辑回归和不带核函数的支持向量机
+
+521
+00:17:46,000 --> 00:17:47,130
+the kernel, those are really pretty
+它们都是非常
+
+522
+00:17:47,350 --> 00:17:49,450
+similar algorithms and, you know, either
+相似的算法 不管是
+
+523
+00:17:49,680 --> 00:17:51,170
+logistic regression or SVM
+逻辑回归还是
+
+524
+00:17:51,500 --> 00:17:53,230
+without a kernel will usually do
+不带核函数的SVM通常都会做
+
+525
+00:17:53,380 --> 00:17:54,780
+pretty similar things and give
+相似的事情 并给出
+
+526
+00:17:54,900 --> 00:17:56,690
+pretty similar performance, but depending
+相似的结果 但是根据
+
+527
+00:17:57,060 --> 00:18:00,340
+on your implementational details, one may be more efficient than the other.
+你实现的情况 其中一个可能会比另一个更加有效
+
+528
+00:18:00,930 --> 00:18:02,220
+But, where one of
+但是在其中一个
+
+529
+00:18:02,310 --> 00:18:03,530
+these algorithms applies, logistic
+算法应用的地方 逻辑
+
+530
+00:18:03,740 --> 00:18:05,190
+regression where SVM without a
+回归或不带
+
+531
+00:18:05,420 --> 00:18:05,840
+kernel, the other one is likely
+核函数的SVM 另一个也很有可能
+
+532
+00:18:06,650 --> 00:18:07,600
+to work pretty well as well.
+很有效
+
+533
+00:18:08,540 --> 00:18:09,660
+But where the power of
+但是
+
+534
+00:18:09,720 --> 00:18:11,610
+the SVM really shines is when you
+SVM真正发挥威力的地方 是在你
+
+535
+00:18:11,810 --> 00:18:14,100
+use different kernels to learn
+使用不同的内核函数来学习
+
+536
+00:18:14,430 --> 00:18:15,860
+complex nonlinear functions.
+复杂的非线性函数时
+
+537
+00:18:16,680 --> 00:18:20,300
+And this regime, you know, when you
+这个体系 你知道的 当你
+
+538
+00:18:20,550 --> 00:18:22,530
+have maybe up to 10,000 examples, maybe up to 50,000.
+有多达1万(10,000)的样本时 也可能是5万(50,000)
+
+539
+00:18:22,610 --> 00:18:25,010
+And your number of features,
+你特征变量的数量
+
+540
+00:18:26,580 --> 00:18:27,540
+this is reasonably large.
+这是相当大的
+
+541
+00:18:27,840 --> 00:18:29,230
+That's a very common regime
+那是一个非常常见的体系
+
+542
+00:18:29,670 --> 00:18:30,910
+and maybe that's a regime
+也许在这个体系里
+
+543
+00:18:31,430 --> 00:18:33,830
+where a support vector machine with a kernel will shine.
+带核函数的支持向量机就会表现得相当突出
+
+544
+00:18:34,320 --> 00:18:35,640
+You can do things that are much
+你可以做到
+
+545
+00:18:35,860 --> 00:18:39,850
+harder to do with logistic regression.
+用逻辑回归很难做到的事情
+
+546
+00:18:40,100 --> 00:18:40,930
+And finally, where do neural networks fit in?
+最后 神经网络适用于什么时候呢?
+
+547
+00:18:41,120 --> 00:18:42,230
+Well for all of these
+对于所有的这些
+
+548
+00:18:42,440 --> 00:18:43,890
+problems, for all of
+问题 对于所有的
+
+549
+00:18:43,960 --> 00:18:46,310
+these different regimes, a well
+这些不同体系 一个
+
+550
+00:18:46,630 --> 00:18:49,110
+designed neural network is likely to work well as well.
+设计得很好的神经网络也很有可能会非常有效
+
+551
+00:18:50,320 --> 00:18:51,700
+The one disadvantage, or the one
+有一个缺点是 或者说是
+
+552
+00:18:51,830 --> 00:18:52,980
+reason you might sometimes not use
+有时可能不会使用
+
+553
+00:18:53,220 --> 00:18:54,690
+the neural network is that,
+神经网络的原因是
+
+554
+00:18:54,920 --> 00:18:56,080
+for some of these problems, the
+对于许多这样的问题
+
+555
+00:18:56,180 --> 00:18:57,640
+neural network might be slow to train.
+神经网络训练起来可能会特别慢
+
+556
+00:18:58,250 --> 00:18:59,080
+But if you have a very good
+但是如果你有一个非常好的
+
+557
+00:18:59,350 --> 00:19:01,190
+SVM implementation package, that
+SVM实现包 它
+
+558
+00:19:01,400 --> 00:19:04,120
+could run faster, quite a bit faster than your neural network.
+可能会运行得比较快 比神经网络快很多
+
+559
+00:19:05,130 --> 00:19:06,130
+And, although we didn't show this
+尽管我们在此之前没有展示
+
+560
+00:19:06,350 --> 00:19:07,520
+earlier, it turns out that
+但是事实证明
+
+561
+00:19:07,630 --> 00:19:09,800
+the optimization problem that the
+SVM具有的优化问题
+
+562
+00:19:10,070 --> 00:19:11,120
+SVM has is a convex
+是一种凸
+
+563
+00:19:12,320 --> 00:19:13,830
+optimization problem and so the
+优化问题 因此
+
+564
+00:19:14,410 --> 00:19:15,800
+good SVM optimization software
+好的SVM优化软件
+
+565
+00:19:16,160 --> 00:19:17,870
+packages will always find
+包总是会找到
+
+566
+00:19:18,240 --> 00:19:21,370
+the global minimum or something close to it.
+全局最小值 或者接近它的值
+
+567
+00:19:21,720 --> 00:19:24,100
+And so for the SVM you don't need to worry about local optima.
+对于SVM 你不需要担心局部最优
+
+568
+00:19:25,280 --> 00:19:26,440
+In practice local optima aren't
+在实际应用中 局部最优不是
+
+569
+00:19:26,580 --> 00:19:27,920
+a huge problem for neural networks
+神经网络所需要解决的一个重大问题
+
+570
+00:19:28,090 --> 00:19:29,120
+but they do exist, so this
+但它们确实存在 所以这是
+
+571
+00:19:29,310 --> 00:19:31,520
+is one less thing to worry about if you're using an SVM.
+你在使用SVM的时候不需要太去担心的一个问题
+
+572
+00:19:33,350 --> 00:19:34,560
+And depending on your problem, the neural
+根据你的问题 神经
+
+573
+00:19:34,910 --> 00:19:37,050
+network may be slower, especially
+网络可能会比SVM慢 尤其是
+
+574
+00:19:37,580 --> 00:19:41,020
+in this sort of regime than the SVM.
+在这样一个体系中
+
+575
+00:19:41,420 --> 00:19:42,200
+In case the guidelines I gave
+至于这里给出的参考
+
+576
+00:19:42,520 --> 00:19:43,500
+here, seem a little bit vague
+看上去有些模糊
+
+577
+00:19:43,860 --> 00:19:44,600
+and if you're looking at some problems, you know,
+如果你在考虑一些问题
+
+578
+00:19:46,930 --> 00:19:48,050
+the guidelines are a bit
+这些参考会有一些
+
+579
+00:19:48,170 --> 00:19:49,190
+vague, and you're still not entirely
+模糊 你仍然不能完全
+
+580
+00:19:49,570 --> 00:19:50,730
+sure, should I use this
+确定 我是该用这个
+
+581
+00:19:50,780 --> 00:19:52,690
+algorithm or that algorithm, that's actually okay.
+算法还是改用那个算法 这个没有太大关系
+
+582
+00:19:52,950 --> 00:19:54,100
+When I face a machine learning
+当我遇到机器学习
+
+583
+00:19:54,330 --> 00:19:55,570
+problem, you know, sometimes its actually
+问题的时候 有时它确实
+
+584
+00:19:55,730 --> 00:19:57,010
+just not clear whether that's the
+不清楚这是否是
+
+585
+00:19:57,150 --> 00:19:58,700
+best algorithm to use, but as
+最好的算法 但是就如
+
+586
+00:19:59,540 --> 00:20:00,590
+you saw in the earlier videos, really,
+在之前的视频中看到的
+
+587
+00:20:01,200 --> 00:20:02,470
+you know, the algorithm does
+算法确实
+
+588
+00:20:02,700 --> 00:20:03,920
+matter, but what often matters
+很重要 但是经常
+
+589
+00:20:04,250 --> 00:20:06,400
+even more is things like, how much data do you have.
+更加重要的是 你有多少数据
+
+590
+00:20:07,090 --> 00:20:08,280
+And how skilled are you, how
+你有多熟练
+
+591
+00:20:08,450 --> 00:20:09,500
+good are you at doing error
+是否擅长做误差
+
+592
+00:20:09,750 --> 00:20:11,450
+analysis and debugging learning
+分析和调试学习
+
+593
+00:20:11,660 --> 00:20:13,090
+algorithms, figuring out how
+算法 弄清楚如何
+
+594
+00:20:13,220 --> 00:20:15,120
+to design new features and
+设定新的特征变量
+
+595
+00:20:15,280 --> 00:20:17,540
+figuring out what other features to feed to your learning algorithm, and so on.
+以及弄清楚还应该给学习算法提供哪些特征 等等
+
+596
+00:20:17,960 --> 00:20:19,110
+And often those things will matter
+通常这些方面会比
+
+597
+00:20:19,660 --> 00:20:20,700
+more than whether you are
+你使用
+
+598
+00:20:20,840 --> 00:20:22,370
+using logistic regression or an SVM.
+逻辑回归还是SVM这方面更加重要
+
+599
+00:20:23,280 --> 00:20:24,650
+But having said that,
+但是 已经说过了
+
+600
+00:20:25,010 --> 00:20:26,180
+the SVM is still widely
+SVM仍然被广泛
+
+601
+00:20:26,630 --> 00:20:27,890
+perceived as one of
+认为是一种
+
+602
+00:20:27,950 --> 00:20:29,600
+the most powerful learning algorithms, and
+最强大的学习算法
+
+603
+00:20:29,740 --> 00:20:31,570
+there is this regime where it is
+而且在某些情况下 它是
+
+604
+00:20:31,790 --> 00:20:34,340
+a very effective way to learn complex nonlinear functions.
+学习复杂非线性函数的一种非常有效的方法
+
+605
+00:20:35,150 --> 00:20:36,840
+And so I actually, together with
+因此 实际上 掌握了
+
+606
+00:20:37,040 --> 00:20:38,930
+logistic regression, neural networks, SVMs,
+逻辑回归 神经网络和SVM
+
+607
+00:20:39,090 --> 00:20:40,630
+using these learning
+这些学习算法之后
+
+608
+00:20:40,760 --> 00:20:42,170
+algorithms, I think you're
+我认为你已经
+
+609
+00:20:42,440 --> 00:20:43,610
+very well positioned to build
+完全有能力去构建
+
+610
+00:20:44,120 --> 00:20:45,120
+state of the art you know,
+最前沿的
+
+611
+00:20:45,310 --> 00:20:46,710
+machine learning systems for a wide
+机器学习系统 应用到广泛的
+
+612
+00:20:46,960 --> 00:20:49,110
+range of applications. And this
+应用领域 这是
+
+613
+00:20:49,330 --> 00:20:52,460
+is another very powerful tool to have in your arsenal.
+另一个在你军械库里非常强大的工具
+
+614
+00:20:53,160 --> 00:20:54,270
+One that is used all
+你可以把它应用到
+
+615
+00:20:54,460 --> 00:20:55,850
+over the place in Silicon Valley,
+很多地方 在硅谷
+
+616
+00:20:56,390 --> 00:20:58,030
+or in industry and in
+在工业 在
+
+617
+00:20:58,310 --> 00:20:59,860
+the Academia, to build many
+学术等领域 建立许多
+
+618
+00:21:00,120 --> 00:21:01,680
+high performance machine learning system.
+高性能的机器学习系统
+
diff --git a/srt/13 - 1 - Unsupervised Learninguction (3 min).srt b/srt/13 - 1 - Unsupervised Learninguction (3 min).srt
new file mode 100644
index 00000000..8115431a
--- /dev/null
+++ b/srt/13 - 1 - Unsupervised Learninguction (3 min).srt
@@ -0,0 +1,501 @@
+1
+00:00:00,090 --> 00:00:02,320
+In this video, I'd like to start to talk about clustering.
+在这个视频中 我将开始介绍聚类算法
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:03,420 --> 00:00:04,850
+This will be exciting because this
+这将是一个激动人心的时刻
+
+3
+00:00:04,930 --> 00:00:06,910
+is our first unsupervised learning algorithm
+因为这是我们学习的第一个非监督学习算法
+
+4
+00:00:07,350 --> 00:00:09,080
+where we learn from unlabeled data
+我们将要让计算机学习无标签数据
+
+5
+00:00:09,840 --> 00:00:10,730
+instead of the label data.
+而不是此前的标签数据
+
+6
+00:00:11,900 --> 00:00:13,300
+So, what is unsupervised learning?
+那么 什么是非监督学习呢
+
+7
+00:00:14,390 --> 00:00:15,630
+I briefly talked about unsupervised
+在课程的一开始
+
+8
+00:00:16,350 --> 00:00:17,470
+learning at the beginning of
+我曾简单的介绍过非监督学习
+
+9
+00:00:17,550 --> 00:00:18,560
+the class but it's useful
+然而 我们还是有必要
+
+10
+00:00:19,030 --> 00:00:20,320
+to contrast it with supervised learning. So, here's
+将其与监督学习做一下比较
+
+11
+00:00:21,760 --> 00:00:23,750
+a typical supervised learning problem where
+在一个典型的监督学习中
+
+12
+00:00:24,030 --> 00:00:25,470
+we are given a labeled training set
+我们有一个有标签的训练集
+
+13
+00:00:25,770 --> 00:00:27,470
+and the goal is to find
+我们的目标是找到
+
+14
+00:00:27,980 --> 00:00:29,420
+the decision boundary that separates the
+能够区分正样本和负样本的
+
+15
+00:00:29,530 --> 00:00:31,310
+positive label examples and the negative label examples.
+决策边界
+
+16
+00:00:33,100 --> 00:00:34,400
+The supervised learning problem in
+在这里的监督学习中
+
+17
+00:00:34,460 --> 00:00:35,710
+this case is given a
+我们有一系列标签
+
+18
+00:00:35,850 --> 00:00:38,270
+set of labels to fit a hypothesis to it.
+我们需要据此拟合一个假设函数
+
+19
+00:00:39,160 --> 00:00:40,560
+In contrast, in the unsupervised
+与此不同的是
+
+20
+00:00:41,080 --> 00:00:42,420
+learning problem, we're given
+在非监督学习中
+
+21
+00:00:42,710 --> 00:00:43,740
+data that does not have
+我们的数据没有
+
+22
+00:00:43,890 --> 00:00:45,270
+any labels associated with it.
+附带任何标签
+
+23
+00:00:46,730 --> 00:00:47,940
+So we're given data that looks
+我们拿到的数据就是这样的
+
+24
+00:00:48,100 --> 00:00:49,090
+like this, here's a set
+在这里
+
+25
+00:00:49,180 --> 00:00:50,470
+of points and then no labels.
+我们有一系列点 却没有标签
+
+26
+00:00:51,800 --> 00:00:52,860
+And so our training set is written
+因此 我们的训练集可以写成
+
+27
+00:00:53,220 --> 00:00:54,720
+just x1, x2 and
+只有x(1) x(2)
+
+28
+00:00:55,210 --> 00:00:56,890
+so on up to x(m)
+一直到 x(m)
+
+29
+00:00:57,450 --> 00:00:58,720
+and we don't get any labels y.
+我们没有任何标签y
+
+30
+00:00:59,540 --> 00:01:00,800
+And that's why the points plotted
+因此 图上画的这些点
+
+31
+00:01:01,160 --> 00:01:02,300
+up on the figure don't have
+没有
+
+32
+00:01:02,430 --> 00:01:04,330
+any labels on them.
+标签信息
+
+33
+00:01:04,490 --> 00:01:05,510
+So in unsupervised learning, what
+也就是说 在非监督学习中
+
+34
+00:01:05,710 --> 00:01:06,860
+we do is, we give this sort of
+我们需要将一系列
+
+35
+00:01:07,280 --> 00:01:09,150
+unlabeled training set to
+无标签的训练数据
+
+36
+00:01:09,250 --> 00:01:10,510
+an algorithm and we just
+输入到一个算法中
+
+37
+00:01:10,600 --> 00:01:12,220
+ask the algorithm: find some
+然后我们告诉这个算法
+
+38
+00:01:12,430 --> 00:01:14,130
+structure in the data for us.
+快去为我们找找这个数据的内在结构
+
+39
+00:01:15,420 --> 00:01:16,490
+Given this data set, one
+给定数据
+
+40
+00:01:16,650 --> 00:01:17,810
+type of structure we might
+我们可能需要某种算法
+
+41
+00:01:18,010 --> 00:01:19,540
+have an algorithm find, is that
+帮助我们寻找一种结构
+
+42
+00:01:19,810 --> 00:01:21,440
+it looks like this data set has
+图上的数据看起来
+
+43
+00:01:21,620 --> 00:01:23,740
+points grouped into two
+可以分成两个
+
+44
+00:01:24,030 --> 00:01:25,500
+separate clusters and so
+分开的点集(称为簇)
+
+45
+00:01:25,800 --> 00:01:28,230
+an algorithm that finds that
+一个能够找到
+
+46
+00:01:28,360 --> 00:01:29,230
+clusters like the ones I just
+我圈出的这些点集的算法
+
+47
+00:01:29,450 --> 00:01:30,610
+circled, is called a clustering
+就被称为
+
+48
+00:01:32,440 --> 00:01:32,440
+algorithm.
+聚类算法
+
+49
+00:01:33,160 --> 00:01:34,620
+And this will be our first type of
+这将是我们介绍的
+
+50
+00:01:34,720 --> 00:01:36,590
+unsupervised learning, although there
+第一个非监督学习算法
+
+51
+00:01:36,790 --> 00:01:38,320
+will be other types of unsupervised
+当然 此后我们还将提到
+
+52
+00:01:39,020 --> 00:01:40,200
+learning algorithms that we'll talk
+其他类型的非监督学习算法
+
+53
+00:01:40,320 --> 00:01:41,710
+about later that finds other
+它们可以为我们
+
+54
+00:01:42,130 --> 00:01:43,710
+types of structure or other
+找到其他类型的结构
+
+55
+00:01:43,920 --> 00:01:46,000
+types of patterns in the data other than clusters.
+或者其他的一些模式 而不只是簇
+
+56
+00:01:46,900 --> 00:01:48,360
+We'll talk about those later; for now, we will talk about clustering.
+我们将先介绍聚类算法 此后 我们将陆续介绍其他算法
+
+57
+00:01:50,020 --> 00:01:51,210
+So what is clustering good for?
+好啦 那么聚类算法一般用来做什么呢?
+
+58
+00:01:51,380 --> 00:01:54,350
+Early in this class I had already mentioned a few applications.
+在这门课程的早些时候 我曾经列举过一些应用
+
+59
+00:01:54,950 --> 00:01:56,540
+One is market segmentation, where
+比如市场分割
+
+60
+00:01:56,670 --> 00:01:57,690
+you may have a database of
+也许你在数据库中
+
+61
+00:01:57,770 --> 00:01:58,840
+customers and want to group
+存储了许多客户的信息
+
+62
+00:01:59,070 --> 00:02:00,380
+them into different market segments.
+而你希望将他们分成不同的客户群
+
+63
+00:02:00,950 --> 00:02:02,590
+So, you can sell to
+这样你可以对不同类型的客户分别销售产品
+
+64
+00:02:02,720 --> 00:02:05,570
+them separately or serve your different market segments better.
+或者分别提供更适合的服务
+
+65
+00:02:06,730 --> 00:02:08,370
+Social network analysis, there are
+社交网络分析
+
+66
+00:02:08,580 --> 00:02:10,090
+actually, you know, groups that have done
+事实上 有许多研究人员
+
+67
+00:02:10,320 --> 00:02:12,590
+this, things like looking at a
+正在研究这样一些内容
+
+68
+00:02:12,730 --> 00:02:14,540
+group of people, social networks,
+他们关注一群人 关注社交网络
+
+69
+00:02:15,070 --> 00:02:16,390
+so things like Facebook, Google plus
+例如 Facebook Google+
+
+70
+00:02:16,710 --> 00:02:18,260
+or maybe information about who
+或者是其他的一些信息
+
+71
+00:02:18,430 --> 00:02:19,710
+are the people that you
+比如说
+
+72
+00:02:20,030 --> 00:02:21,110
+email the most frequently and who
+你经常跟哪些人联系
+
+73
+00:02:21,230 --> 00:02:22,170
+are the people that they email
+而这些人
+
+74
+00:02:22,310 --> 00:02:23,600
+the most frequently, and
+又经常给哪些人发邮件
+
+75
+00:02:23,750 --> 00:02:25,400
+to find coherent groups of people.
+由此找到关系密切的人群
+
+76
+00:02:26,500 --> 00:02:27,600
+So, this would be another maybe
+因此 这可能需要另一个
+
+77
+00:02:28,180 --> 00:02:28,850
+clustering algorithm where, you know, you'd want
+聚类算法
+
+78
+00:02:29,080 --> 00:02:32,200
+to find who other coherent groups of friends in a social network.
+你希望用它发现社交网络中关系密切的朋友
+
+79
+00:02:33,140 --> 00:02:33,990
+Here's something that one of my
+我有一个朋友
+
+80
+00:02:34,140 --> 00:02:35,170
+friends actually worked on, which is,
+正在研究这个问题
+
+81
+00:02:35,920 --> 00:02:37,200
+use clustering to organize compute
+他希望使用聚类算法
+
+82
+00:02:37,670 --> 00:02:39,220
+clusters or to organize data
+来更好的组织计算机集群
+
+83
+00:02:39,440 --> 00:02:40,600
+centers better because, if you
+或者更好的管理数据中心
+
+84
+00:02:40,800 --> 00:02:42,450
+know which computers in the
+因为如果你知道数据中心中
+
+85
+00:02:42,520 --> 00:02:44,990
+data center tend to work together,
+哪些计算机经常在一起协作
+
+86
+00:02:45,400 --> 00:02:46,270
+You can use that to reorganize
+那么 你可以
+
+87
+00:02:46,950 --> 00:02:48,390
+your resources and how you
+重新分配资源
+
+88
+00:02:48,570 --> 00:02:50,120
+lay out its network and
+重新布局网络
+
+89
+00:02:50,260 --> 00:02:52,040
+how you design your data center and communications.
+由此优化数据中心 优化数据通信
+
+90
+00:02:53,140 --> 00:02:54,540
+And lastly something that, actually
+最后 我实际上
+
+91
+00:02:54,850 --> 00:02:55,910
+another thing I worked on, using
+还在研究
+
+92
+00:02:56,130 --> 00:02:57,810
+clustering algorithms to understand
+如何利用聚类算法
+
+93
+00:02:58,400 --> 00:03:00,030
+galaxy formation and using
+了解星系的形成
+
+94
+00:03:00,280 --> 00:03:02,260
+that to understand how, to
+然后用这个知识
+
+95
+00:03:02,600 --> 00:03:03,860
+understand astronomical detail.
+了解一些天文学上的细节问题
+
+96
+00:03:06,550 --> 00:03:08,580
+So, that's clustering which
+好的 这就是聚类算法
+
+97
+00:03:08,890 --> 00:03:10,450
+is our first example of
+这将是我们介绍的第一个
+
+98
+00:03:10,530 --> 00:03:12,650
+an unsupervised learning algorithm.
+非监督学习算法
+
+99
+00:03:13,090 --> 00:03:14,200
+In the next video, we'll start to
+在下一个视频中 我们将开始介绍
+
+100
+00:03:14,370 --> 00:03:16,250
+talk about a specific clustering algorithm.
+一个具体的聚类算法
+
diff --git a/srt/13 - 2 - K-Means Algorithm (13 min).srt b/srt/13 - 2 - K-Means Algorithm (13 min).srt
new file mode 100644
index 00000000..c1517721
--- /dev/null
+++ b/srt/13 - 2 - K-Means Algorithm (13 min).srt
@@ -0,0 +1,1721 @@
+1
+00:00:00,300 --> 00:00:02,220
+In the clustering problem we are
+在聚类问题中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,360 --> 00:00:03,630
+given an unlabeled data
+我们有未加标签的数据
+
+3
+00:00:03,950 --> 00:00:05,040
+set and we would like
+我们希望有一个算法
+
+4
+00:00:05,200 --> 00:00:06,480
+to have an algorithm automatically
+能够自动的
+
+5
+00:00:07,320 --> 00:00:08,700
+group the data into coherent
+把这些数据分成
+
+6
+00:00:09,340 --> 00:00:11,000
+subsets or into coherent clusters for us.
+有紧密关系的子集或是簇
+
+7
+00:00:12,380 --> 00:00:14,160
+The K Means algorithm is by
+K均值 (K-means) 算法
+
+8
+00:00:14,310 --> 00:00:15,860
+far the most popular, by
+是现在最为广泛使用的
+
+9
+00:00:16,090 --> 00:00:17,410
+far the most widely used clustering
+聚类方法
+
+10
+00:00:17,780 --> 00:00:19,380
+algorithm, and in this
+那么在这个视频中
+
+11
+00:00:19,550 --> 00:00:20,320
+video I would like to tell
+我将会告诉你
+
+12
+00:00:20,570 --> 00:00:23,400
+you what the K Means Algorithm is and how it works.
+什么是K均值算法以及它是怎么运作的
+
+13
+00:00:27,000 --> 00:00:29,310
+The K means clustering algorithm is best illustrated in pictures.
+K均值算法最好用图来表达
+
+14
+00:00:29,960 --> 00:00:30,770
+Let's say I want to take
+如图所示
+
+15
+00:00:31,080 --> 00:00:32,330
+an unlabeled data set like
+现在我有一些
+
+16
+00:00:32,490 --> 00:00:34,040
+the one shown here, and I
+没加标签的数据
+
+17
+00:00:34,100 --> 00:00:36,450
+want to group the data into two clusters.
+而我想将这些数据分成两个簇
+
+18
+00:00:37,710 --> 00:00:38,740
+If I run the K Means clustering
+现在我执行K均值算法
+
+19
+00:00:39,080 --> 00:00:41,560
+algorithm, here is what
+方法是这样的
+
+20
+00:00:41,910 --> 00:00:44,190
+I'm going to do. The first step is to randomly initialize two
+首先我随机选择两个点
+
+21
+00:00:44,410 --> 00:00:45,920
+points, called the cluster centroids.
+这两个点叫做聚类中心 (cluster centroids)
+
+22
+00:00:46,700 --> 00:00:48,170
+So, these two crosses here,
+就是图上边的两个叉
+
+23
+00:00:49,010 --> 00:00:51,730
+these are called the Cluster Centroids
+这两个就是聚类中心
+
+24
+00:00:53,270 --> 00:00:54,320
+and I have two of them
+为什么要两个点呢
+
+25
+00:00:55,100 --> 00:00:57,840
+because I want to group my data into two clusters.
+因为我希望聚出两个类
+
+26
+00:00:59,130 --> 00:01:02,400
+K Means is an iterative algorithm and it does two things.
+K均值是一个迭代方法 它要做两件事情
+
+27
+00:01:03,480 --> 00:01:04,790
+First is a cluster assignment
+第一个是簇分配
+
+28
+00:01:05,330 --> 00:01:07,800
+step, and second is a move centroid step.
+第二个是移动聚类中心
+
+29
+00:01:08,360 --> 00:01:09,630
+So, let me tell you what those things mean.
+我来告诉你这两个是干嘛的
+
+30
+00:01:11,170 --> 00:01:12,520
+The first of the two steps in the
+在K均值算法的每次循环中
+
+31
+00:01:12,700 --> 00:01:14,930
+loop of K means, is this cluster assignment step.
+第一步是要进行簇分配
+
+32
+00:01:15,840 --> 00:01:17,070
+What that means is that, it's
+这就是说
+
+33
+00:01:17,220 --> 00:01:18,360
+going through each of the
+我要遍历所有的样本
+
+34
+00:01:18,700 --> 00:01:19,880
+examples, each of these green
+就是图上所有的绿色的点
+
+35
+00:01:20,170 --> 00:01:22,120
+dots shown here and depending
+然后依据
+
+36
+00:01:22,580 --> 00:01:24,140
+on whether it's closer to the
+每一个点
+
+37
+00:01:24,350 --> 00:01:25,530
+red cluster centroid or the
+是更接近红色的这个中心
+
+38
+00:01:25,620 --> 00:01:27,390
+blue cluster centroid, it is going
+还是蓝色的这个中心
+
+39
+00:01:27,560 --> 00:01:28,570
+to assign each of the
+来将每个数据点
+
+40
+00:01:28,670 --> 00:01:30,670
+data points to one of the two cluster centroids.
+分配到两个不同的聚类中心中
+
+41
+00:01:32,040 --> 00:01:33,350
+Specifically, what I mean
+具体来讲
+
+42
+00:01:33,460 --> 00:01:34,610
+by that, is to go through your
+我指的是
+
+43
+00:01:34,730 --> 00:01:36,930
+data set and color each
+对数据集中的所有点
+
+44
+00:01:37,130 --> 00:01:38,510
+of the points either red or
+依据他们
+
+45
+00:01:38,810 --> 00:01:39,890
+blue, depending on whether
+更接近红色这个中心
+
+46
+00:01:40,160 --> 00:01:41,060
+it is closer to the red
+还是蓝色这个中心
+
+47
+00:01:41,170 --> 00:01:42,150
+cluster centroid or the blue
+进行染色
+
+48
+00:01:42,470 --> 00:01:45,210
+cluster centroid, and I've done that in this diagram here.
+染色之后的结果如图所示
+
+49
+00:01:46,930 --> 00:01:48,700
+So, that was the cluster assignment step.
+以上就是簇分配的步骤
+
+50
+00:01:49,780 --> 00:01:52,270
+The other part of K means, in the
+K均值的另一部分
+
+51
+00:01:52,410 --> 00:01:53,390
+loop of K means, is the move
+是要移动聚类中心
+
+52
+00:01:53,590 --> 00:01:54,860
+centroid step, and what
+具体的操作方法
+
+53
+00:01:55,020 --> 00:01:55,730
+we are going to do is, we
+是这样的
+
+54
+00:01:55,800 --> 00:01:56,890
+are going to take the two cluster centroids,
+我们将两个聚类中心
+
+55
+00:01:57,390 --> 00:01:58,550
+that is, the red cross and
+也就是说红色的叉
+
+56
+00:01:58,830 --> 00:02:00,270
+the blue cross, and we are
+和蓝色的叉
+
+57
+00:02:00,420 --> 00:02:01,420
+going to move them to the average
+移动到
+
+58
+00:02:02,070 --> 00:02:03,900
+of the points colored the same colour.
+和它一样颜色的那堆点的均值处
+
+59
+00:02:04,880 --> 00:02:05,700
+So what we are going
+那么我们要做的是
+
+60
+00:02:05,730 --> 00:02:06,510
+to do is look at all the
+找出所有红色的点
+
+61
+00:02:06,630 --> 00:02:07,810
+red points and compute the
+计算出它们的均值
+
+62
+00:02:08,240 --> 00:02:09,520
+average, really the mean
+就是所有红色的点
+
+63
+00:02:10,080 --> 00:02:11,500
+of the location of all the red points,
+平均下来的位置
+
+64
+00:02:11,650 --> 00:02:13,690
+and we are going to move the red cluster centroid there.
+然后我们就把红色点的聚类中心移动到这里
+
+65
+00:02:14,190 --> 00:02:15,260
+And the same things for the
+蓝色的点也是这样
+
+66
+00:02:15,460 --> 00:02:16,370
+blue cluster centroid, look at all
+找出所有蓝色的点
+
+67
+00:02:16,560 --> 00:02:17,720
+the blue dots and compute their
+计算它们的均值
+
+68
+00:02:17,840 --> 00:02:19,710
+mean, and then move the blue cluster centroid there.
+把蓝色的叉放到那里
+
+69
+00:02:20,320 --> 00:02:20,880
+So, let me do that now.
+那我们现在就这么做
+
+70
+00:02:21,170 --> 00:02:22,990
+We're going to move the cluster centroids as follows
+我们将按照图上所示这么移动
+
+71
+00:02:24,590 --> 00:02:27,350
+and I've now moved them to their new means.
+现在两个中心都已经移动到新的均值那里了
+
+72
+00:02:28,300 --> 00:02:29,760
+The red one moved like that
+你看
+
+73
+00:02:29,820 --> 00:02:31,350
+and the blue one moved
+蓝色的这么移动
+
+74
+00:02:31,510 --> 00:02:34,460
+like that and the red one moved like that.
+红色的这么移动
+
+75
+00:02:34,620 --> 00:02:35,460
+And then we go back to another cluster
+然后我们就会进入下一个
+
+76
+00:02:35,910 --> 00:02:36,920
+assignment step, so we're again
+簇分配
+
+77
+00:02:37,190 --> 00:02:38,090
+going to look at all of
+我们重新检查
+
+78
+00:02:38,160 --> 00:02:39,670
+my unlabeled examples and depending
+所有没有标签的样本
+
+79
+00:02:40,090 --> 00:02:42,840
+on whether it's closer the red or the blue cluster centroid,
+依据它离红色中心还是蓝色中心更近一些
+
+80
+00:02:43,340 --> 00:02:45,150
+I'm going to color them either red or blue.
+将它染成红色或是蓝色
+
+81
+00:02:45,640 --> 00:02:47,160
+I'm going to assign each point
+我要将每个点
+
+82
+00:02:47,530 --> 00:02:48,550
+to one of the two cluster centroids, so let me do that now.
+分配给两个中心的某一个 就像这么做
+
+83
+00:02:51,450 --> 00:02:52,260
+And so the colors of some of the points just changed.
+你看某些点的颜色变了
+
+84
+00:02:53,400 --> 00:02:55,690
+And then I'm going to do another move centroid step.
+然后我们又要移动聚类中心
+
+85
+00:02:56,040 --> 00:02:56,810
+So I'm going to compute the
+于是我计算
+
+86
+00:02:57,070 --> 00:02:57,880
+average of all the blue points,
+蓝色点的均值
+
+87
+00:02:58,110 --> 00:02:59,000
+compute the average of all
+还有红色点的均值
+
+88
+00:02:59,040 --> 00:03:00,360
+the red points and move my
+然后就像图上所表示的
+
+89
+00:03:00,480 --> 00:03:03,770
+cluster centroids like this, and
+移动两个聚类中心
+
+90
+00:03:03,930 --> 00:03:05,650
+so, let's do that again.
+来我们再来一遍
+
+91
+00:03:06,160 --> 00:03:07,810
+Let me do one more cluster assignment step.
+下面我还是要做一次簇分配
+
+92
+00:03:08,320 --> 00:03:09,450
+So colour each point red
+将每个点
+
+93
+00:03:09,620 --> 00:03:10,840
+or blue, based on what it's
+染成红色或是蓝色
+
+94
+00:03:11,170 --> 00:03:13,070
+closer to and then
+依然根据它们离那个中心近
+
+95
+00:03:13,310 --> 00:03:20,000
+do another move centroid step and we're done.
+然后是移动中心 你看就像这样
+
+96
+00:03:20,350 --> 00:03:21,230
+And in fact if you
+实际上
+
+97
+00:03:21,290 --> 00:03:23,250
+keep running additional iterations of
+如果你从这一步开始
+
+98
+00:03:23,500 --> 00:03:26,020
+K means from here the
+一直迭代下去
+
+99
+00:03:26,160 --> 00:03:27,240
+cluster centroids will not change
+聚类中心是不会变的
+
+100
+00:03:27,540 --> 00:03:28,770
+any further and the colours of
+并且
+
+101
+00:03:28,830 --> 00:03:29,760
+the points will not change any
+那些点的颜色也不会变
+
+102
+00:03:29,940 --> 00:03:31,520
+further. And so, this is
+在这时
+
+103
+00:03:31,810 --> 00:03:33,520
+the, at this point,
+我们就能说
+
+104
+00:03:33,770 --> 00:03:35,290
+K means has converged and it's
+K均值方法已经收敛了
+
+105
+00:03:35,400 --> 00:03:36,430
+done a pretty good job finding
+在这些数据中找到两个簇
+
+106
+00:03:37,470 --> 00:03:38,750
+the two clusters in this data.
+K均值表现的很好
+
+107
+00:03:39,360 --> 00:03:40,310
+Let's write out the K means algorithm more formally.
+来我们用更加规范的格式描述K均值算法
+
+108
+00:03:42,150 --> 00:03:43,930
+The K means algorithm takes two inputs.
+K均值算法接受两个输入
+
+109
+00:03:44,570 --> 00:03:46,200
+One is a parameter K,
+第一个是参数K
+
+110
+00:03:46,450 --> 00:03:47,260
+which is the number of clusters
+表示你想从数据中
+
+111
+00:03:47,830 --> 00:03:48,900
+you want to find in the data.
+聚类出的簇的个数
+
+112
+00:03:49,640 --> 00:03:50,820
+I'll later say how we might
+我一会儿会讲到
+
+113
+00:03:51,170 --> 00:03:53,290
+go about trying to choose k, but
+我们可以怎样选择K
+
+114
+00:03:53,470 --> 00:03:54,600
+for now let's just say that
+这里呢 我们只是说
+
+115
+00:03:55,110 --> 00:03:56,210
+we've decided we want a
+我们已经确定了
+
+116
+00:03:56,360 --> 00:03:57,600
+certain number of clusters and we're
+需要几个簇
+
+117
+00:03:57,690 --> 00:03:58,810
+going to tell the algorithm how many
+然后我们要告诉这个算法
+
+118
+00:03:59,040 --> 00:04:00,730
+clusters we think there are in the data set.
+我们觉得在数据集里有多少个簇
+
+119
+00:04:01,170 --> 00:04:02,120
+And then K means also
+K均值同时要
+
+120
+00:04:02,490 --> 00:04:03,430
+takes as input this sort
+接收另外一个输入
+
+121
+00:04:03,880 --> 00:04:05,060
+of unlabeled training set of
+那就是只有 x 的
+
+122
+00:04:05,250 --> 00:04:06,530
+just the Xs and
+没有标签 y 的训练集
+
+123
+00:04:06,710 --> 00:04:08,430
+because this is unsupervised learning, we
+因为这是非监督学习
+
+124
+00:04:08,520 --> 00:04:10,690
+don't have the labels Y anymore.
+我们用不着 y
+
+125
+00:04:10,980 --> 00:04:12,470
+And for unsupervised learning of
+同时在非监督学习的
+
+126
+00:04:12,740 --> 00:04:14,020
+the K means I'm going to
+K均值算法里
+
+127
+00:04:14,550 --> 00:04:16,160
+use the convention that XI
+我们约定
+
+128
+00:04:16,420 --> 00:04:17,750
+is an RN dimensional vector.
+x(i) 是一个n维向量
+
+129
+00:04:18,280 --> 00:04:19,190
+And that's why my training examples
+这就是
+
+130
+00:04:19,750 --> 00:04:22,460
+are now N dimensional rather than N plus one dimensional vectors.
+训练样本是 n 维而不是 n+1 维的原因
+
+131
+00:04:24,340 --> 00:04:25,430
+This is what the K means algorithm does.
+这就是K均值算法
+
+132
+00:04:27,180 --> 00:04:28,630
+The first step is that it
+第一步是
+
+133
+00:04:28,790 --> 00:04:31,170
+randomly initializes k cluster
+随机初始化 K 个聚类中心
+
+134
+00:04:31,570 --> 00:04:33,550
+centroids which we will
+记作
+
+135
+00:04:33,820 --> 00:04:34,610
+call mu 1, mu 2, up
+μ1, μ2 一直到 μk
+
+136
+00:04:34,840 --> 00:04:36,250
+to mu k. And so
+就像之前
+
+137
+00:04:36,650 --> 00:04:38,450
+in the earlier diagram, the
+图中所示
+
+138
+00:04:38,550 --> 00:04:40,770
+cluster centroids corresponded to the
+聚类中心对应于
+
+139
+00:04:41,060 --> 00:04:42,240
+location of the red cross
+红色叉和蓝色叉
+
+140
+00:04:42,660 --> 00:04:44,020
+and the location of the blue cross.
+所在的位置
+
+141
+00:04:44,410 --> 00:04:45,640
+So there we had two cluster
+于是我们有两个聚类中心
+
+142
+00:04:45,960 --> 00:04:47,000
+centroids, so maybe the red
+按照这样的记法
+
+143
+00:04:47,170 --> 00:04:48,470
+cross was mu 1
+红叉是 μ1
+
+144
+00:04:48,650 --> 00:04:49,940
+and the blue cross was mu
+蓝叉是 μ2
+
+145
+00:04:50,300 --> 00:04:51,360
+2, and more generally we would have
+通常情况下
+
+146
+00:04:51,820 --> 00:04:53,830
+k cluster centroids rather than just 2.
+我们可能会有比2要多的聚类中心
+
+147
+00:04:54,520 --> 00:04:56,240
+Then the inner loop
+K均值的内部循环
+
+148
+00:04:56,520 --> 00:04:57,360
+of k means does the following,
+是这样的
+
+149
+00:04:57,830 --> 00:04:59,020
+we're going to repeatedly do the following.
+我们会重复做下面的事情
+
+150
+00:05:00,070 --> 00:05:01,950
+First for each of
+首先
+
+151
+00:05:02,160 --> 00:05:03,920
+my training examples, I'm going
+对于每个训练样本
+
+152
+00:05:04,110 --> 00:05:05,950
+to set this variable CI
+我们用变量 c(i) 表示
+
+153
+00:05:06,790 --> 00:05:07,960
+to be the index 1 through
+K个聚类中心中最接近 x(i) 的
+
+154
+00:05:08,170 --> 00:05:10,520
+K of the cluster centroid closest to XI.
+那个中心的下标
+
+155
+00:05:11,170 --> 00:05:13,810
+So this was my cluster assignment
+这就是簇分配
+
+156
+00:05:14,330 --> 00:05:16,870
+step, where we
+这个步骤
+
+157
+00:05:17,000 --> 00:05:18,680
+took each of my examples and
+我先将每个样本
+
+158
+00:05:18,980 --> 00:05:20,740
+coloured it either red
+依据它离那个聚类中心近
+
+159
+00:05:21,050 --> 00:05:22,050
+or blue, depending on which
+将其染成
+
+160
+00:05:22,380 --> 00:05:23,940
+cluster centroid it was closest to.
+红色或是蓝色
+
+161
+00:05:24,140 --> 00:05:25,090
+So CI is going to be
+所以 c(i) 是一个
+
+162
+00:05:25,280 --> 00:05:26,280
+a number from 1 to
+在1到 K 之间的数
+
+163
+00:05:26,380 --> 00:05:27,680
+K that tells us, you
+而且它表明
+
+164
+00:05:27,780 --> 00:05:28,760
+know, is it closer to the
+这个点到底是
+
+165
+00:05:28,920 --> 00:05:29,820
+red cross or is it
+更接近红色叉
+
+166
+00:05:29,900 --> 00:05:31,170
+closer to the blue cross,
+还是蓝色叉
+
+167
+00:05:32,200 --> 00:05:33,210
+and another way of writing this
+另一种表达方式是
+
+168
+00:05:33,580 --> 00:05:35,350
+is I'm going to,
+我想要计算 c(i)
+
+169
+00:05:35,620 --> 00:05:37,820
+to compute Ci, I'm
+那么
+
+170
+00:05:37,890 --> 00:05:39,120
+going to take my Ith
+我要用第i个样本x(i)
+
+171
+00:05:39,380 --> 00:05:41,170
+example Xi and
+然后
+
+172
+00:05:41,360 --> 00:05:42,670
+I'm going to measure it's distance
+计算出这个样本
+
+173
+00:05:43,900 --> 00:05:44,860
+to each of my cluster centroids,
+距离所有K个聚类中心的距离
+
+174
+00:05:45,410 --> 00:05:46,690
+this is mu and then
+这是 μ
+
+175
+00:05:47,060 --> 00:05:48,640
+lower-case k, right, so
+以及小写的k
+
+176
+00:05:48,890 --> 00:05:50,630
+capital K is the total
+大写的 K 表示
+
+177
+00:05:50,910 --> 00:05:51,900
+number centroids and I'm going
+所有聚类中心的个数
+
+178
+00:05:52,100 --> 00:05:53,160
+to use lower case k here
+小写的 k 则是
+
+179
+00:05:53,770 --> 00:05:55,140
+to index into the different centroids.
+不同的中心的下标
+
+180
+00:05:56,240 --> 00:05:58,470
+But so, Ci is going to, I'm going
+我希望的是
+
+181
+00:05:58,550 --> 00:06:00,110
+to minimize over my values
+在所有K个中心中
+
+182
+00:06:00,550 --> 00:06:01,930
+of k and find the
+找到一个k
+
+183
+00:06:02,120 --> 00:06:03,650
+value of K that minimizes this
+使得xi到μk的距离
+
+184
+00:06:03,900 --> 00:06:04,750
+distance between Xi and the
+是xi到所有的聚类中心的距离中
+
+185
+00:06:04,800 --> 00:06:06,130
+cluster centroid, and then,
+最小的那个
+
+186
+00:06:06,340 --> 00:06:08,990
+you know, the
+也就是说
+
+187
+00:06:09,070 --> 00:06:10,350
+value of k that minimizes
+k的值使这个最小
+
+188
+00:06:10,940 --> 00:06:12,160
+this, that's what gets set in
+这就是计算ci的方法
+
+189
+00:06:12,300 --> 00:06:14,100
+Ci. So, here's
+这里还有
+
+190
+00:06:14,360 --> 00:06:16,470
+another way of writing out what Ci is.
+另外的表示ci的方法
+
+191
+00:06:18,050 --> 00:06:19,150
+If I write the norm between
+我用xi减μk的范数
+
+192
+00:06:19,270 --> 00:06:21,500
+Xi minus Mu-k,
+来表示
+
+193
+00:06:23,000 --> 00:06:24,120
+then this is the distance between
+这是第i个训练样本
+
+194
+00:06:24,630 --> 00:06:26,040
+my ith training example
+到聚类中心μk
+
+195
+00:06:26,180 --> 00:06:27,350
+Xi and the cluster centroid
+的距离
+
+196
+00:06:28,140 --> 00:06:30,280
+Mu subscript K, this is--this
+注意
+
+197
+00:06:31,150 --> 00:06:32,830
+here, that's a lowercase K. So uppercase
+我这里用的是小写的k
+
+198
+00:06:33,320 --> 00:06:34,710
+K is going to be
+大写的K
+
+199
+00:06:34,980 --> 00:06:36,210
+used to denote the total
+大写的k表示
+
+200
+00:06:36,450 --> 00:06:38,020
+number of cluster centroids,
+聚类中心的总数
+
+201
+00:06:38,770 --> 00:06:40,430
+and this lowercase K's
+这个小写的k
+
+202
+00:06:40,790 --> 00:06:41,840
+a number between one and
+是第一个到第K个中心
+
+203
+00:06:41,960 --> 00:06:42,940
+capital K. I'm just using
+中的一个
+
+204
+00:06:43,210 --> 00:06:44,450
+lower case K to index
+我用小写的k
+
+205
+00:06:44,930 --> 00:06:45,990
+into my different cluster centroids.
+表示不同聚类中心的下标
+
+206
+00:06:47,130 --> 00:06:49,020
+Next is lower case k. So
+这是个小写k
+
+207
+00:06:50,050 --> 00:06:51,330
+that's the distance between the example and the cluster centroid
+这就是某个样本到聚类中心的距离
+
+208
+00:06:51,490 --> 00:06:52,810
+and so what I'm going to
+接下来
+
+209
+00:06:53,050 --> 00:06:54,330
+do is find the value
+我要做的是
+
+210
+00:06:55,250 --> 00:06:56,390
+of K, of lower case
+找出小写的k的值
+
+211
+00:06:56,710 --> 00:06:58,900
+k that minimizes this, and
+让这个式子最小
+
+212
+00:06:59,080 --> 00:07:00,320
+so the value of
+那么
+
+213
+00:07:00,480 --> 00:07:02,100
+k that minimizes you know,
+接下来
+
+214
+00:07:02,280 --> 00:07:03,610
+that's what I'm going to
+我就要将 c(i)
+
+215
+00:07:04,000 --> 00:07:06,560
+set as Ci, and
+赋值为k
+
+216
+00:07:06,760 --> 00:07:07,850
+by convention here I've written
+我这里按照惯例表示
+
+217
+00:07:08,190 --> 00:07:09,430
+the distance between Xi and
+x(i) 和聚类中心的距离
+
+218
+00:07:09,480 --> 00:07:11,310
+the cluster centroid, by convention
+因为出于惯例
+
+219
+00:07:11,820 --> 00:07:13,330
+people actually tend to write this as the squared distance.
+人们更喜欢用距离的平方来表示
+
+220
+00:07:13,780 --> 00:07:15,370
+So we think of Ci as picking
+所以我们可以认为
+
+221
+00:07:15,660 --> 00:07:16,860
+the cluster centroid with the smallest
+c(i) 是距样本 x(i) 的距离的平方
+
+222
+00:07:17,450 --> 00:07:20,110
+squared distance to my training example Xi.
+最小的那个聚类中心
+
+223
+00:07:20,750 --> 00:07:22,080
+But of course minimizing squared distance,
+当然
+
+224
+00:07:22,500 --> 00:07:23,700
+and minimizing distance that should
+使距离的平方最小或是距离最小
+
+225
+00:07:23,880 --> 00:07:25,670
+give you the same value of Ci,
+都能让我们得到相同的 c(i)
+
+226
+00:07:25,830 --> 00:07:26,670
+but we usually put in the
+但是我们通常还是
+
+227
+00:07:26,750 --> 00:07:28,120
+square there, just as the
+写成距离的平方
+
+228
+00:07:28,430 --> 00:07:31,020
+convention that people use for K means.
+因为这是约定俗成的
+
+229
+00:07:31,170 --> 00:07:32,320
+So that was the cluster assignment step.
+这就是簇分配
+
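A small sketch of this cluster assignment step in Python/NumPy (an assumption of this note, not the course's Octave code): c(i) is the index k that minimizes ||x(i) - mu_k||^2.

```python
import numpy as np

def assign_clusters(X, centroids):
    # X: (m, n) training examples; centroids: (K, n); returns (m,) 0-based indices
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(sq_dists, axis=1)
```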
+230
+00:07:33,480 --> 00:07:34,750
+The other in the loop
+K均值循环中的另一部分是
+
+231
+00:07:34,980 --> 00:07:37,740
+of K means does the move centroid step.
+移动聚类中心
+
+232
+00:07:40,540 --> 00:07:41,750
+And what that does is for
+这是说
+
+233
+00:07:42,160 --> 00:07:43,460
+each of my cluster centroids, so
+对于每个聚类中心
+
+234
+00:07:43,550 --> 00:07:44,740
+for lower case k equals 1 through
+也就是说
+
+235
+00:07:44,870 --> 00:07:46,190
+K, it sets Mu-k equals
+小写k从1循环到K
+
+236
+00:07:46,710 --> 00:07:48,460
+to the average of the points assigned to cluster.
+将 μk 赋值为这个簇的均值
+
+237
+00:07:49,270 --> 00:07:50,720
+So as a concrete example, let's say
+举个例子
+
+238
+00:07:50,910 --> 00:07:52,100
+that one of my cluster
+某一个聚类中心
+
+239
+00:07:52,340 --> 00:07:53,420
+centroids, let's say cluster centroid
+比如说是 μ2
+
+240
+00:07:53,750 --> 00:07:55,030
+two, has training examples,
+被分配了一些训练样本
+
+241
+00:07:55,820 --> 00:08:02,390
+you know, 1, 5, 6, and 10 assigned to it.
+像是1,5,6,10
+
+242
+00:08:03,220 --> 00:08:04,270
+And what this means is,
+这个表明
+
+243
+00:08:04,470 --> 00:08:05,510
+really this means that C1 equals 2,
+c(1) 等于2
+
+244
+00:08:06,560 --> 00:08:09,180
+C5 equals 2,
+c(5) 等于2
+
+245
+00:08:10,690 --> 00:08:12,180
+C6 equals 2, and
+c(6) 等于2
+
+246
+00:08:12,300 --> 00:08:13,730
+similarly c10 equals 2, right?
+同样的 c(10) 也是等于2 对吧?
+
+247
+00:08:14,970 --> 00:08:17,070
+If we got that
+如果我们从上一步
+
+248
+00:08:17,160 --> 00:08:18,940
+from the cluster assignment step, then
+也就是簇分配那一步得到了这些
+
+249
+00:08:19,190 --> 00:08:21,250
+that means examples 1,5,6 and
+这个表明
+
+250
+00:08:21,450 --> 00:08:22,960
+10 were assigned to the cluster centroid two.
+样本1 5 6 10被分配给了聚类中心2
+
+251
+00:08:24,020 --> 00:08:25,210
+Then in this move centroid step,
+然后在移动聚类中心这一步中
+
+252
+00:08:25,540 --> 00:08:26,580
+what I'm going to do is just
+我们要做的是
+
+253
+00:08:27,180 --> 00:08:29,290
+compute the average of these four things.
+计算出这四个的平均值
+
+254
+00:08:31,340 --> 00:08:33,950
+So X1 plus X5 plus X6
+即
+
+255
+00:08:34,270 --> 00:08:35,620
+plus X10.
+计算 x(1)+x(5)+x(6)+x(10)
+
+256
+00:08:35,890 --> 00:08:37,190
+And now I'm going
+然后计算
+
+257
+00:08:37,380 --> 00:08:38,630
+to average them so here I
+它们的平均值
+
+258
+00:08:38,920 --> 00:08:40,020
+have four points assigned to
+这里聚类中心有
+
+259
+00:08:40,100 --> 00:08:41,700
+this cluster centroid, just take
+4个点
+
+260
+00:08:42,280 --> 00:08:43,240
+one quarter of that.
+那么我们要计算和的四分之一
+
+261
+00:08:43,980 --> 00:08:45,890
+And now Mu2 is going to
+这时μ2就是一个
+
+262
+00:08:46,100 --> 00:08:47,910
+be an n-dimensional vector.
+n维的向量
+
+263
+00:08:48,420 --> 00:08:49,480
+Because each of these
+因为
+
+264
+00:08:49,700 --> 00:08:51,050
+example x1, x5, x6, x10
+x(1) x(5) x(6) x(10) 都是
+
+265
+00:08:52,160 --> 00:08:53,170
+each of them were an n-dimensional
+n维的向量
+
+266
+00:08:53,700 --> 00:08:55,150
+vector, and I'm going to
+然后
+
+267
+00:08:55,240 --> 00:08:56,270
+add up these things and, you
+把这些相加
+
+268
+00:08:56,550 --> 00:08:57,870
+know, divide by four because I
+再除以4
+
+269
+00:08:57,940 --> 00:08:59,320
+have four points assigned to this
+因为
+
+270
+00:08:59,490 --> 00:09:00,730
+cluster centroid, I end up
+有4个点分配到了这个聚类中心
+
+271
+00:09:01,280 --> 00:09:02,770
+with my move centroid step,
+这样聚类中心μ2的移动
+
+272
+00:09:03,870 --> 00:09:05,260
+for my cluster centroid mu-2.
+就结束了
+
+273
+00:09:05,870 --> 00:09:06,850
+This has the effect of moving
+这个作用是说
+
+274
+00:09:07,210 --> 00:09:08,950
+mu-2 to the average of
+将μ2移动到
+
+275
+00:09:09,130 --> 00:09:10,620
+the four points listed here.
+这四个点的均值处
+
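For the worked example above (assuming NumPy arrays; the data here is made up purely for illustration), moving centroid mu_2 is just averaging the four assigned examples:

```python
import numpy as np

X = np.random.rand(12, 2)        # hypothetical training set with n = 2 features
assigned = [0, 4, 5, 9]          # x(1), x(5), x(6), x(10) in 0-based indexing
mu_2 = X[assigned].mean(axis=0)  # (x(1) + x(5) + x(6) + x(10)) / 4
```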
+276
+00:09:12,430 --> 00:09:13,850
+One thing you might ask is, well, here we said, let's
+你可能会问的一个问题是
+
+277
+00:09:14,080 --> 00:09:16,600
+let mu-k be the average of the points assigned to the cluster.
+既然我们要让μk移动到分配给它的那些点的均值处
+
+278
+00:09:17,500 --> 00:09:18,900
+But what if there is
+那么如果
+
+279
+00:09:18,960 --> 00:09:21,310
+a cluster centroid
+存在一个
+
+280
+00:09:21,690 --> 00:09:23,000
+with zero points assigned to it.
+没有点分配给它的聚类中心 那怎么办?
+
+281
+00:09:23,280 --> 00:09:24,300
+In that case the more common
+通常在这种情况下
+
+282
+00:09:24,650 --> 00:09:25,720
+thing to do is to just
+我们就直接移除
+
+283
+00:09:26,140 --> 00:09:27,220
+eliminate that cluster centroid.
+那个聚类中心
+
+284
+00:09:27,830 --> 00:09:28,630
+And if you do that, you end
+如果这么做了
+
+285
+00:09:28,840 --> 00:09:30,260
+up with K minus one clusters
+最终将会得到K-1个簇
+
+286
+00:09:31,350 --> 00:09:33,840
+instead of k clusters.
+而不是K个簇
+
+287
+00:09:34,400 --> 00:09:35,620
+Sometimes if you really need
+如果就是要K个簇
+
+288
+00:09:35,830 --> 00:09:37,380
+k clusters, then the other
+不多不少
+
+289
+00:09:37,490 --> 00:09:38,220
+thing you can do if you
+但是有个
+
+290
+00:09:38,290 --> 00:09:39,530
+have a cluster centroid with no
+没有点分配给它的聚类中心
+
+291
+00:09:39,740 --> 00:09:41,170
+points assigned to it is you can
+你所要做的是
+
+292
+00:09:41,260 --> 00:09:42,590
+just randomly reinitialize that cluster
+重新随机找一个聚类中心
+
+293
+00:09:43,450 --> 00:09:44,870
+centroid, but it's more
+但是直接移除那个中心
+
+294
+00:09:45,170 --> 00:09:46,590
+common to just eliminate a
+是更为常见的方法
+
+295
+00:09:46,670 --> 00:09:48,210
+cluster if somewhere during
+当你遇到了一个
+
+296
+00:09:48,410 --> 00:09:49,690
+K-means you end up with no points
+没有分配点的
+
+297
+00:09:50,290 --> 00:09:52,020
+assigned to that cluster centroid, and
+聚类中心
+
+298
+00:09:52,140 --> 00:09:53,340
+that can happen, although in practice
+不过在实际过程中
+
+299
+00:09:53,820 --> 00:09:55,590
+it happens not that often.
+这个问题不会经常出现
+
+300
+00:09:55,810 --> 00:09:57,280
+So that's the K means Algorithm.
+这就是K均值算法
+
+301
+00:09:59,330 --> 00:10:00,220
+Before wrapping up this video
+在这个视频结束之前
+
+302
+00:10:00,620 --> 00:10:01,290
+I just want to tell you
+我还想告诉你
+
+303
+00:10:01,350 --> 00:10:02,710
+about one other common application
+K均值的
+
+304
+00:10:03,350 --> 00:10:04,680
+of K Means and that's
+另外一个常见应用
+
+305
+00:10:04,920 --> 00:10:06,840
+to problems where the clusters are not well separated.
+应对没有很好分开的簇
+
+306
+00:10:08,160 --> 00:10:08,640
+Here's what I mean.
+我指的是这个
+
+307
+00:10:09,280 --> 00:10:10,320
+So far we've been picturing K Means
+到目前为止
+
+308
+00:10:10,950 --> 00:10:12,090
+and applying it to data
+我们的K均值算法
+
+309
+00:10:12,330 --> 00:10:13,520
+sets like that shown here where
+都是基于一些像图中所示的数据
+
+310
+00:10:14,150 --> 00:10:15,590
+we have three pretty
+有很好的隔离开来的
+
+311
+00:10:15,900 --> 00:10:17,380
+well separated clusters, and we'd
+三个簇
+
+312
+00:10:17,670 --> 00:10:19,930
+like an algorithm to find maybe the 3 clusters for us.
+然后我们就用这个算法找出三个簇
+
+313
+00:10:20,750 --> 00:10:21,840
+But it turns out that
+但是事实是
+
+314
+00:10:21,980 --> 00:10:23,180
+very often K Means is also
+K均值经常会用于
+
+315
+00:10:23,600 --> 00:10:24,860
+applied to data sets that
+一些这样的数据
+
+316
+00:10:25,170 --> 00:10:26,240
+look like this where there may
+看起来并没有
+
+317
+00:10:26,330 --> 00:10:28,300
+not be several very
+很好的分来的
+
+318
+00:10:28,550 --> 00:10:30,250
+well separated clusters.
+几个簇
+
+319
+00:10:30,830 --> 00:10:32,960
+Here is an example application, to t-shirt sizing.
+这是一个应用的例子 关于T恤的大小
+
+320
+00:10:34,070 --> 00:10:34,650
+Let's say you are a t-shirt
+假设你是T恤制造商
+
+321
+00:10:35,270 --> 00:10:37,360
+manufacturer. What you've done is you've gone
+你找到了一些人
+
+322
+00:10:38,030 --> 00:10:39,310
+to the population that you
+想把T恤卖给他们
+
+323
+00:10:39,380 --> 00:10:40,520
+want to sell t-shirts to, and
+然后
+
+324
+00:10:40,800 --> 00:10:42,190
+you've collected a number of
+你搜集了一些
+
+325
+00:10:42,580 --> 00:10:43,770
+examples of the height and weight
+这些人的
+
+326
+00:10:44,270 --> 00:10:45,350
+of these people in your
+身高和体重的数据
+
+327
+00:10:45,680 --> 00:10:46,740
+population and so, well I
+我猜
+
+328
+00:10:47,220 --> 00:10:48,280
+guess height and weight tend to
+身高和体重往往是正相关的
+
+329
+00:10:48,370 --> 00:10:50,310
+be positively correlated so maybe
+然后你可能
+
+330
+00:10:50,540 --> 00:10:51,160
+you end up with a data
+收集到了这样的样本
+
+331
+00:10:51,430 --> 00:10:52,590
+set like this, you know, with
+一些关于
+
+332
+00:10:52,830 --> 00:10:53,910
+a sample or set of
+人们身高和体重的样本
+
+333
+00:10:53,980 --> 00:10:56,000
+examples of different people's heights and weights.
+就像这个图所表示的
+
+334
+00:10:56,530 --> 00:10:57,880
+Let's say you want to size your t shirts.
+然后你想确定一下T恤的大小
+
+335
+00:10:58,570 --> 00:10:59,810
+Let's say I want to design
+假设我们要设计
+
+336
+00:11:00,330 --> 00:11:01,480
+and sell t shirts of three
+三种不同大小的t恤
+
+337
+00:11:01,660 --> 00:11:03,970
+sizes, small, medium and large.
+小号 中号 和大号
+
+338
+00:11:04,660 --> 00:11:06,420
+So how big should I make my small one?
+那么小号应该是多大的?
+
+339
+00:11:06,550 --> 00:11:07,320
+How big should I make my medium?
+中号呢?
+
+340
+00:11:07,690 --> 00:11:09,300
+And how big should I make my large t-shirts.
+大号呢?
+
+341
+00:11:10,370 --> 00:11:11,290
+One way to do that would
+有一种
+
+342
+00:11:11,410 --> 00:11:12,050
+be to run my K-means
+在这样的数据上
+
+343
+00:11:12,270 --> 00:11:13,570
+clustering algorithm on this data
+使用K均值算法进行聚类
+
+344
+00:11:13,830 --> 00:11:14,640
+set that I have shown on
+的方法就像我展示的那样
+
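As a rough illustration of the move-centroid step walked through in the subtitles above (averaging the examples assigned to each centroid, and eliminating a centroid that ends up with no points): this sketch is not from the course materials, and the function name move_centroids, its arguments, and the use of NumPy are my own assumptions.

    import numpy as np

    def move_centroids(X, c, K):
        # X: (m, n) array of examples; c: (m,) array where c[i] is the index of the
        # centroid that example x(i) was given in the cluster-assignment step;
        # K: number of clusters.
        new_centroids = []
        for k in range(K):
            assigned = X[c == k]                  # examples currently assigned to centroid k
            if len(assigned) == 0:
                continue                          # empty cluster: eliminate it, leaving K-1 clusters
            new_centroids.append(assigned.mean(axis=0))   # e.g. (x1 + x5 + x6 + x10) / 4
        return np.array(new_centroids)            # may have fewer than K rows

If exactly K clusters are required, the `continue` branch could instead reinitialize that centroid to a randomly chosen training example, as the lecture mentions.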
diff --git a/srt/13 - 3 - Optimization Objective (7 min)(1).srt b/srt/13 - 3 - Optimization Objective (7 min)(1).srt
new file mode 100644
index 00000000..a22f5c17
--- /dev/null
+++ b/srt/13 - 3 - Optimization Objective (7 min)(1).srt
@@ -0,0 +1,1001 @@
+1
+00:00:00,090 --> 00:00:01,540
+Most of the supervised learning algorithms
+在大多数我们已经学到的(字幕翻译:中国海洋大学,黄海广,haiguang2000@qq.com)
+
+2
+00:00:01,690 --> 00:00:02,890
+we've seen, things like linear
+监督学习算法中,类似于线性回归
+
+3
+00:00:03,130 --> 00:00:04,730
+regression, logistic regression and
+逻辑回归,以及更多的算法
+
+4
+00:00:04,930 --> 00:00:05,850
+so on. All of those
+所有的这些
+
+5
+00:00:06,300 --> 00:00:08,089
+algorithms have an optimization objective
+算法都有一个优化目标函数
+
+6
+00:00:08,670 --> 00:00:10,920
+or some cost function that the algorithm was trying to minimize.
+或者某个代价函数需要通过算法进行最小化
+
+7
+00:00:11,920 --> 00:00:13,180
+It turns out that K-means also
+事实上 K均值也有
+
+8
+00:00:13,770 --> 00:00:15,730
+has an optimization objective or
+一个优化目标函数或者
+
+9
+00:00:15,870 --> 00:00:18,720
+a cost function that is trying to minimize.
+需要最小化的代价函数
+
+10
+00:00:19,630 --> 00:00:20,180
+And in this video, I'd like to tell
+在这个视频中 我会
+
+11
+00:00:20,230 --> 00:00:23,620
+you what that optimization objective is.
+告诉大家这个优化目标函数是什么
+
+12
+00:00:23,730 --> 00:00:24,420
+And the reason I want to do so
+我这么做有两方面的目的
+
+13
+00:00:24,750 --> 00:00:26,960
+is because this will be useful to us for two purposes.
+具体来说
+
+14
+00:00:28,020 --> 00:00:29,330
+First, knowing what is the
+首先 了解什么是
+
+15
+00:00:29,480 --> 00:00:30,890
+optimization objective of K-means
+K均值的优化目标函数
+
+16
+00:00:31,150 --> 00:00:32,390
+will help us to
+这将能帮助我们
+
+17
+00:00:32,690 --> 00:00:33,970
+debug the learning algorithm and
+调试学习算法
+
+18
+00:00:34,070 --> 00:00:35,080
+just make sure that K-means is
+确保K均值算法
+
+19
+00:00:35,300 --> 00:00:37,100
+running correctly, and second,
+是在正确运行中
+
+20
+00:00:37,610 --> 00:00:39,290
+and perhaps even more importantly, in
+第二个也是最重要的一个目的是
+
+21
+00:00:39,530 --> 00:00:41,290
+a later video we'll talk
+在之后的视频中我们将讨论
+
+22
+00:00:41,490 --> 00:00:42,580
+about how we can use this to
+我们该怎样运用这个来
+
+23
+00:00:42,730 --> 00:00:44,000
+help K-means find better clusters
+帮助K均值找到更好的簇
+
+24
+00:00:44,070 --> 00:00:46,290
+and avoid local optima, but we'll do that in a later video that follows this one.
+并且避免局部最优解,不过我们在这节课之后的视频中才会涉及到
+
+25
+00:00:46,410 --> 00:00:47,330
+Just as a quick reminder, while K-means is
+另外顺便提一下 当K均值正在运行时
+
+26
+00:00:49,680 --> 00:00:52,870
+running we're going to be
+我们将
+
+27
+00:00:54,450 --> 00:00:55,820
+keeping track of two sets of variables.
+对两组变量进行跟踪
+
+28
+00:00:56,430 --> 00:00:58,390
+First is the CI's and
+首先是 c(i)
+
+29
+00:00:58,700 --> 00:00:59,830
+that keeps track of the
+它表示的是
+
+30
+00:01:00,190 --> 00:01:01,600
+index or the number of the cluster
+当前的样本 x(i) 所归为
+
+31
+00:01:02,730 --> 00:01:05,040
+to which an example x(i) is currently assigned.
+的那个簇的索引或者序号
+
+32
+00:01:05,230 --> 00:01:05,960
+And then, the other set of variables
+另外一组变量
+
+33
+00:01:06,540 --> 00:01:07,580
+we use as Mu subscript
+我们用 μk 来表示
+
+34
+00:01:08,120 --> 00:01:09,410
+K, which is the location
+第 k 个簇的
+
+35
+00:01:10,140 --> 00:01:12,110
+of cluster centroid K. And,
+聚类中心 (cluster centroid)
+
+36
+00:01:12,380 --> 00:01:13,750
+again, for K-means
+顺便再提一句
+
+37
+00:01:14,030 --> 00:01:17,230
+we use capital K to denote the total number of clusters.
+K均值中我们用大写 K
+
+38
+00:01:17,890 --> 00:01:19,310
+And here lower case K,
+来表示簇的总数
+
+39
+00:01:20,010 --> 00:01:20,910
+you know, is going to be an
+用小写 k 来表示
+
+40
+00:01:21,040 --> 00:01:22,650
+index into the cluster
+聚类中心的序号
+
+41
+00:01:22,970 --> 00:01:23,930
+centroids, and so lower
+因此
+
+42
+00:01:24,030 --> 00:01:24,940
+case k is going to be
+小写 k 的范围
+
+43
+00:01:25,140 --> 00:01:26,390
+a number between 1 and
+就应该是1到大写K之间
+
+44
+00:01:26,600 --> 00:01:29,630
+capital K. Now, here's
+除此以外
+
+45
+00:01:29,840 --> 00:01:31,040
+one more bit of notation which
+还有另一个符号
+
+46
+00:01:31,270 --> 00:01:32,280
+is going to use Mu
+我们用 μc(i)
+
+47
+00:01:32,630 --> 00:01:34,560
+subscript c(i) to denote
+来表示
+
+48
+00:01:34,970 --> 00:01:36,660
+the cluster centroid of the
+x(i) 所属的那个簇
+
+49
+00:01:36,780 --> 00:01:38,400
+cluster to which example x(i)
+的聚类中心
+
+50
+00:01:38,880 --> 00:01:40,500
+has been assigned and
+我再稍微多解释一下
+
+51
+00:01:40,710 --> 00:01:42,030
+to explain that notation
+这个符号
+
+52
+00:01:42,450 --> 00:01:43,450
+a little bit more, let's
+假如说
+
+53
+00:01:43,660 --> 00:01:45,600
+say that x(i) has been
+x(i) 被划为了
+
+54
+00:01:45,740 --> 00:01:47,760
+assigned to cluster number five.
+第5个簇
+
+55
+00:01:48,880 --> 00:01:49,830
+What that means is that c(i),
+这是什么意思呢?
+
+56
+00:01:50,850 --> 00:01:52,290
+that is the index of x(i),
+这个意思是 x(i) 的序号
+
+57
+00:01:53,130 --> 00:01:54,300
+that that is equal to 5.
+也就是 c(i) 等于5
+
+58
+00:01:54,420 --> 00:01:57,640
+Right? Because you know, having c(i) equals 5,
+因为 c(i) = 5
+
+59
+00:01:57,800 --> 00:01:59,270
+that's what it means for the
+表示的就是
+
+60
+00:02:00,500 --> 00:02:01,720
+example x(i) to be
+x(i) 这个样本
+
+61
+00:02:01,910 --> 00:02:03,440
+assigned to cluster number 5.
+被分到了第五个簇
+
+62
+00:02:03,510 --> 00:02:05,700
+And so Mu subscript
+因此
+
+63
+00:02:06,290 --> 00:02:07,960
+c(i) is going to
+μ 下标 c(i)
+
+64
+00:02:08,100 --> 00:02:09,630
+be equal to Mu subscript
+就等于 μ5
+
+65
+00:02:10,080 --> 00:02:12,260
+5 because c(i) is equal
+因为 c(i) 就是5
+
+66
+00:02:13,700 --> 00:02:14,100
+to 5.
+所以
+
+67
+00:02:15,100 --> 00:02:16,570
+This Mu subscript c(i) is the
+这里的 μc(i)
+
+68
+00:02:16,660 --> 00:02:18,420
+cluster centroid of cluster number
+就是第5个簇的聚类中心
+
+69
+00:02:18,730 --> 00:02:19,670
+5, which is the cluster
+而也正是我的样本 x(i)
+
+70
+00:02:20,120 --> 00:02:22,480
+to which my example x(i) has been assigned.
+所属的第5个簇
+
+71
+00:02:23,470 --> 00:02:24,730
+With this notation, we're now
+有了这样的符号表示
+
+72
+00:02:24,960 --> 00:02:26,040
+ready to write out what
+现在我们就能写出
+
+73
+00:02:26,200 --> 00:02:28,150
+is the optimization objective of
+K均值聚类算法的
+
+74
+00:02:28,290 --> 00:02:30,360
+the K-means clustering algorithm.
+优化目标了
+
+75
+00:02:30,760 --> 00:02:30,800
+And here it is.
+以下便是
+
+76
+00:02:31,330 --> 00:02:32,940
+The cost function that K-means
+K均值算法需要
+
+77
+00:02:33,040 --> 00:02:34,380
+is minimizing is the
+最小化的代价函数
+
+78
+00:02:34,570 --> 00:02:35,770
+function J of all of
+J 参数是 c(1) 到 c(m)
+
+79
+00:02:35,880 --> 00:02:37,470
+these parameters c1 through
+以及 μ1 到 μk
+
+80
+00:02:37,890 --> 00:02:39,610
+cM, Mu1 through MuK, that
+随着算法的执行过程
+
+81
+00:02:39,790 --> 00:02:41,570
+K-means is varying as the algorithm runs.
+这些参数将不断变化
+
+82
+00:02:42,100 --> 00:02:43,930
+And the optimization objective is shown
+右边给出了优化目标
+
+83
+00:02:44,160 --> 00:02:45,520
+on the right, is the average of
+也就是所有的
+
+84
+00:02:45,610 --> 00:02:46,430
+one over M of the sum
+1/m 乘以
+
+85
+00:02:46,620 --> 00:02:48,730
+of i equals one through M of this term here
+i = 1 到 m 个项的求和
+
+86
+00:02:50,400 --> 00:02:52,670
+that I've just drawn the red box around.
+这里我用红色框出了这部分
+
+87
+00:02:52,870 --> 00:02:54,680
+The squared distance between
+也即每个样本 x(i)
+
+88
+00:02:55,160 --> 00:02:57,540
+each example x(i) and the
+到 x(i) 所属的
+
+89
+00:02:57,690 --> 00:02:58,740
+location of the cluster
+聚类簇的中心
+
+90
+00:02:59,130 --> 00:03:00,210
+centroid to which x(i)
+距离的平方值
+
+91
+00:03:01,320 --> 00:03:01,920
+has been assigned.
+被分配
+
+92
+00:03:03,240 --> 00:03:06,070
+So let me just draw this in, let me explain this.
+我来解释一下
+
+93
+00:03:06,240 --> 00:03:07,800
+Here is the location of training
+这是训练样本 x(i) 的位置
+
+94
+00:03:08,190 --> 00:03:09,780
+example x(i), and here's the location
+这是 x(i) 这个样本被划分到的
+
+95
+00:03:10,410 --> 00:03:11,760
+of the cluster centroid to which
+聚类簇的中心的位置
+
+96
+00:03:11,970 --> 00:03:13,660
+example x(i) has been assigned.
+我们在图上解释一下
+
+97
+00:03:14,560 --> 00:03:17,080
+So to explain this in pictures, if here is X1, X2.
+如果这是 x1 x2
+
+98
+00:03:17,420 --> 00:03:19,540
+And if a point
+并且如果这个点
+
+99
+00:03:19,760 --> 00:03:21,210
+here, is my example
+是我的第 i 个样本
+
+100
+00:03:22,080 --> 00:03:23,060
+x(i), so if that
+x(i) 那么
+
+101
+00:03:23,110 --> 00:03:24,840
+is equal to my example x(i),
+也就是说这个值等于 x(i)
+
+102
+00:03:25,860 --> 00:03:27,000
+and if x(i) has been assigned
+并且 x(i) 被分到了
+
+103
+00:03:27,240 --> 00:03:28,270
+to some cluster centroid, and
+某一个聚类中心
+
+104
+00:03:28,340 --> 00:03:30,240
+I'll denote my cluster centroid with a cross.
+我用一个叉来表示这个聚类中心
+
+105
+00:03:30,630 --> 00:03:32,130
+So if that's the location of,
+所以 如果我们假设
+
+106
+00:03:32,300 --> 00:03:33,830
+you know, Mu 5, let's
+这个聚类中心是 μ5
+
+107
+00:03:34,370 --> 00:03:35,640
+say, if x(i) has been
+也就是说 假如 x(i)
+
+108
+00:03:35,850 --> 00:03:37,960
+assigned to cluster centroid 5 in my example up there.
+被分到第五个聚类簇
+
+109
+00:03:38,810 --> 00:03:40,660
+Then, the squared distance, that's
+那么这个距离平方值
+
+110
+00:03:40,940 --> 00:03:41,840
+the squared of the distance
+也就是点 x(i)
+
+111
+00:03:43,810 --> 00:03:46,010
+between the point x(i) and this
+和 x(i) 被分配到的聚类中心的
+
+112
+00:03:46,220 --> 00:03:48,400
+cluster centroid, to which x(i) has been assigned.
+距离的平方值
+
+113
+00:03:49,570 --> 00:03:50,720
+And what K-means can be shown
+那么 K均值算法
+
+114
+00:03:51,070 --> 00:03:52,540
+to be doing is that, it
+要做的事情就是
+
+115
+00:03:52,680 --> 00:03:54,480
+is trying to find parameters c(i)
+它将找到参数 c(i) 和 μi
+
+116
+00:03:55,270 --> 00:03:57,410
+and Mu(i), try to
+并且
+
+117
+00:03:57,570 --> 00:03:58,840
+find c and Mu to try to
+找到能够最小化
+
+118
+00:03:58,960 --> 00:04:00,450
+minimize this cost function J.
+代价函数J的cμ
+
+119
+00:04:01,440 --> 00:04:03,180
+This cost function is sometimes
+这个代价函数
+
+120
+00:04:03,680 --> 00:04:06,770
+also called the distortion cost
+在K均值算法中
+
+121
+00:04:07,060 --> 00:04:10,030
+function or the distortion of
+有时候也叫做
+
+122
+00:04:10,240 --> 00:04:12,130
+the K-means algorithm.
+失真代价函数(distortion cost function)
+
+123
+00:04:12,790 --> 00:04:13,360
+And, just to provide a little bit more
+再解释详细点
+
+124
+00:04:13,630 --> 00:04:15,750
+detail, here's the K-means algorithm,
+这是K均值算法
+
+125
+00:04:15,820 --> 00:04:16,450
+Here's exactly the algorithm as we have it,
+这跟我们之前得到的
+
+126
+00:04:16,610 --> 00:04:17,960
+from the earlier slide.
+算法是一样的
+
+127
+00:04:18,950 --> 00:04:20,200
+And what this first step
+这个算法的第一步
+
+128
+00:04:21,030 --> 00:04:23,120
+of this algorithm is, this was
+就是聚类中心的分配
+
+129
+00:04:23,830 --> 00:04:25,910
+the cluster assignment step
+在这一步中
+
+130
+00:04:27,920 --> 00:04:29,850
+where we assign each
+我们要把每一个点
+
+131
+00:04:30,030 --> 00:04:32,910
+point to the cluster centroid, and
+划分给各自所属的聚类中心
+
+132
+00:04:33,010 --> 00:04:34,830
+it's possible to show mathematically that
+可以用数学证明
+
+133
+00:04:35,050 --> 00:04:36,210
+what the cluster assignment step
+这个聚类簇的划分步骤
+
+134
+00:04:36,450 --> 00:04:38,560
+is doing is exactly minimizing
+实际上就是在
+
+135
+00:04:40,770 --> 00:04:42,950
+J with respect
+对代价函数 J 进行最小化
+
+136
+00:04:43,420 --> 00:04:45,900
+to the variables, C1, C2
+关于参数 c(1) c(2)
+
+137
+00:04:46,380 --> 00:04:48,050
+and so on, up
+等等
+
+138
+00:04:48,170 --> 00:04:52,030
+to C(m), while holding the
+一直到 c(m)
+
+139
+00:04:52,480 --> 00:04:54,240
+cluster centroids, Mu1 up to
+而保持聚类中心 μ1
+
+140
+00:04:54,720 --> 00:04:57,000
+MUK fixed.
+到 μk 固定不变
+
+141
+00:04:58,580 --> 00:04:59,640
+So, what the first assignment step
+因此 第一步要做的
+
+142
+00:04:59,900 --> 00:05:00,990
+does is you know, it doesn't change
+其实不是改变
+
+143
+00:05:01,240 --> 00:05:02,850
+the cluster centroids, but what it's
+聚类中心的位置
+
+144
+00:05:02,960 --> 00:05:05,730
+doing is, exactly picking the values of C1, C2, up to CM
+而是选择 c(1) c(2) 一直到 c(m)
+
+145
+00:05:07,790 --> 00:05:10,240
+that minimizes the cost
+来最小化这个代价函数
+
+146
+00:05:10,500 --> 00:05:11,790
+function or the distortion function,
+或者说失真函数 J
+
+147
+00:05:12,510 --> 00:05:14,440
+J. And it's possible to prove
+不难从数学的角度证明
+
+148
+00:05:14,670 --> 00:05:16,550
+that mathematically but, I won't do so here.
+但我在这里就不做了
+
+149
+00:05:17,170 --> 00:05:18,210
+That has a pretty intuitive meaning
+有一个很直观的意义,
+
+150
+00:05:18,610 --> 00:05:19,630
+of just, you know, well let's assign
+就是说 我们要把
+
+151
+00:05:20,090 --> 00:05:21,040
+these points to the cluster centroid
+这些点分配给
+
+152
+00:05:21,530 --> 00:05:22,820
+that is closest to it, because
+离它最近的聚类中心 因为
+
+153
+00:05:23,120 --> 00:05:24,160
+that's what minimizes the square
+这样才能最小化这些点
+
+154
+00:05:24,660 --> 00:05:26,860
+of distance between the points and the corresponding cluster centroid.
+与相应聚类中心之间距离的平方
+
+155
+00:05:27,840 --> 00:05:29,090
+And then the other part of
+然后
+
+156
+00:05:29,790 --> 00:05:32,880
+the second step of K-means, this second step over here.
+K-均值算法的第二步
+
+157
+00:05:33,960 --> 00:05:35,480
+This second step was the move
+第二步是再次移动
+
+158
+00:05:35,690 --> 00:05:38,770
+centroid step and,
+中心点的步骤,
+
+159
+00:05:39,000 --> 00:05:40,020
+once again, I won't prove it,
+我不会去证明,
+
+160
+00:05:40,510 --> 00:05:41,250
+but it can be shown
+但它可以
+
+161
+00:05:41,520 --> 00:05:42,590
+mathematically, that what the
+用数学方法证明
+
+162
+00:05:43,140 --> 00:05:44,910
+move centroid step does, is
+移动聚类中心这一步所做的是
+
+163
+00:05:45,150 --> 00:05:46,740
+it chooses the values
+选择
+
+164
+00:05:47,260 --> 00:05:49,370
+of mu that minimizes J.
+能使 J 最小化的 μ 的值
+
+165
+00:05:50,150 --> 00:05:53,000
+So it minimizes the cost function J with respect to,
+因此它的对J代价函数最小化,
+
+166
+00:05:53,380 --> 00:05:54,710
+where wrt is my
+用wrt
+
+167
+00:05:54,920 --> 00:05:56,930
+abbreviation for with respect to.
+作为缩写
+
+168
+00:05:57,030 --> 00:05:58,380
+But it minimizes J with respect
+但它最小化J
+
+169
+00:05:58,790 --> 00:06:01,930
+to the locations of the cluster centroids, Mu1 through MuK.
+与聚类中心的位置,从Mu1 到 MuK.
+
+170
+00:06:02,040 --> 00:06:05,690
+So, what K-means really
+所以, K均值方法真正
+
+171
+00:06:05,790 --> 00:06:06,910
+is doing is it's taking the
+做的是,它采用
+
+172
+00:06:07,010 --> 00:06:08,380
+two sets of variables and
+两组变量并
+
+173
+00:06:09,070 --> 00:06:11,210
+partitioning them into two halves right here.
+把它们分成两半。
+
+174
+00:06:11,550 --> 00:06:14,490
+First the C set of variables and then you have the Mu sets of variables.
+C组变量和Mu组变量。
+
+175
+00:06:15,450 --> 00:06:15,990
+And what it does is it first
+其做法是
+
+176
+00:06:16,560 --> 00:06:17,750
+minimizes J with respect
+最小化J与
+
+177
+00:06:18,050 --> 00:06:19,350
+to the variable C, and then minimizes
+变量c,接着最小化
+
+178
+00:06:19,700 --> 00:06:20,610
+J with respect the variables
+与变量
+
+179
+00:06:21,120 --> 00:06:22,590
+Mu, and then it keeps on iterating.
+Mu,接着保持迭代。
+
+180
+00:06:25,180 --> 00:06:26,680
+And so that's all that K-means does.
+所以,这就是K均值所做的。
+
+181
+00:06:27,700 --> 00:06:28,570
+And, now that we understand
+而且,现在我们明白
+
+182
+00:06:29,150 --> 00:06:30,870
+K-means, let's try to
+k-均值,让我们尽量
+
+183
+00:06:31,030 --> 00:06:32,190
+minimize this cost function J. We
+减少这种代价函数J.
+我们
+
+184
+00:06:32,430 --> 00:06:33,640
+can also use this to
+还可以用这个
+
+185
+00:06:33,800 --> 00:06:34,890
+try to debug our learning
+来调试我们的学习
+
+186
+00:06:35,090 --> 00:06:36,350
+algorithm and just kind
+算法,就是
+
+187
+00:06:36,520 --> 00:06:37,980
+of make sure that our implementation
+确保我们实现
+
+188
+00:06:38,900 --> 00:06:39,950
+of K-means is running correctly.
+k-均值正确运行。
+
+189
+00:06:41,220 --> 00:06:42,560
+So, we now understand the
+所以,我们理解
+
+190
+00:06:43,070 --> 00:06:44,260
+K-means algorithm as trying to
+K-均值算法来
+
+191
+00:06:44,610 --> 00:06:45,960
+optimize this cost function J,
+优化代价函数J,
+
+192
+00:06:46,640 --> 00:06:48,790
+which is also called the distortion function.
+它也被叫做失真函数 (distortion function)
+
+193
+00:06:50,650 --> 00:06:51,600
+We can use that to debug K-means
+我们可以用它来调试k-均值
+
+194
+00:06:52,090 --> 00:06:53,060
+and help me show that K-means
+帮我表明k-均值是
+
+195
+00:06:53,130 --> 00:06:54,050
+is converging, and that it's
+收敛的,而且它
+
+196
+00:06:54,510 --> 00:06:56,160
+running properly, and in the
+正常运行,在
+
+197
+00:06:56,240 --> 00:06:57,460
+next video, we'll also see
+下一个视频中, 我们也可以看到
+
+198
+00:06:57,690 --> 00:06:59,040
+how we can use this to
+我们怎样用这个方法
+
+199
+00:06:59,120 --> 00:07:00,650
+help K-means find better clusters
+帮助K均值找到更好的簇
+
+200
+00:07:01,300 --> 00:07:03,240
+and help K-means to avoid local optima.
+并帮助k均值避免局部最优解。
+
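For reference, the distortion (cost) function that the subtitles above describe can be written out explicitly; this rendering simply follows the definitions given there, with x(i) the i-th example, c(i) its assigned cluster index, and mu_c(i) the centroid of that cluster:

    J\big(c^{(1)},\ldots,c^{(m)},\mu_1,\ldots,\mu_K\big)
        = \frac{1}{m}\sum_{i=1}^{m}\left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^{2}

The cluster-assignment step minimizes J with respect to c(1),...,c(m) while holding mu_1,...,mu_K fixed, and the move-centroid step minimizes J with respect to mu_1,...,mu_K while holding the assignments fixed.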
diff --git a/srt/13 - 3 - Optimization Objective (7 min).srt b/srt/13 - 3 - Optimization Objective (7 min).srt
new file mode 100644
index 00000000..f66ebdfa
--- /dev/null
+++ b/srt/13 - 3 - Optimization Objective (7 min).srt
@@ -0,0 +1,804 @@
+1
+00:00:00,090 --> 00:00:01,540
+Most of the supervised learning algorithms
+
+2
+00:00:01,690 --> 00:00:02,890
+we've seen, things like linear
+
+3
+00:00:03,130 --> 00:00:04,730
+regression, logistic regression and
+
+4
+00:00:04,930 --> 00:00:05,850
+so on. All of those
+
+5
+00:00:06,300 --> 00:00:08,089
+algorithms have an optimization objective
+
+6
+00:00:08,670 --> 00:00:10,920
+or some cost function that the algorithm was trying to minimize.
+
+7
+00:00:11,920 --> 00:00:13,180
+It turns out that K-means also
+
+8
+00:00:13,770 --> 00:00:15,730
+has an optimization objective or
+
+9
+00:00:15,870 --> 00:00:18,720
+a cost function that is trying to minimize.
+
+10
+00:00:19,630 --> 00:00:20,180
+And in this video, I'd like to tell
+
+11
+00:00:20,230 --> 00:00:23,620
+you what that optimization objective is.
+
+12
+00:00:23,730 --> 00:00:24,420
+And the reason I want to do so
+
+13
+00:00:24,750 --> 00:00:26,960
+is because this will be useful to us for two purposes.
+
+14
+00:00:28,020 --> 00:00:29,330
+First, knowing what is the
+
+15
+00:00:29,480 --> 00:00:30,890
+optimization objective of K-means
+
+16
+00:00:31,150 --> 00:00:32,390
+will help us to
+
+17
+00:00:32,690 --> 00:00:33,970
+debug the learning algorithm and
+
+18
+00:00:34,070 --> 00:00:35,080
+just make sure that K-means is
+
+19
+00:00:35,300 --> 00:00:37,100
+running correctly, and second,
+
+20
+00:00:37,610 --> 00:00:39,290
+and perhaps even more importantly, in
+
+21
+00:00:39,530 --> 00:00:41,290
+a later video we'll talk
+
+22
+00:00:41,490 --> 00:00:42,580
+about how we can use this to
+
+23
+00:00:42,730 --> 00:00:44,000
+help K-means find better clusters
+
+24
+00:00:44,070 --> 00:00:46,290
+and avoid local optima, but we'll do that in a later video that follows this one.
+
+25
+00:00:46,410 --> 00:00:47,330
+Just as a quick reminder, while K-means is
+
+26
+00:00:49,680 --> 00:00:52,870
+running we're going to be
+
+27
+00:00:54,450 --> 00:00:55,820
+keeping track of two sets of variables.
+
+28
+00:00:56,430 --> 00:00:58,390
+First is the CI's and
+
+29
+00:00:58,700 --> 00:00:59,830
+that keeps track of the
+
+30
+00:01:00,190 --> 00:01:01,600
+index or the number of the cluster
+
+31
+00:01:02,730 --> 00:01:05,040
+to which an example x(i) is currently assigned.
+
+32
+00:01:05,230 --> 00:01:05,960
+And then, the other set of variables
+
+33
+00:01:06,540 --> 00:01:07,580
+we use as Mu subscript
+
+34
+00:01:08,120 --> 00:01:09,410
+K, which is the location
+
+35
+00:01:10,140 --> 00:01:12,110
+of cluster centroid K. And,
+
+36
+00:01:12,380 --> 00:01:13,750
+again, for K-means
+
+37
+00:01:14,030 --> 00:01:17,230
+we use capital K to denote the total number of clusters.
+
+38
+00:01:17,890 --> 00:01:19,310
+And here lower case K,
+
+39
+00:01:20,010 --> 00:01:20,910
+you know, is going to be an
+
+40
+00:01:21,040 --> 00:01:22,650
+index into the cluster
+
+41
+00:01:22,970 --> 00:01:23,930
+centroids, and so lower
+
+42
+00:01:24,030 --> 00:01:24,940
+case k is going to be
+
+43
+00:01:25,140 --> 00:01:26,390
+a number between 1 and
+
+44
+00:01:26,600 --> 00:01:29,630
+capital K. Now, here's
+
+45
+00:01:29,840 --> 00:01:31,040
+one more bit of notation which
+
+46
+00:01:31,270 --> 00:01:32,280
+is going to use Mu
+
+47
+00:01:32,630 --> 00:01:34,560
+subscript c(i) to denote
+
+48
+00:01:34,970 --> 00:01:36,660
+the cluster centroid of the
+
+49
+00:01:36,780 --> 00:01:38,400
+cluster to which example x(i)
+
+50
+00:01:38,880 --> 00:01:40,500
+has been assigned and
+
+51
+00:01:40,710 --> 00:01:42,030
+to explain that notation
+
+52
+00:01:42,450 --> 00:01:43,450
+a little bit more, let's
+
+53
+00:01:43,660 --> 00:01:45,600
+say that x(i) has been
+
+54
+00:01:45,740 --> 00:01:47,760
+assigned to cluster number five.
+
+55
+00:01:48,880 --> 00:01:49,830
+What that means is that c(i),
+
+56
+00:01:50,850 --> 00:01:52,290
+that is the index of x(i),
+
+57
+00:01:53,130 --> 00:01:54,300
+that that is equal to 5.
+
+58
+00:01:54,420 --> 00:01:57,640
+Right? Because you know, having c(i) equals 5,
+
+59
+00:01:57,800 --> 00:01:59,270
+that's what it means for the
+
+60
+00:02:00,500 --> 00:02:01,720
+example x(i) to be
+
+61
+00:02:01,910 --> 00:02:03,440
+assigned to cluster number 5.
+
+62
+00:02:03,510 --> 00:02:05,700
+And so Mu subscript
+
+63
+00:02:06,290 --> 00:02:07,960
+c(i) is going to
+
+64
+00:02:08,100 --> 00:02:09,630
+be equal to Mu subscript
+
+65
+00:02:10,080 --> 00:02:12,260
+5 because c(i) is equal
+
+66
+00:02:13,700 --> 00:02:14,100
+to 5.
+
+67
+00:02:15,100 --> 00:02:16,570
+This Mu subscript c(i) is the
+
+68
+00:02:16,660 --> 00:02:18,420
+cluster centroid of cluster number
+
+69
+00:02:18,730 --> 00:02:19,670
+5, which is the cluster
+
+70
+00:02:20,120 --> 00:02:22,480
+to which my example x(i) has been assigned.
+
+71
+00:02:23,470 --> 00:02:24,730
+With this notation, we're now
+
+72
+00:02:24,960 --> 00:02:26,040
+ready to write out what
+
+73
+00:02:26,200 --> 00:02:28,150
+is the optimization objective of
+
+74
+00:02:28,290 --> 00:02:30,360
+the K-means clustering algorithm.
+
+75
+00:02:30,760 --> 00:02:30,800
+And here it is.
+
+76
+00:02:31,330 --> 00:02:32,940
+The cost function that K-means
+
+77
+00:02:33,040 --> 00:02:34,380
+is minimizing is the
+
+78
+00:02:34,570 --> 00:02:35,770
+function J of all of
+
+79
+00:02:35,880 --> 00:02:37,470
+these parameters c1 through
+
+80
+00:02:37,890 --> 00:02:39,610
+cM, Mu1 through MuK, that
+
+81
+00:02:39,790 --> 00:02:41,570
+K-means is varying as the algorithm runs.
+
+82
+00:02:42,100 --> 00:02:43,930
+And the optimization objective is shown
+
+83
+00:02:44,160 --> 00:02:45,520
+on the right, is the average of
+
+84
+00:02:45,610 --> 00:02:46,430
+one over M of the sum
+
+85
+00:02:46,620 --> 00:02:48,730
+of i equals one through M of this term here
+
+86
+00:02:50,400 --> 00:02:52,670
+that I've just drawn the red box around.
+
+87
+00:02:52,870 --> 00:02:54,680
+The squared distance between
+
+88
+00:02:55,160 --> 00:02:57,540
+each example x(i) and the
+
+89
+00:02:57,690 --> 00:02:58,740
+location of the cluster
+
+90
+00:02:59,130 --> 00:03:00,210
+centroid to which x(i)
+
+91
+00:03:01,320 --> 00:03:01,920
+has been assigned.
+
+92
+00:03:03,240 --> 00:03:06,070
+So let me just draw this in, let me explain this.
+
+93
+00:03:06,240 --> 00:03:07,800
+Here is the location of training
+
+94
+00:03:08,190 --> 00:03:09,780
+example x(i), and here's the location
+
+95
+00:03:10,410 --> 00:03:11,760
+of the cluster centroid to which
+
+96
+00:03:11,970 --> 00:03:13,660
+example x(i) has been assigned.
+
+97
+00:03:14,560 --> 00:03:17,080
+So to explain this in pictures, if here is X1, X2.
+
+98
+00:03:17,420 --> 00:03:19,540
+And if a point
+
+99
+00:03:19,760 --> 00:03:21,210
+here, is my example
+
+100
+00:03:22,080 --> 00:03:23,060
+x(i), so if that
+
+101
+00:03:23,110 --> 00:03:24,840
+is equal to my example x(i),
+
+102
+00:03:25,860 --> 00:03:27,000
+and if x(i) has been assigned
+
+103
+00:03:27,240 --> 00:03:28,270
+to some cluster centroid, and
+
+104
+00:03:28,340 --> 00:03:30,240
+I'll denote my cluster centroid with a cross.
+
+105
+00:03:30,630 --> 00:03:32,130
+So if that's the location of,
+
+106
+00:03:32,300 --> 00:03:33,830
+you know, Mu 5, let's
+
+107
+00:03:34,370 --> 00:03:35,640
+say, if x(i) has been
+
+108
+00:03:35,850 --> 00:03:37,960
+assigned to cluster centroid 5 in my example up there.
+
+109
+00:03:38,810 --> 00:03:40,660
+Then, the squared distance, that's
+
+110
+00:03:40,940 --> 00:03:41,840
+the squared of the distance
+
+111
+00:03:43,810 --> 00:03:46,010
+between the point x(i) and this
+
+112
+00:03:46,220 --> 00:03:48,400
+cluster centroid, to which x(i) has been assigned.
+
+113
+00:03:49,570 --> 00:03:50,720
+And what K-means can be shown
+
+114
+00:03:51,070 --> 00:03:52,540
+to be doing is that, it
+
+115
+00:03:52,680 --> 00:03:54,480
+is trying to find parameters c(i)
+
+116
+00:03:55,270 --> 00:03:57,410
+and Mu(i), try to
+
+117
+00:03:57,570 --> 00:03:58,840
+find c and Mu to try to
+
+118
+00:03:58,960 --> 00:04:00,450
+minimize this cost function J.
+
+119
+00:04:01,440 --> 00:04:03,180
+This cost function is sometimes
+
+120
+00:04:03,680 --> 00:04:06,770
+also called the distortion cost
+
+121
+00:04:07,060 --> 00:04:10,030
+function or the distortion of
+
+122
+00:04:10,240 --> 00:04:12,130
+the K-means algorithm.
+
+123
+00:04:12,790 --> 00:04:13,360
+And, just to provide a little bit more
+
+124
+00:04:13,630 --> 00:04:15,750
+detail, here's the K-means algorithm,
+
+125
+00:04:15,820 --> 00:04:16,450
+Here's exactly the algorithm as we have it,
+
+126
+00:04:16,610 --> 00:04:17,960
+from the earlier slide.
+
+127
+00:04:18,950 --> 00:04:20,200
+And what this first step
+
+128
+00:04:21,030 --> 00:04:23,120
+of this algorithm is, this was
+
+129
+00:04:23,830 --> 00:04:25,910
+the cluster assignment step
+
+130
+00:04:27,920 --> 00:04:29,850
+where we assign each
+
+131
+00:04:30,030 --> 00:04:32,910
+point to the cluster centroid, and
+
+132
+00:04:33,010 --> 00:04:34,830
+it's possible to show mathematically that
+
+133
+00:04:35,050 --> 00:04:36,210
+what the cluster assignment step
+
+134
+00:04:36,450 --> 00:04:38,560
+is doing is exactly minimizing
+
+135
+00:04:40,770 --> 00:04:42,950
+J with respect
+
+136
+00:04:43,420 --> 00:04:45,900
+to the variables, C1, C2
+
+137
+00:04:46,380 --> 00:04:48,050
+and so on, up
+
+138
+00:04:48,170 --> 00:04:52,030
+to C(m), while holding the
+
+139
+00:04:52,480 --> 00:04:54,240
+cluster centroids, Mu1 up to
+
+140
+00:04:54,720 --> 00:04:57,000
+MUK fixed.
+
+141
+00:04:58,580 --> 00:04:59,640
+So, what the first assignment step
+
+142
+00:04:59,900 --> 00:05:00,990
+does is you know, it doesn't change
+
+143
+00:05:01,240 --> 00:05:02,850
+the cluster centroids, but what it's
+
+144
+00:05:02,960 --> 00:05:05,730
+doing is, exactly picking the values of C1, C2, up to CM
+
+145
+00:05:07,790 --> 00:05:10,240
+that minimizes the cost
+
+146
+00:05:10,500 --> 00:05:11,790
+function or the distortion function,
+
+147
+00:05:12,510 --> 00:05:14,440
+J. And it's possible to prove
+
+148
+00:05:14,670 --> 00:05:16,550
+that mathematically but, I won't do so here.
+
+149
+00:05:17,170 --> 00:05:18,210
+That has a pretty intuitive meaning
+
+150
+00:05:18,610 --> 00:05:19,630
+of just, you know, well let's assign
+
+151
+00:05:20,090 --> 00:05:21,040
+these points to the cluster centroid
+
+152
+00:05:21,530 --> 00:05:22,820
+that is closest to it, because
+
+153
+00:05:23,120 --> 00:05:24,160
+that's what minimizes the square
+
+154
+00:05:24,660 --> 00:05:26,860
+of distance between the points and the corresponding cluster centroid.
+
+155
+00:05:27,840 --> 00:05:29,090
+And then the other part of
+
+156
+00:05:29,790 --> 00:05:32,880
+the second step of K-means, this second step over here.
+
+157
+00:05:33,960 --> 00:05:35,480
+This second step was the move
+
+158
+00:05:35,690 --> 00:05:38,770
+centroid step and,
+
+159
+00:05:39,000 --> 00:05:40,020
+once again, I won't prove it,
+
+160
+00:05:40,510 --> 00:05:41,250
+but it can be shown
+
+161
+00:05:41,520 --> 00:05:42,590
+mathematically, that what the
+
+162
+00:05:43,140 --> 00:05:44,910
+move centroid step does, is
+
+163
+00:05:45,150 --> 00:05:46,740
+it chooses the values
+
+164
+00:05:47,260 --> 00:05:49,370
+of mu that minimizes J.
+
+165
+00:05:50,150 --> 00:05:51,270
+So it minimizes the cost function
+
+166
+00:05:51,650 --> 00:05:53,000
+J with respect to,
+
+167
+00:05:53,380 --> 00:05:54,710
+where wrt is my
+
+168
+00:05:54,920 --> 00:05:56,930
+abbreviation for with respect to.
+
+169
+00:05:57,030 --> 00:05:58,380
+But it minimizes J with respect
+
+170
+00:05:58,790 --> 00:06:01,930
+to the locations of the cluster centroids, Mu1 through MuK.
+
+171
+00:06:02,040 --> 00:06:05,690
+So, what K-means really
+
+172
+00:06:05,790 --> 00:06:06,910
+is doing is it's taking the
+
+173
+00:06:07,010 --> 00:06:08,380
+two sets of variables and
+
+174
+00:06:09,070 --> 00:06:11,210
+partitioning them into two halves right here.
+
+175
+00:06:11,550 --> 00:06:14,490
+First the C set of variables and then you have the Mu sets of variables.
+
+176
+00:06:15,450 --> 00:06:15,990
+And what it does is it first
+
+177
+00:06:16,560 --> 00:06:17,750
+minimizes J with respect
+
+178
+00:06:18,050 --> 00:06:19,350
+to the variable C, and then minimizes
+
+179
+00:06:19,700 --> 00:06:20,610
+J with respect the variables
+
+180
+00:06:21,120 --> 00:06:22,590
+Mu, and then it keeps on iterating.
+
+181
+00:06:25,180 --> 00:06:26,680
+And so that's all that K-means does.
+
+182
+00:06:27,700 --> 00:06:28,570
+And, now that we understand
+
+183
+00:06:29,150 --> 00:06:30,870
+K-means, let's try to
+
+184
+00:06:31,030 --> 00:06:32,190
+minimize this cost function J. We
+
+185
+00:06:32,430 --> 00:06:33,640
+can also use this to
+
+186
+00:06:33,800 --> 00:06:34,890
+try to debug our learning
+
+187
+00:06:35,090 --> 00:06:36,350
+algorithm and just kind
+
+188
+00:06:36,520 --> 00:06:37,980
+of make sure that our implementation
+
+189
+00:06:38,900 --> 00:06:39,950
+of K-means is running correctly.
+
+190
+00:06:41,220 --> 00:06:42,560
+So, we now understand the
+
+191
+00:06:43,070 --> 00:06:44,260
+K-means algorithm as trying to
+
+192
+00:06:44,610 --> 00:06:45,960
+optimize this cost function J,
+
+193
+00:06:46,640 --> 00:06:48,790
+which is also called the distortion function.
+
+194
+00:06:50,650 --> 00:06:51,600
+We can use that to debug K-means
+
+195
+00:06:52,090 --> 00:06:53,060
+and help me show that K-means
+
+196
+00:06:53,130 --> 00:06:54,050
+is converging, and that it's
+
+197
+00:06:54,510 --> 00:06:56,160
+running properly, and in the
+
+198
+00:06:56,240 --> 00:06:57,460
+next video, we'll also see
+
+199
+00:06:57,690 --> 00:06:59,040
+how we can use this to
+
+200
+00:06:59,120 --> 00:07:00,650
+help K-means find better clusters
+
+201
+00:07:01,300 --> 00:07:03,240
+and help K-means to avoid local optima.
+
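The subtitles above suggest using the distortion J to check that an implementation of K-means is running correctly; a minimal sketch of that check, assuming NumPy and the array shapes noted in the comments (this is not code from the course):

    import numpy as np

    def distortion(X, c, centroids):
        # Average squared distance between each example x(i) and the centroid it is
        # assigned to; X is (m, n), c is (m,) of cluster indices, centroids is (K, n).
        return np.mean(np.sum((X - centroids[c]) ** 2, axis=1))

    # Sanity check implied by the lecture: J evaluated after every cluster-assignment
    # and move-centroid step should never increase; if it does, there is a bug.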
diff --git a/srt/13 - 4 - Random Initialization (8 min).srt b/srt/13 - 4 - Random Initialization (8 min).srt
new file mode 100644
index 00000000..bbbd8a3c
--- /dev/null
+++ b/srt/13 - 4 - Random Initialization (8 min).srt
@@ -0,0 +1,1151 @@
+1
+00:00:00,170 --> 00:00:01,340
+In this video, I'd like
+在这个视频中 我想要
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,450 --> 00:00:03,230
+to talk about how to initialize
+讨论一下如何初始化
+
+3
+00:00:04,580 --> 00:00:05,970
+K-means and more importantly, this will
+K均值聚类方法 更重要的是 这将
+
+4
+00:00:06,170 --> 00:00:07,240
+lead into a discussion of
+引导我们讨论
+
+5
+00:00:07,550 --> 00:00:10,210
+how to make K-means avoid local optima as well.
+如何避开局部最优来构建K均值聚类方法
+
+6
+00:00:10,740 --> 00:00:12,390
+Here's the K-means clustering algorithm
+这是一个我们之前讨论过的
+
+7
+00:00:12,950 --> 00:00:14,420
+that we talked about earlier.
+K均值聚类算法
+
+8
+00:00:15,760 --> 00:00:16,760
+One step that we never really
+其中我们之前没有
+
+9
+00:00:17,260 --> 00:00:18,350
+talked much about was this step
+讨论得太多的是这一步
+
+10
+00:00:18,820 --> 00:00:21,560
+of how you randomly initialize the cluster centroids.
+如何初始化聚类中心这一步
+
+11
+00:00:22,390 --> 00:00:23,490
+There are few different ways that
+有几种不同的方法
+
+12
+00:00:23,710 --> 00:00:25,350
+one can imagine using to randomly
+可以用来随机
+
+13
+00:00:25,960 --> 00:00:26,860
+initialize the cluster centroids.
+初始化聚类中心
+
+14
+00:00:27,510 --> 00:00:28,580
+But, it turns out that
+但是 事实证明
+
+15
+00:00:28,720 --> 00:00:29,820
+there is one method that is
+有一种方法比其他
+
+16
+00:00:30,050 --> 00:00:31,700
+much more recommended than most
+大多数可能考虑到的方法
+
+17
+00:00:32,080 --> 00:00:33,830
+of the other options one might think about.
+更加被推荐
+
+18
+00:00:34,400 --> 00:00:35,250
+So, let me tell you about
+接下来就告诉你这个
+
+19
+00:00:35,590 --> 00:00:38,160
+that option since it's what often seems to work best.
+方法 因为它可能是效果最好的一种方法
+
+20
+00:00:39,550 --> 00:00:42,210
+Here's how I usually initialize my cluster centroids.
+这里展示了我通常是如何初始化我的聚类中心的
+
+21
+00:00:43,300 --> 00:00:44,710
+When running K-means, you should have
+当运行K均值方法时 你需要有
+
+22
+00:00:45,140 --> 00:00:47,160
+the number of cluster centroids, K,
+一个聚类中心数值K
+
+23
+00:00:47,430 --> 00:00:48,520
+set to be less than the
+K值要比
+
+24
+00:00:48,590 --> 00:00:50,090
+number of training examples M. It
+训练样本的数量m小
+
+25
+00:00:50,170 --> 00:00:51,210
+would be really weird to run
+如果运行一个
+
+26
+00:00:51,430 --> 00:00:52,600
+K-means with a number
+K均值
+
+27
+00:00:52,870 --> 00:00:54,270
+of cluster centroids that's, you know,
+聚类中心数值
+
+28
+00:00:54,520 --> 00:00:55,790
+equal or greater than the number of examples you have, right?
+等于或者大于样本数的K均值聚类方法会很奇怪
+
+29
+00:00:58,080 --> 00:00:59,010
+So the way I
+我通常用来
+
+30
+00:00:59,150 --> 00:01:00,510
+usually initialize K-means is,
+初始化K均值聚类的方法是
+
+31
+00:01:00,770 --> 00:01:02,510
+I would randomly pick k training
+随机挑选K个训练
+
+32
+00:01:02,990 --> 00:01:05,170
+examples. So, and, what
+样本 然后
+
+33
+00:01:05,610 --> 00:01:06,730
+I do is then set Mu1
+我要做的是设定μ1
+
+34
+00:01:06,850 --> 00:01:09,320
+of MuK equal to these k examples.
+到μk让它们等于这个K个样本
+
+35
+00:01:10,610 --> 00:01:11,470
+Let me show you a concrete example.
+让我展示一个具体的例子
+
+36
+00:01:12,560 --> 00:01:14,190
+Lets say that k is
+我们假设 K
+
+37
+00:01:14,470 --> 00:01:16,600
+equal to 2 and so
+等于2 那么
+
+38
+00:01:17,070 --> 00:01:19,520
+on this example on the right let's say I want to find two clusters.
+就在这个例子的右边 假设我们想找到两个聚类
+
+39
+00:01:21,170 --> 00:01:22,060
+So, what I'm going to
+那么为了初始化
+
+40
+00:01:22,200 --> 00:01:23,350
+do in order to initialize
+聚类中心
+
+41
+00:01:23,770 --> 00:01:25,340
+my cluster centroids is, I'm
+我要做的是
+
+42
+00:01:25,470 --> 00:01:27,320
+going to randomly pick a couple examples.
+随机挑选几个样本
+
+43
+00:01:27,760 --> 00:01:28,960
+And let's say, I pick
+比如说 我挑选了
+
+44
+00:01:29,230 --> 00:01:31,060
+this one and I pick that one.
+这个和这个
+
+45
+00:01:31,230 --> 00:01:32,320
+And the way I'm going
+我要
+
+46
+00:01:32,380 --> 00:01:34,100
+to initialize my cluster centroids
+初始化聚类中心的方法
+
+47
+00:01:34,310 --> 00:01:35,190
+is, I'm just going to initialize
+就是 我只需要初始化
+
+48
+00:01:36,200 --> 00:01:38,930
+my cluster centroids to be right on top of those examples.
+把聚类中心直接放到这些样本的位置上
+
+49
+00:01:39,530 --> 00:01:40,430
+So that's my first cluster centroid
+因此这是我的第一个聚类中心
+
+50
+00:01:41,410 --> 00:01:43,230
+and that's my second cluster centroid, and
+这是我的第二个聚类中心
+
+51
+00:01:43,390 --> 00:01:45,770
+that's one random initialization of K-means.
+这就是一个随机初始化K均值聚类的方法
+
+52
+00:01:48,540 --> 00:01:50,480
+The one I drew looks like a particularly good one.
+刚刚我画的看上去是相当不错的一个例子
+
+53
+00:01:50,890 --> 00:01:51,810
+And sometimes I might get less
+但是有时候我可能不会
+
+54
+00:01:52,040 --> 00:01:53,370
+lucky and maybe I'll end
+那么幸运 也许我最后
+
+55
+00:01:53,510 --> 00:01:54,900
+up picking that as my first
+会挑选到 这一个是我第一个
+
+56
+00:01:55,330 --> 00:01:58,420
+random initial example, and that as my second one.
+挑选到的初始化样本 而这是第二个
+
+57
+00:01:59,050 --> 00:02:01,380
+And here I'm picking two examples because k equals 2.
+这就是我所挑选的两个样本 因为K等于2
+
+58
+00:02:01,590 --> 00:02:03,590
+So we have randomly picked two
+我们随机挑选了两个
+
+59
+00:02:03,890 --> 00:02:05,030
+training examples and if
+训练样本 如果
+
+60
+00:02:05,100 --> 00:02:06,660
+I chose those two then I'll
+我挑选这两个 那么我
+
+61
+00:02:06,830 --> 00:02:08,040
+end up with, may be
+结果就有可能得到
+
+62
+00:02:08,250 --> 00:02:09,200
+this as my first cluster
+这个是第一个聚类
+
+63
+00:02:09,510 --> 00:02:10,980
+centroid and that as
+中心 这个是
+
+64
+00:02:11,140 --> 00:02:13,560
+my second initial location of the cluster centroid.
+第二个聚类中心
+
+65
+00:02:14,150 --> 00:02:15,690
+So, that's how you can randomly
+这就是如何随机
+
+66
+00:02:16,070 --> 00:02:17,560
+initialize the cluster centroids.
+初始化聚类中心
+
+67
+00:02:17,810 --> 00:02:19,670
+And so at initialization, your
+因此在初始化时 你的第一个
+
+68
+00:02:19,860 --> 00:02:21,110
+first cluster centroid Mu1 will
+聚类中心μ1
+
+69
+00:02:21,270 --> 00:02:23,350
+be equal to x(i) for
+等于x(i)
+
+70
+00:02:23,520 --> 00:02:25,870
+some randomly value of i and
+对于某一个随机的i值
+
+71
+00:02:26,980 --> 00:02:27,660
+Mu2 will be equal to x(j)
+μ2等于x(j)
+
+72
+00:02:29,240 --> 00:02:30,980
+for some different randomly chosen value
+对应另一个随机选择的不同的
+
+73
+00:02:31,380 --> 00:02:32,830
+of j and so on,
+j的值 等等
+
+74
+00:02:32,910 --> 00:02:34,440
+if you have more clusters and more cluster centroid.
+如果你有更多的聚类和更多的聚类中心的话
+
+75
+00:02:35,680 --> 00:02:37,540
+And sort of as a side comment,
+顺便说一下
+
+76
+00:02:38,110 --> 00:02:39,240
+I should say that in the
+应该这样说
+
+77
+00:02:39,320 --> 00:02:40,840
+earlier video where I first
+前面的视频中 在我第一次
+
+78
+00:02:41,150 --> 00:02:43,030
+illustrated K-means with the animation.
+用动画说明K均值方法时
+
+79
+00:02:44,310 --> 00:02:45,070
+In that set of slides.
+在那些幻灯片中
+
+80
+00:02:45,900 --> 00:02:46,890
+Only for the purpose of illustration.
+仅仅是为了说明
+
+81
+00:02:47,590 --> 00:02:48,690
+I actually used a different
+我实际上用了一种不同的
+
+82
+00:02:49,240 --> 00:02:51,750
+method of initialization for my cluster centroids.
+初始化方法来初始化聚类中心
+
+83
+00:02:52,460 --> 00:02:53,790
+But the method described on
+而这张幻灯片中描述的方法
+
+84
+00:02:53,900 --> 00:02:55,940
+this slide, this is really the recommended way.
+是真正被推荐的方法
+
+85
+00:02:56,430 --> 00:02:58,850
+And the way that you should probably use, when you implement K-means.
+这种方法在你实现K均值聚类的时候可能会用到
+
+86
+00:03:00,090 --> 00:03:01,560
+So, as they suggested perhaps
+根据推荐
+
+87
+00:03:02,070 --> 00:03:04,090
+by these two illustrations on the right.
+也许通过这右边的两个图
+
+88
+00:03:04,930 --> 00:03:06,050
+You might really guess that K-means
+你可能会猜到K均值方法
+
+89
+00:03:06,530 --> 00:03:08,130
+can end up converging to
+最终可能会得到
+
+90
+00:03:08,260 --> 00:03:10,150
+different solutions depending on
+不同的结果 取决于
+
+91
+00:03:10,860 --> 00:03:12,470
+exactly how the clusters
+聚类簇的初始化方法
+
+92
+00:03:12,990 --> 00:03:15,170
+were initialized, and so, depending on the random initialization.
+因此也就取决于随机的初始化
+
+93
+00:03:16,280 --> 00:03:18,180
+K-means can end up at different solutions.
+K均值方法最后可能得到不同的结果
+
+94
+00:03:18,930 --> 00:03:22,560
+And, in particular, K-means can actually end up at local optima.
+尤其是如果K均值方法落在局部最优的时候
+
+95
+00:03:23,650 --> 00:03:24,920
+If you're given a data set like this.
+如果给你一些数据 比如说这些
+
+96
+00:03:25,400 --> 00:03:26,370
+Well, it looks like, you know, there
+这看起来好像有
+
+97
+00:03:26,660 --> 00:03:28,340
+are three clusters, and so,
+3个聚类 那么
+
+98
+00:03:28,780 --> 00:03:30,090
+if you run K-means and if
+如果你运行K均值方法 如果
+
+99
+00:03:30,150 --> 00:03:31,380
+it ends up at a good
+它最后得到一个
+
+100
+00:03:31,820 --> 00:03:32,910
+local optima this might be
+局部最优 这可能是
+
+101
+00:03:33,040 --> 00:03:35,830
+really the global optima, you might end up with that cluster ring.
+真正的全局最优 你可能会得到这样的聚类结果
+
+102
+00:03:36,820 --> 00:03:38,440
+But if you had a particularly
+但是如果你运气特别
+
+103
+00:03:39,110 --> 00:03:41,630
+unlucky, random initialization, K-means
+不好 随机初始化 K均值方法
+
+104
+00:03:42,100 --> 00:03:43,660
+can also get stuck at different
+也可能会卡在不同的
+
+105
+00:03:44,180 --> 00:03:45,740
+local optima. So, in
+局部最优上面 因此在
+
+106
+00:03:45,850 --> 00:03:47,330
+this example on the left
+左边的这个例子中
+
+107
+00:03:47,620 --> 00:03:48,700
+it looks like this blue cluster has captured
+看上去蓝色的聚类捕捉到了
+
+108
+00:03:49,470 --> 00:03:51,700
+a lot of points on the left and then the red and the green clusters
+左边的很多点 而且它们在绿色的聚类中
+
+109
+00:03:52,050 --> 00:03:54,810
+have each captured a relatively small number of points.
+每一个聚类都捕捉到了相对较少的点
+
+110
+00:03:55,020 --> 00:03:56,480
+And so, this corresponds to
+这与
+
+111
+00:03:56,640 --> 00:03:58,470
+a bad local optima because it
+不好的局部最优相对应 因为
+
+112
+00:03:58,530 --> 00:04:00,060
+has basically taken these two
+它基本上是基于这两个
+
+113
+00:04:00,470 --> 00:04:01,560
+clusters and used them into
+聚类的 并且它们
+
+114
+00:04:01,780 --> 00:04:03,440
+1 and furthermore, has
+进一步合并成了1个聚类 而
+
+115
+00:04:04,150 --> 00:04:06,070
+split the second cluster into
+把第二个聚类分割成了
+
+116
+00:04:06,580 --> 00:04:09,170
+two separate sub-clusters like
+两个像这样的小的聚类
+
+117
+00:04:09,380 --> 00:04:10,270
+so, and it has also
+它也把
+
+118
+00:04:10,720 --> 00:04:12,280
+taken the second cluster and
+第二个聚类
+
+119
+00:04:12,540 --> 00:04:14,220
+split it into two
+分割成了两个
+
+120
+00:04:14,460 --> 00:04:16,630
+separate sub-clusters like so, and
+分别的像这样的小聚类簇
+
+121
+00:04:16,760 --> 00:04:17,880
+so, both of these
+这两个
+
+122
+00:04:18,000 --> 00:04:18,970
+examples on the lower
+右下方的例子
+
+123
+00:04:19,220 --> 00:04:20,890
+right correspond to different local
+对应与K均值方法的
+
+124
+00:04:21,250 --> 00:04:22,440
+optima of K-means and in fact,
+不同的局部最优 实际上
+
+125
+00:04:22,890 --> 00:04:24,440
+in this example here,
+这里的这个例子
+
+126
+00:04:25,070 --> 00:04:26,150
+the cluster, the red cluster
+这个红色的簇
+
+127
+00:04:26,550 --> 00:04:27,870
+has captured only a single optima example.
+只捕捉到了一个最好的样本
+
+128
+00:04:28,380 --> 00:04:29,810
+And the term local
+这个局部
+
+129
+00:04:30,200 --> 00:04:31,000
+optima, by the way, refers
+最优项 顺便提一下 代表
+
+130
+00:04:31,490 --> 00:04:32,930
+to local optima of this
+这个失真函数J
+
+131
+00:04:33,190 --> 00:04:35,940
+distortion function J, and
+的局部最优
+
+132
+00:04:36,320 --> 00:04:38,380
+what these solutions on the
+这些在右下方的解
+
+133
+00:04:38,590 --> 00:04:39,830
+lower left, what these local
+这些局部
+
+134
+00:04:40,120 --> 00:04:41,420
+optima correspond to is
+最优所对应的是
+
+135
+00:04:41,530 --> 00:04:42,880
+really solutions where K-means
+真正的K均值方法
+
+136
+00:04:43,330 --> 00:04:44,050
+has gotten stuck to the local
+所遇到的局部
+
+137
+00:04:44,600 --> 00:04:45,940
+optima and it's not doing
+最优 且
+
+138
+00:04:46,170 --> 00:04:47,940
+a very good job minimizing this
+通过最小化这个
+
+139
+00:04:48,110 --> 00:04:50,030
+distortion function J. So,
+失真函数J并不能得到很好的结果 因此
+
+140
+00:04:50,540 --> 00:04:52,250
+if you're worried about K-means getting
+如果你担心K均值方法会遇到
+
+141
+00:04:52,540 --> 00:04:53,810
+stuck in local optima, if
+局部最优的问题 如果
+
+142
+00:04:53,970 --> 00:04:55,110
+you want to increase the odds
+你想提高
+
+143
+00:04:55,330 --> 00:04:56,950
+of K-means finding the best
+K均值方法找到最
+
+144
+00:04:57,230 --> 00:04:58,480
+possible clustering, like that shown
+有可能的聚类的几率的话
+
+145
+00:04:58,730 --> 00:05:00,290
+on top here, what we
+就像这上面所展示的 我们能做的
+
+146
+00:05:00,350 --> 00:05:02,820
+can do, is try multiple, random initializations.
+是尝试多次 随机的初始化
+
+147
+00:05:03,580 --> 00:05:04,820
+So, instead of just initializing K-means
+而不是仅仅初始化一次K均值方法
+
+148
+00:05:05,430 --> 00:05:06,460
+once and hoping that that
+就希望它会得到
+
+149
+00:05:06,670 --> 00:05:07,680
+works, what we can do
+很好的结果 我们能做的是
+
+150
+00:05:08,040 --> 00:05:10,020
+is, initialize K-means lots of
+初始化K均值很多次
+
+151
+00:05:10,130 --> 00:05:10,990
+times and run K-means lots of
+并运行K均值方法很多次
+
+152
+00:05:11,890 --> 00:05:12,870
+times, and use that to
+通过多次尝试
+
+153
+00:05:12,950 --> 00:05:13,840
+try to make sure we get
+来保证我们最终能得到
+
+154
+00:05:14,110 --> 00:05:15,640
+as good a solution, as
+一个足够好的结果 一个
+
+155
+00:05:15,800 --> 00:05:18,380
+good a local or global optima as possible.
+尽可能局部或全局最优的结果
+
+156
+00:05:19,480 --> 00:05:22,460
+Concretely, here's how you could go about doing that.
+具体地 这就是你能够做的
+
+157
+00:05:22,720 --> 00:05:23,500
+Let's say, I decide to run
+假如我决定运行
+
+158
+00:05:23,700 --> 00:05:24,800
+K-means a hundred times
+K均值方法一百次
+
+159
+00:05:25,160 --> 00:05:26,790
+so I'll execute this loop
+那么我就需要执行这个循环
+
+160
+00:05:27,060 --> 00:05:28,900
+a hundred times and it's
+100次
+
+161
+00:05:29,330 --> 00:05:30,830
+fairly typical a number of
+这是一个相当典型的次数数字
+
+162
+00:05:30,920 --> 00:05:31,910
+times when came to will be
+有时会是
+
+163
+00:05:32,160 --> 00:05:33,670
+something from 50 up to may be 1000.
+从50到1000之间
+
+164
+00:05:35,090 --> 00:05:36,730
+So, let's say you decide to say K-means one hundred times.
+假设说有决定运行K均值方法100次
+
+165
+00:05:38,220 --> 00:05:39,100
+So what that means is that
+那么这就意味这
+
+166
+00:05:39,170 --> 00:05:41,490
+we would randomly initialize K-means.
+我们要随机初始化K均值方法
+
+167
+00:05:42,350 --> 00:05:43,250
+And for each of
+对于这些
+
+168
+00:05:43,340 --> 00:05:44,710
+these one hundred random initializations
+100次随机初始化的每一次
+
+169
+00:05:45,370 --> 00:05:47,040
+we would run K-means and
+我们需要运行K均值方法
+
+170
+00:05:47,220 --> 00:05:48,200
+that would give us a set
+我们会得到一系列
+
+171
+00:05:48,430 --> 00:05:50,270
+of clusteringings, and a set of cluster
+聚类结果 和一系列聚类
+
+172
+00:05:50,590 --> 00:05:51,940
+centroids, and then we
+中心 之后
+
+173
+00:05:52,040 --> 00:05:53,760
+would then compute the distortion J,
+我们可以计算失真函数J
+
+174
+00:05:54,500 --> 00:05:55,600
+that is, compute this cost function on
+用我们得到的
+
+175
+00:05:56,910 --> 00:05:58,260
+the set of cluster assignments
+这些聚类结果
+
+176
+00:05:58,720 --> 00:05:59,910
+and cluster centroids that we got.
+和聚类中心来计算这样一个结果函数
+
+177
+00:06:01,000 --> 00:06:03,470
+Finally, having done this whole procedure a hundred times.
+最后 完成整个过程100次之后
+
+178
+00:06:04,450 --> 00:06:06,330
+You will have a hundred different ways
+你会得到这个100种
+
+179
+00:06:06,710 --> 00:06:08,990
+of clustering the data and then
+聚类数据的这些方法
+
+180
+00:06:09,240 --> 00:06:10,310
+finally what you do
+最后你要做的是
+
+181
+00:06:10,590 --> 00:06:11,510
+is all of these hundred
+在所有这100种
+
+182
+00:06:11,820 --> 00:06:13,210
+ways you have found of clustering the data,
+用于聚类的方法中
+
+183
+00:06:13,800 --> 00:06:16,050
+just pick one, that gives us the lowest cost.
+选取能够给我们代价最小的一个
+
+184
+00:06:16,400 --> 00:06:18,480
+That gives us the lowest distortion.
+给我们最低畸变值的一个
+
+185
+00:06:18,960 --> 00:06:20,610
+And it turns out that
+事实证明
+
+186
+00:06:21,170 --> 00:06:22,490
+if you are running K-means with
+如果你运行K均值方法时
+
+187
+00:06:22,670 --> 00:06:24,520
+a fairly small number of
+所用的聚类数相当小
+
+188
+00:06:24,630 --> 00:06:25,260
+clusters, so you know if the number
+那么如果聚类
+
+189
+00:06:25,520 --> 00:06:26,700
+of clusters is anywhere from
+数是从
+
+190
+00:06:26,760 --> 00:06:28,180
+two up to maybe 10 -
+2到10之间的任何数的话
+
+191
+00:06:28,980 --> 00:06:30,650
+then doing multiple random initializations
+做多次的随机初始化
+
+192
+00:06:31,460 --> 00:06:32,880
+can often, can sometimes make
+通常能够保证
+
+193
+00:06:32,990 --> 00:06:34,430
+sure that you find a better local optima.
+你能有一个较好的局部最优解
+
+194
+00:06:34,690 --> 00:06:37,680
+Make sure you find the better clustering data.
+保证你能找到更好的聚类数据
+
+195
+00:06:37,870 --> 00:06:38,930
+But if K is very large, so, if
+但是如果K非常大的话 如果
+
+196
+00:06:39,080 --> 00:06:40,000
+K is much greater than 10,
+K比10大很多
+
+197
+00:06:40,160 --> 00:06:41,010
+certainly if K were, you
+当然如果K是
+
+198
+00:06:41,080 --> 00:06:42,340
+know, if you were trying to
+如果你尝试去
+
+199
+00:06:42,400 --> 00:06:44,050
+find hundreds of clusters, then,
+找到成百上千个聚类 那么
+
+200
+00:06:45,840 --> 00:06:47,310
+having multiple random initializations is
+有多个随机初始化就
+
+201
+00:06:47,940 --> 00:06:49,220
+less likely to make a huge difference
+不太可能会有太大的影响
+
+202
+00:06:49,360 --> 00:06:50,400
+and there is a much
+更有
+
+203
+00:06:50,590 --> 00:06:51,910
+higher chance that your first
+可能你的第一次
+
+204
+00:06:52,320 --> 00:06:53,610
+random initialization will give
+随机初始化就会给
+
+205
+00:06:53,730 --> 00:06:55,380
+you a pretty decent solution already
+你相当好的结果
+
+206
+00:06:56,590 --> 00:06:58,070
+and doing, doing multiple random
+做多次随机
+
+207
+00:06:58,680 --> 00:07:00,060
+initializations will probably give
+初始化可能会给
+
+208
+00:07:00,260 --> 00:07:02,500
+you a slightly better solution but, but maybe not that much.
+你稍微好一点的结果 但是不会好太多
+
+209
+00:07:02,780 --> 00:07:04,230
+But it's really in the regime of where
+但是在这样一个
+
+210
+00:07:04,540 --> 00:07:05,810
+you have a relatively small number
+聚类数相对较小的体系里
+
+211
+00:07:06,090 --> 00:07:07,740
+of clusters, especially if you
+特别是如果你
+
+212
+00:07:08,040 --> 00:07:09,080
+have, maybe 2 or 3
+有2个或者3个
+
+213
+00:07:09,150 --> 00:07:10,550
+or 4 clusters that random
+或者4个聚类的话 随机
+
+214
+00:07:11,140 --> 00:07:13,790
+initialization could make a huge difference in terms of
+初始化会有较大的影响
+
+215
+00:07:14,190 --> 00:07:15,090
+making sure you do a good
+可以保证你在
+
+216
+00:07:15,170 --> 00:07:16,920
+job minimizing the distortion
+最小化失真函数的时候得到一个很小的值
+
+217
+00:07:17,560 --> 00:07:18,730
+function and giving you a good clustering.
+并且能得到一个很好的聚类结果
+
+218
+00:07:21,390 --> 00:07:22,560
+So, that's K-means
+这就是随机初始化
+
+219
+00:07:22,640 --> 00:07:23,300
+with random initialization.
+的K均值初始化方法
+
+220
+00:07:24,350 --> 00:07:25,570
+If you're trying to learn a
+如果你尝试学习一种
+
+221
+00:07:25,710 --> 00:07:26,950
+clustering with a relatively small
+聚类数目相对较小
+
+222
+00:07:27,310 --> 00:07:28,250
+number of clusters, 2, 3,
+的聚类方法 如2,3
+
+223
+00:07:28,400 --> 00:07:30,540
+4, 5, maybe, 6, 7, using
+4,5,6,7 用
+
+224
+00:07:31,660 --> 00:07:34,040
+multiple random initializations can
+多次随机初始化
+
+225
+00:07:34,380 --> 00:07:36,830
+sometimes, help you find much better clustering of the data.
+有时能够帮助你找到更好的数据聚类结果
+
+226
+00:07:37,680 --> 00:07:39,650
+But, even if you are learning a large number of clusters, the initialization, the random
+但是 尽管你有很多聚类数目 初始化
+
+227
+00:07:40,350 --> 00:07:43,280
+initialization method that I describe here.
+我在这里介绍的随机初始化
+
+228
+00:07:43,520 --> 00:07:45,110
+That should give K-means a
+它会给K均值方法一个
+
+229
+00:07:45,370 --> 00:07:46,680
+reasonable starting point to start
+合理的起始点来开始
+
+230
+00:07:47,030 --> 00:07:48,580
+from for finding a good set of clusters.
+并找到一个好的聚类结果 【教育无边界字幕组】翻译: 星星之火 校对/审核:所罗门捷列夫
+
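A compact sketch of the multiple-random-initializations procedure described in the subtitles above; the run_kmeans helper is hypothetical and stands in for any K-means implementation that returns assignments and centroids (it is not part of the course code), and NumPy is assumed.

    import numpy as np

    def kmeans_best_of(X, K, run_kmeans, n_init=100):
        best_J, best_c, best_centroids = np.inf, None, None
        for _ in range(n_init):
            # Random initialization: pick K distinct training examples as initial centroids.
            init = X[np.random.choice(len(X), K, replace=False)]
            c, centroids = run_kmeans(X, init)                       # hypothetical helper
            J = np.mean(np.sum((X - centroids[c]) ** 2, axis=1))     # distortion of this run
            if J < best_J:
                best_J, best_c, best_centroids = J, c, centroids
        return best_c, best_centroids, best_J

    # As the lecture notes, this tends to matter most when K is small (roughly 2 to 10);
    # for large K the first random initialization is usually already a decent solution.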
diff --git a/srt/13 - 5 - Choosing the Number of Clusters (8 min).srt b/srt/13 - 5 - Choosing the Number of Clusters (8 min).srt
new file mode 100644
index 00000000..36f14540
--- /dev/null
+++ b/srt/13 - 5 - Choosing the Number of Clusters (8 min).srt
@@ -0,0 +1,1306 @@
+1
+00:00:00,200 --> 00:00:01,390
+In this video I'd like to
+在这个视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,570 --> 00:00:02,780
+talk about one last detail
+详细讨论一下
+
+3
+00:00:03,350 --> 00:00:04,950
+of K-means clustering which is
+K均值方法聚类中
+
+4
+00:00:05,450 --> 00:00:06,680
+how to choose the number of
+如何去选择聚类
+
+5
+00:00:06,770 --> 00:00:07,890
+clusters, or how to choose
+类别数目 或者说是如何去选择
+
+6
+00:00:08,290 --> 00:00:09,160
+the value of the parameter
+参数K的值
+
+7
+00:00:10,230 --> 00:00:12,310
+capital K. To be
+说实话
+
+8
+00:00:12,390 --> 00:00:13,690
+honest, there actually isn't a
+其实上没有一个
+
+9
+00:00:13,760 --> 00:00:15,420
+great way of answering this
+非常好的方法回答这个问题
+
+10
+00:00:15,680 --> 00:00:17,150
+or doing this automatically and
+或者能够自动做这件事情
+
+11
+00:00:17,820 --> 00:00:18,930
+by far the most common way
+到目前为止 用来决定聚类
+
+12
+00:00:19,110 --> 00:00:20,380
+of choosing the number of clusters,
+数目最常用的方法
+
+13
+00:00:20,520 --> 00:00:22,040
+is still choosing it manually
+仍然是通过
+
+14
+00:00:22,710 --> 00:00:24,380
+by looking at visualizations or by
+看可视化的图或者通过
+
+15
+00:00:24,450 --> 00:00:26,070
+looking at the output of the clustering algorithm or something else.
+查看聚类算法的输出结果或者其他一些东西来手动地决定聚类的类别数量
+
+16
+00:00:27,340 --> 00:00:28,270
+But I do get asked
+但是 我也确实经常被问及
+
+17
+00:00:28,600 --> 00:00:29,460
+this question quite a lot of
+这样的问题
+
+18
+00:00:29,650 --> 00:00:30,510
+how do you choose the number of
+你是如何来选择聚类的数量的
+
+19
+00:00:30,810 --> 00:00:31,930
+clusters, and so I just want
+我只是想
+
+20
+00:00:32,240 --> 00:00:33,650
+to tell you know what
+告诉你
+
+21
+00:00:33,850 --> 00:00:35,020
+are people's current thinking on
+现在人们所思考的
+
+22
+00:00:35,230 --> 00:00:36,480
+it although, the most
+最为常见的
+
+23
+00:00:36,740 --> 00:00:38,060
+common thing is actually to
+一件事实际上是
+
+24
+00:00:38,180 --> 00:00:40,130
+choose the number of clusters by hand.
+手动去选择聚类的数目
+
+25
+00:00:42,230 --> 00:00:43,680
+A large part of
+其中一大部分
+
+26
+00:00:43,800 --> 00:00:45,020
+why it might not always
+为什么选择聚类
+
+27
+00:00:45,390 --> 00:00:46,530
+be easy to choose the
+数量并不容易
+
+28
+00:00:46,640 --> 00:00:47,940
+number of clusters is that
+的原因是
+
+29
+00:00:48,190 --> 00:00:51,920
+it is often generally ambiguous how many clusters there are in the data.
+通常在数据集中有多少个聚类是不清楚的
+
+30
+00:00:52,940 --> 00:00:53,890
+Looking at this data set
+看这样一个数据集
+
+31
+00:00:54,080 --> 00:00:55,110
+some of you may see
+有些人可能会看到
+
+32
+00:00:55,380 --> 00:00:56,830
+four clusters and that
+四个聚类
+
+33
+00:00:57,020 --> 00:00:59,440
+would suggest using K equals 4.
+那么这就意味着需要使用K=4
+
+34
+00:00:59,620 --> 00:01:00,650
+Or some of you may
+或者有些人可能
+
+35
+00:01:00,870 --> 00:01:02,620
+see two clusters and
+会看到两个聚类
+
+36
+00:01:02,730 --> 00:01:04,460
+that will suggest K equals
+这个条件下就意味着K等于
+
+37
+00:01:04,870 --> 00:01:06,630
+2 and now this may see three clusters.
+2 现在这有可能是3个聚类
+
+38
+00:01:08,070 --> 00:01:09,710
+And so, looking at the
+那么
+
+39
+00:01:09,820 --> 00:01:10,750
+data set like this, the
+相类似于这样的数据集
+
+40
+00:01:10,920 --> 00:01:12,390
+true number of clusters, it actually
+其真实的类别数对我来说实际上
+
+41
+00:01:12,810 --> 00:01:14,560
+seems genuinely ambiguous to me,
+相当地模棱两可的
+
+42
+00:01:14,690 --> 00:01:17,160
+and I don't think there is one right answer.
+我并不认为只有一个正确的答案
+
+43
+00:01:18,100 --> 00:01:19,500
+And this is part of unsupervised learning.
+这就是无监督学习的一部分
+
+44
+00:01:20,250 --> 00:01:21,450
+We aren't given labels, and
+我们没有给定的标签
+
+45
+00:01:21,550 --> 00:01:23,950
+so there isn't always a clear cut answer.
+因此并不总是有一个清晰的答案
+
+46
+00:01:24,830 --> 00:01:25,730
+And this is one of the
+这是其中的一个
+
+47
+00:01:25,850 --> 00:01:26,710
+things that makes it more difficult
+使得(决定聚类数目)
+
+48
+00:01:27,340 --> 00:01:28,530
+to say, have an automatic
+拥有一个自动化
+
+49
+00:01:29,160 --> 00:01:30,860
+algorithm for choosing how many clusters to have.
+算法来决定聚类数目变得困难的原因
+
+50
+00:01:32,100 --> 00:01:33,250
+When people talk about ways of
+当人们在讨论
+
+51
+00:01:33,320 --> 00:01:34,270
+choosing the number of clusters,
+选择聚类数目的方法时
+
+52
+00:01:34,840 --> 00:01:36,050
+one method that people sometimes
+有一个可能会
+
+53
+00:01:36,440 --> 00:01:39,150
+talk about is something called the Elbow Method.
+谈及的方法叫作“肘部法则”
+
+54
+00:01:39,630 --> 00:01:40,490
+Let me just tell you a little bit about that,
+让我告诉你一些关于这个法则的内容
+
+55
+00:01:40,800 --> 00:01:43,760
+and then mention some of its advantages but also shortcomings.
+之后会提及到它的一些优点和缺点
+
+56
+00:01:44,690 --> 00:01:45,980
+So the Elbow Method,
+关于“肘部法则”
+
+57
+00:01:46,420 --> 00:01:47,570
+what we're going to do is vary
+我们所需要做的是改变
+
+58
+00:01:48,340 --> 00:01:49,860
+K, which is the total number of clusters.
+K值 也就是聚类类别数目的总数
+
+59
+00:01:50,250 --> 00:01:51,570
+So, we're going to run K-means
+我们用一个聚类来运行K均值聚类方法
+
+60
+00:01:52,050 --> 00:01:53,340
+with one cluster, that means really,
+这就意味着
+
+61
+00:01:53,630 --> 00:01:54,840
+everything gets grouped into a
+所有的数据都会分到一个
+
+62
+00:01:54,980 --> 00:01:56,530
+single cluster and compute the
+聚类里 然后计算
+
+63
+00:01:56,660 --> 00:01:57,850
+cost function or compute the distortion
+成本函数或者计算畸变函数
+
+64
+00:01:58,460 --> 00:01:59,490
+J and plot that here.
+J 并将其画在这儿
+
+65
+00:02:00,410 --> 00:02:01,090
+And then we're going to run K
+之后我们又用两个聚类来运行K
+
+66
+00:02:01,310 --> 00:02:03,270
+means with two clusters, maybe
+均值聚类 也许
+
+67
+00:02:03,610 --> 00:02:05,430
+with multiple random initializations, maybe not.
+有多个随机的初始中心 也许没有
+
+68
+00:02:06,140 --> 00:02:07,150
+But then, you know,
+但是之后 你知道的
+
+69
+00:02:07,280 --> 00:02:08,280
+with two clusters we should
+有两个聚类 我们
+
+70
+00:02:08,500 --> 00:02:09,510
+get, hopefully, a smaller distortion,
+所期望得到的是一个较小的畸变值
+
+71
+00:02:10,710 --> 00:02:11,820
+and so plot that there.
+把它画在这儿
+
+72
+00:02:11,950 --> 00:02:13,100
+And then run K-means with three
+之后用选用3个聚类来运行K均值聚类
+
+73
+00:02:13,310 --> 00:02:14,590
+clusters, hopefully, you get even
+你期望得到一个更
+
+74
+00:02:14,760 --> 00:02:16,680
+smaller distortion and plot that there.
+小的畸变值 并把它画在这儿
+
+75
+00:02:16,990 --> 00:02:19,710
+I'm gonna run K-means with four, five and so on.
+之后再用4,5等聚类数来运行均值聚类
+
+76
+00:02:19,780 --> 00:02:20,790
+And so we end up with
+最后我们就能得到
+
+77
+00:02:20,970 --> 00:02:22,840
+a curve showing how the
+一条曲线显示
+
+78
+00:02:23,240 --> 00:02:24,560
+distortion, you know, goes
+随着我们的聚类数量的增多
+
+79
+00:02:24,800 --> 00:02:27,170
+down as we increase the number of clusters.
+畸变值是如何下降的
+
+80
+00:02:27,440 --> 00:02:29,870
+And so we get a curve that maybe looks like this.
+我们可能会得到一条类似于这样的曲线
+
+81
+00:02:31,390 --> 00:02:32,210
+And if you look at this
+看看这条
+
+82
+00:02:32,300 --> 00:02:33,400
+curve, what the Elbow Method does
+曲线 这就是“肘部法则”所做的
+
+83
+00:02:33,720 --> 00:02:35,770
+it says "Well, let's look at this plot.
+让我们来看这样一个图
+
+84
+00:02:36,450 --> 00:02:39,340
+Looks like there's a clear elbow there".
+看起来就好像有一个很清楚的肘在那儿
+
+85
+00:02:40,230 --> 00:02:41,620
+Right, this is, would be by
+对吧 这就
+
+86
+00:02:41,830 --> 00:02:43,210
+analogy to the human arm where,
+类比于人的手臂
+
+87
+00:02:43,550 --> 00:02:44,620
+you know, if you imagine that
+想象一下
+
+88
+00:02:45,370 --> 00:02:46,460
+you reach out your arm,
+如果你伸出你的胳膊
+
+89
+00:02:47,240 --> 00:02:48,940
+then, this is your
+那么这就是你的
+
+90
+00:02:49,160 --> 00:02:50,340
+shoulder joint, this is your
+肩关节 这就是你的
+
+91
+00:02:50,550 --> 00:02:52,960
+elbow joint and I guess, your hand is at the end over here.
+肘关节 我猜你的手就在这里的终端
+
+92
+00:02:53,260 --> 00:02:54,170
+And so this is the Elbow Method.
+这就是“肘部法则”
+
+93
+00:02:54,490 --> 00:02:55,930
+Then you find this sort of pattern
+你会发现这种模式
+
+94
+00:02:56,250 --> 00:02:57,630
+where the distortion goes down rapidly
+它的畸变值会迅速下降
+
+95
+00:02:58,550 --> 00:02:59,120
+from 1 to 2, and 2 to
+从1到2 从2到
+
+96
+00:02:59,280 --> 00:03:01,330
+3, and then you reach an
+3 之后你会
+
+97
+00:03:01,520 --> 00:03:03,160
+elbow at 3, and then
+在3的时候达到一个肘点 在此之后
+
+98
+00:03:03,330 --> 00:03:05,260
+the distortion goes down very slowly after that.
+畸变值就下降的非常慢
+
+99
+00:03:05,430 --> 00:03:06,520
+And then it looks like, you
+看起来就像
+
+100
+00:03:06,580 --> 00:03:08,700
+know what, maybe using three
+使用3个聚类来
+
+101
+00:03:08,960 --> 00:03:09,920
+clusters is the right
+进行聚类是正确的
+
+102
+00:03:10,040 --> 00:03:11,340
+number of clusters, because that's
+这是因为那个点
+
+103
+00:03:12,020 --> 00:03:14,430
+the elbow of this curve, right?
+是曲线的肘点
+
+104
+00:03:14,700 --> 00:03:16,040
+That it goes down, distortion goes
+畸变值下降
+
+105
+00:03:16,250 --> 00:03:17,290
+down rapidly until K equals
+得很快直到K等于
+
+106
+00:03:17,610 --> 00:03:19,700
+3, really goes down very slowly after that.
+3 之后就下降得很慢
+
+107
+00:03:19,820 --> 00:03:20,850
+So let's pick K equals 3.
+那么我们就选K等于3
+
+108
+00:03:23,460 --> 00:03:24,570
+If you apply the Elbow Method,
+当你应用“肘部法则” 的时候
+
+109
+00:03:25,110 --> 00:03:26,240
+and if you get a plot
+如果你得到了一个
+
+110
+00:03:26,540 --> 00:03:27,450
+that actually looks like this,
+像这样的图
+
+111
+00:03:27,890 --> 00:03:29,120
+then, that's pretty good, and
+那么这非常好
+
+112
+00:03:29,240 --> 00:03:30,160
+this would be a reasonable way
+这将是一种用来
+
+113
+00:03:30,700 --> 00:03:32,590
+of choosing the number of clusters.
+选择聚类个数的合理方法
+
+114
+00:03:33,620 --> 00:03:34,600
+It turns out the Elbow Method
+而事实证明“肘部法则”
+
+115
+00:03:35,040 --> 00:03:37,170
+isn't used that often, and one
+并不那么常用 其中一个
+
+116
+00:03:37,340 --> 00:03:38,270
+reason is that, if you
+原因是如果你
+
+117
+00:03:38,350 --> 00:03:39,470
+actually use this on
+把这种方法用到
+
+118
+00:03:39,720 --> 00:03:41,060
+a clustering problem, it turns out that
+一个聚类问题上 事实证明
+
+119
+00:03:41,210 --> 00:03:42,640
+fairly often, you know,
+这种现象是相当常见的
+
+120
+00:03:42,740 --> 00:03:43,610
+you end up with a curve
+你最后得到了一条
+
+121
+00:03:43,910 --> 00:03:46,940
+that looks much more ambiguous, maybe something like this.
+看上去相当模棱两可的曲线 也许就像这样
+
+122
+00:03:47,700 --> 00:03:48,370
+And if you look at this,
+请看这个
+
+123
+00:03:48,920 --> 00:03:50,220
+I don't know, maybe there's
+我不知道 也许没有
+
+124
+00:03:50,390 --> 00:03:51,580
+no clear elbow, but it
+一个清晰的肘点 但是
+
+125
+00:03:51,720 --> 00:03:53,090
+looks like distortion continuously goes down,
+看上去畸变值是连续下降的
+
+126
+00:03:53,440 --> 00:03:54,570
+maybe 3 is a
+也许3是比较
+
+127
+00:03:54,620 --> 00:03:55,680
+good number, maybe 4 is
+好的一个数字 也许4是
+
+128
+00:03:55,750 --> 00:03:58,180
+a good number, maybe 5 is also not bad.
+一个比较好的数字 也许5也并不糟糕
+
+129
+00:03:58,390 --> 00:03:59,190
+And so, if you actually
+那么如果你在
+
+130
+00:03:59,600 --> 00:04:00,710
+do this in a practice, you know,
+实际操作中做这样一个事情的话
+
+131
+00:04:00,820 --> 00:04:02,690
+if your plot looks like the one on the left and that's great.
+如果你的图像左边这个的话 那么就太好了
+
+132
+00:04:03,400 --> 00:04:04,990
+It gives you a clear answer, but
+它会给你一个清晰的答案 但是
+
+133
+00:04:05,490 --> 00:04:06,550
+just as often, you end
+通常 你最终
+
+134
+00:04:06,740 --> 00:04:07,580
+up with a plot that looks
+得到的图是像
+
+135
+00:04:07,750 --> 00:04:09,020
+like the one on the right and
+右边的那个
+
+136
+00:04:09,110 --> 00:04:11,000
+is not clear where the
+并不能清晰指定
+
+137
+00:04:11,790 --> 00:04:13,230
+real location of the elbow
+肘点合适的位置
+
+138
+00:04:13,490 --> 00:04:14,440
+is. It makes it harder to
+这使得
+
+139
+00:04:14,640 --> 00:04:16,700
+choose a number of clusters using this method.
+用这个方法来选择聚类数目变得较为困难
+
+140
+00:04:17,370 --> 00:04:18,220
+So maybe the quick summary
+对于“肘部法则”快速的小结
+
+141
+00:04:18,700 --> 00:04:20,500
+of the Elbow Method is that it is worth a shot
+就是它是一个值得尝试的方法
+
+142
+00:04:21,010 --> 00:04:22,350
+but I wouldn't necessarily,
+但是我不会必然地
+
+143
+00:04:23,610 --> 00:04:24,700
+you know, have a very high
+对它有很高的
+
+144
+00:04:24,870 --> 00:04:27,360
+expectation of it working for any particular problem.
+期望来解决任何一个特定的问题
+
+145
+00:04:29,880 --> 00:04:31,030
+Finally, here's one other way
+最后 有另外一种方法
+
+146
+00:04:31,300 --> 00:04:32,850
+of how, thinking about how
+引导你如何
+
+147
+00:04:32,990 --> 00:04:33,980
+you choose the value of K,
+选择K值
+
+148
+00:04:34,930 --> 00:04:36,030
+very often people are running
+通常人们运行
+
+149
+00:04:36,310 --> 00:04:37,380
+K-means in order you
+K均值聚类方法是为了
+
+150
+00:04:37,530 --> 00:04:38,770
+get clusters for some later
+得到一些聚类用于后面的
+
+151
+00:04:39,240 --> 00:04:40,880
+purpose, or for some sort of downstream purpose.
+一些用途 或者是一些下游的目的
+
+152
+00:04:41,460 --> 00:04:42,900
+Maybe you want to use K-means
+也许你会用K均值聚类方法
+
+153
+00:04:43,380 --> 00:04:44,460
+in order to do market segmentation,
+来做市场分割
+
+154
+00:04:45,310 --> 00:04:47,600
+like in the T-shirt sizing example that we talked about.
+如我们之前谈论的T恤尺寸的例子
+
+155
+00:04:48,140 --> 00:04:50,570
+Maybe you want K-means to organize
+也许你会用K均值聚类来使得
+
+156
+00:04:51,130 --> 00:04:52,300
+a computer cluster better, or
+电脑的聚类变得更好 或者
+
+157
+00:04:52,480 --> 00:04:53,430
+maybe a learning cluster for some
+也有可能是用于某种不同目的一个
+
+158
+00:04:53,630 --> 00:04:55,070
+different purpose, and so,
+学习聚类 等等
+
+159
+00:04:55,450 --> 00:04:57,020
+if that later, downstream purpose,
+如果是后续下游的目的
+
+160
+00:04:57,510 --> 00:04:59,050
+such as market segmentation. If
+如市场分割 如果
+
+161
+00:04:59,180 --> 00:05:00,420
+that gives you an evaluation metric,
+那能给你一个评估标准
+
+162
+00:05:01,310 --> 00:05:02,670
+then often, a better
+那么通常 更好
+
+163
+00:05:02,800 --> 00:05:03,890
+way to determine the number of
+的确定聚类数量的方式
+
+164
+00:05:03,960 --> 00:05:05,680
+clusters, is to see
+是去看
+
+165
+00:05:06,010 --> 00:05:07,740
+how well different numbers of
+不同的聚类数值能
+
+166
+00:05:07,930 --> 00:05:10,140
+clusters serve that later downstream purpose.
+为后续下游的目的提供多好的结果
+
+167
+00:05:11,230 --> 00:05:13,050
+Let me step through a specific example.
+让我们来看一个具体的例子
+
+168
+00:05:14,190 --> 00:05:15,080
+Let me go through the T-shirt
+让我重新举例T恤尺寸
+
+169
+00:05:15,440 --> 00:05:17,420
+size example again, and I'm
+的例子 我
+
+170
+00:05:17,570 --> 00:05:19,700
+trying to decide, do I want three T-shirt sizes?
+尝试决定我是否需要3种T恤尺寸
+
+171
+00:05:20,330 --> 00:05:22,320
+So, I choose K equals 3, then
+因此我选择K等于3 那么
+
+172
+00:05:22,560 --> 00:05:25,360
+I might have small, medium and large T-shirts.
+可能会有小号 中号 大号三类T恤
+
+173
+00:05:26,320 --> 00:05:27,250
+Or maybe, I want to choose
+或者我可以选择
+
+174
+00:05:27,470 --> 00:05:28,240
+K equals 5, and then I
+K等于5 那么我就
+
+175
+00:05:29,030 --> 00:05:30,140
+might have, you know, extra
+可能会有 特小
+
+176
+00:05:30,390 --> 00:05:33,130
+small, small, medium, large
+号 小号 中号 大号
+
+177
+00:05:33,620 --> 00:05:35,070
+and extra large T-shirt sizes.
+和特大号尺寸的T恤
+
+178
+00:05:35,860 --> 00:05:38,580
+So, you can have like 3 T-shirt sizes or four or five T-shirt sizes.
+所以 你可能有3种T恤尺寸或者4种或者5种
+
+179
+00:05:39,270 --> 00:05:40,100
+We could also have four T-shirt
+我们也可以有四种T恤
+
+180
+00:05:40,430 --> 00:05:41,740
+sizes, but I'm just
+尺寸,但是我只是
+
+181
+00:05:41,930 --> 00:05:43,240
+showing three and five here,
+在这里展示了3和5这两种情况
+
+182
+00:05:43,490 --> 00:05:45,670
+just to simplify this slide for now.
+只是为了是这张幻灯片变得简洁一些
+
+183
+00:05:46,930 --> 00:05:49,020
+So, if I run K-means with
+因此如果我用K等于
+
+184
+00:05:49,130 --> 00:05:50,290
+K equals 3, maybe I end
+3来运行K均值方法 最后我可能会得到
+
+185
+00:05:50,670 --> 00:05:51,860
+up with, that's my small
+这是小号
+
+186
+00:05:53,100 --> 00:05:55,020
+and that's my
+这是
+
+187
+00:05:55,140 --> 00:05:56,720
+medium and that's my large.
+中号 这是大号
+
+188
+00:05:58,580 --> 00:06:00,370
+Whereas, if I run K-means with
+然而 如果我用
+
+189
+00:06:00,650 --> 00:06:03,540
+5 clusters, maybe I
+5个聚类来运行K均值方法 也许我
+
+190
+00:06:03,700 --> 00:06:05,170
+end up with, those are
+最后会得到 这些是
+
+191
+00:06:05,330 --> 00:06:07,400
+my extra small T-shirts, these
+超小号T恤 这些
+
+192
+00:06:07,740 --> 00:06:10,920
+are my small, these are
+是小号 这些是
+
+193
+00:06:11,050 --> 00:06:13,740
+my medium, these are my
+中号 这些是
+
+194
+00:06:13,990 --> 00:06:17,110
+large and these are my extra large.
+大号 这些是超大号
+
+195
+00:06:19,320 --> 00:06:20,150
+And the nice thing about this
+这个例子的一个亮点
+
+196
+00:06:20,320 --> 00:06:21,510
+example is that, this then
+是这之后
+
+197
+00:06:21,810 --> 00:06:22,940
+maybe gives us another way
+可能会给我们一种方法
+
+198
+00:06:23,550 --> 00:06:24,730
+to choose whether we want
+来选择究竟我们想要的
+
+199
+00:06:24,970 --> 00:06:26,070
+3 or 4 or 5 clusters,
+聚类数目是3或4 还是5
+
+200
+00:06:28,570 --> 00:06:29,630
+and in particular, what you can
+特别的 你所能
+
+201
+00:06:29,730 --> 00:06:30,630
+do is, you know, think
+做的是去从
+
+202
+00:06:30,810 --> 00:06:31,770
+about this from the perspective
+T恤商业的角度
+
+203
+00:06:32,380 --> 00:06:33,810
+of the T-shirt business and
+去思考 并且
+
+204
+00:06:34,320 --> 00:06:35,150
+ask: "Well if I have
+提出问题 “如果我有
+
+205
+00:06:35,620 --> 00:06:37,190
+five segments, then how well
+5个分段 那么
+
+206
+00:06:38,060 --> 00:06:39,610
+will my T-shirts fit my
+我的T恤将如何很好地满足
+
+207
+00:06:39,780 --> 00:06:42,100
+customers and so, how many T-shirts can I sell?
+我的顾客呢?我可以卖出多少T恤?
+
+208
+00:06:42,420 --> 00:06:44,390
+How happy will my customers be?"
+我的顾客将会有多高兴呢?”
+
+209
+00:06:44,550 --> 00:06:45,920
+What really makes sense, from the
+其中真正有意义的是 从
+
+210
+00:06:46,080 --> 00:06:47,530
+perspective of the T-shirt business,
+T恤的商业角度去考虑
+
+211
+00:06:47,590 --> 00:06:49,390
+in terms of whether, I
+也就是我是否需要
+
+212
+00:06:49,520 --> 00:06:51,480
+want to have more T-shirt sizes
+更多的T恤尺寸
+
+213
+00:06:51,990 --> 00:06:54,040
+so that my T-shirts fit my customers better.
+来更好地满足我的顾客
+
+214
+00:06:54,970 --> 00:06:56,130
+Or do I want to have fewer
+或者我是否想要更少的
+
+215
+00:06:56,360 --> 00:06:57,570
+T-shirt sizes so that
+T恤尺码以便
+
+216
+00:06:58,410 --> 00:07:00,220
+I make fewer sizes of T-shirts.
+我制造更少尺码的T恤
+
+217
+00:07:00,610 --> 00:07:02,290
+And I can sell them to the customers more cheaply.
+且我可以将它们卖得更加便宜一些
+
+218
+00:07:02,840 --> 00:07:04,700
+And so, the t-shirt selling
+因此T恤销售的
+
+219
+00:07:05,040 --> 00:07:06,150
+business, that might give you
+商业可能会给你
+
+220
+00:07:06,660 --> 00:07:09,260
+a way to decide, between three clusters versus five clusters.
+一种方法来决定究竟是采用3还是5
+
+221
+00:07:10,780 --> 00:07:12,000
+So, that gives you an
+这就是给你的
+
+222
+00:07:12,480 --> 00:07:13,880
+example of how a
+一个例子 一个
+
+223
+00:07:14,130 --> 00:07:15,810
+later downstream purpose like
+后续的下游目的 如
+
+224
+00:07:16,010 --> 00:07:17,260
+the problem of deciding what
+决定
+
+225
+00:07:17,390 --> 00:07:19,230
+T-shirts to manufacture, how that
+生产什么样的T恤 来
+
+226
+00:07:19,380 --> 00:07:21,980
+can give you an evaluation metric for choosing the number of clusters.
+给你一个评价标准来选择聚类数量
+
+227
+00:07:22,900 --> 00:07:23,800
+For those of you that are
+对于你们正在
+
+228
+00:07:23,880 --> 00:07:25,490
+doing the program exercises, if
+编程练习的同学来说 如果
+
+229
+00:07:25,670 --> 00:07:27,070
+you look at this week's
+你去看一下这周的
+
+230
+00:07:27,290 --> 00:07:29,540
+program exercise associated with K-means, that's
+K均值方法相关的编程练习 是
+
+231
+00:07:29,790 --> 00:07:32,000
+an example there of using K-means for image compression.
+一个将K均值用于图片压缩的例子
+
+232
+00:07:32,910 --> 00:07:33,960
+And so if you were trying to
+如果你尝试
+
+233
+00:07:34,070 --> 00:07:35,170
+choose how many clusters
+选择多少个聚类
+
+234
+00:07:35,410 --> 00:07:36,950
+to use for that problem, you could
+来解决这个问题的话 你也可以
+
+235
+00:07:37,260 --> 00:07:38,550
+also, again use the
+再一次用到
+
+236
+00:07:39,030 --> 00:07:40,330
+evaluation metric of image compression
+图片压缩的评估标准
+
+237
+00:07:40,890 --> 00:07:42,470
+to choose the number of clusters, K?
+来选择聚类数目K
+
+238
+00:07:43,130 --> 00:07:43,870
+So, how good do you want the
+你想要图片
+
+239
+00:07:44,000 --> 00:07:45,430
+image to look versus, how much
+看起来有多好 还是
+
+240
+00:07:45,680 --> 00:07:46,680
+do you want to compress the file
+你想要压缩图片大小
+
+241
+00:07:46,970 --> 00:07:48,390
+size of the image, and,
+的多少
+
+242
+00:07:48,610 --> 00:07:49,830
+you know, if you do the
+如果你做
+
+243
+00:07:50,050 --> 00:07:50,980
+programming exercise, what I've just
+编程练习 我刚刚
+
+244
+00:07:51,160 --> 00:07:52,480
+said will make more sense at that time.
+所说的可以在那时候起到更好的作用
+
+245
+00:07:53,760 --> 00:07:56,500
+So, just summarize, for the
+总结一下 对于
+
+246
+00:07:56,590 --> 00:07:57,800
+most part, the number of
+大部分时候
+
+247
+00:07:58,030 --> 00:07:59,560
+clusters K is still chosen
+聚类数目仍然是通过
+
+248
+00:08:00,150 --> 00:08:01,900
+by hand by human input or human insight.
+手动 人工输入或我们的经验来决定
+
+249
+00:08:02,800 --> 00:08:03,810
+One way to try to
+一种可以尝试的
+
+250
+00:08:03,950 --> 00:08:05,010
+do so is to use
+方法是使用
+
+251
+00:08:05,170 --> 00:08:06,360
+the Elbow Method, but I
+“肘部法则” 但是我
+
+252
+00:08:06,520 --> 00:08:07,660
+wouldn't always expect that to
+并不总是期望它能
+
+253
+00:08:07,760 --> 00:08:08,620
+work well, but I think
+很有效果 但是我认为
+
+254
+00:08:08,820 --> 00:08:09,730
+the better way to think about
+更好的方法是思考
+
+255
+00:08:09,970 --> 00:08:10,800
+how to choose the number of
+如何去选择聚类
+
+256
+00:08:10,920 --> 00:08:12,310
+clusters is to ask, for
+基于
+
+257
+00:08:12,520 --> 00:08:13,890
+what purpose are you running K-means?
+运行K均值聚类的目的来决定
+
+258
+00:08:15,490 --> 00:08:16,610
+And then to think, what is
+然后想一想
+
+259
+00:08:16,830 --> 00:08:18,210
+the number of clusters K that
+聚类的数目所能
+
+260
+00:08:18,350 --> 00:08:19,490
+serves that, you know, whatever
+提供的东西
+
+261
+00:08:19,670 --> 00:08:21,710
+later purpose that you actually run the K-means for.
+无论你后续运行K均值聚类的目的是什么
+
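A rough Python sketch of the Elbow Method discussed in this lecture: run K-means for K = 1..10, record the distortion J for each K, and look for a bend in the curve. scikit-learn and the synthetic data are assumptions made purely for illustration, not the course's own code.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.rand(300, 2)                 # placeholder data set
ks = range(1, 11)
distortions = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
               for k in ks]                # inertia_ = distortion J

plt.plot(list(ks), distortions, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("distortion J")
plt.title("Elbow Method: pick K near the bend, if one is visible")
plt.show()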
diff --git a/srt/14 - 1 - Motivation I_ Data Compression (10 min).srt b/srt/14 - 1 - Motivation I_ Data Compression (10 min).srt
new file mode 100644
index 00000000..1164b3b4
--- /dev/null
+++ b/srt/14 - 1 - Motivation I_ Data Compression (10 min).srt
@@ -0,0 +1,1484 @@
+1
+00:00:00,090 --> 00:00:01,330
+In this video, I'd like to
+这个视频,我想(字幕翻译:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,500 --> 00:00:02,560
+start talking about a second
+开始谈论第二
+
+3
+00:00:03,030 --> 00:00:04,620
+type of unsupervised learning problem
+类型的无监督学习问题
+
+4
+00:00:04,950 --> 00:00:06,320
+called dimensionality reduction.
+称为降维。
+
+5
+00:00:07,600 --> 00:00:08,460
+There are a couple of different
+有几个不同的
+
+6
+00:00:08,660 --> 00:00:09,710
+reasons why one might
+的原因使你可能
+
+7
+00:00:09,890 --> 00:00:11,270
+want to do dimensionality reduction.
+想要做降维。
+
+8
+00:00:12,220 --> 00:00:14,420
+One is data compression, and as
+一是数据压缩,
+
+9
+00:00:14,600 --> 00:00:15,860
+we'll see later, a few videos
+后面我们会看到,一些视频
+
+10
+00:00:16,570 --> 00:00:18,200
+later, data compression not only
+后,数据压缩不仅
+
+11
+00:00:18,490 --> 00:00:19,660
+allows us to compress the
+允许我们压缩
+
+12
+00:00:19,970 --> 00:00:20,940
+data and have it therefore
+数据,因此
+
+13
+00:00:21,330 --> 00:00:22,670
+use up less computer memory
+使用较少的计算机内存
+
+14
+00:00:23,050 --> 00:00:24,410
+or disk space, but it will
+或磁盘空间,但它
+
+15
+00:00:24,730 --> 00:00:26,960
+also allow us to speed up our learning algorithms.
+也让我们加快我们的学习算法。
+
+16
+00:00:27,980 --> 00:00:29,490
+But first, let's start by
+但首先,让我们开始
+
+17
+00:00:29,620 --> 00:00:31,840
+talking about what is dimensionality reduction.
+谈论降维是什么。
+
+18
+00:00:33,490 --> 00:00:35,800
+As a motivating example, let's say
+作为一种生动的例子,让我们说
+
+19
+00:00:35,990 --> 00:00:37,440
+that we've collected a data set
+我们收集的数据集
+
+20
+00:00:37,680 --> 00:00:38,700
+with many, many, many features,
+带许多,许多,许多特征,
+
+21
+00:00:39,290 --> 00:00:40,600
+and I've plotted just two of them here.
+我绘制两个在这里。
+
+22
+00:00:41,600 --> 00:00:42,770
+And let's say that unknown to
+和假设未知
+
+23
+00:00:42,890 --> 00:00:44,000
+us two of the
+我们两个的
+
+24
+00:00:44,070 --> 00:00:45,730
+features were actually the length
+特征实际上是长度
+
+25
+00:00:45,860 --> 00:00:47,150
+of something in centimeters, and
+用厘米表示
+
+26
+00:00:47,550 --> 00:00:48,920
+a different feature, x2, is
+的不同特征,X2,是
+
+27
+00:00:49,460 --> 00:00:51,150
+the length of the same thing in inches.
+用英寸的同一物体的长度。
+
+28
+00:00:52,250 --> 00:00:53,030
+So, this gives us a highly
+所以,这给了我们高度
+
+29
+00:00:53,500 --> 00:00:55,910
+redundant representation and maybe
+冗余表示,也许
+
+30
+00:00:56,170 --> 00:00:57,920
+instead of having two separate features x1
+不是两个分开的特征x1
+
+31
+00:00:58,430 --> 00:00:58,820
+then x2,
+和X2,
+
+32
+00:00:59,090 --> 00:01:00,240
+both of which basically measure the
+这两个基本的长度度量,
+
+33
+00:01:00,370 --> 00:01:01,490
+length, maybe what we
+也许我们
+
+34
+00:01:01,640 --> 00:01:03,340
+want to do is reduce the
+想要做的是减少
+
+35
+00:01:03,430 --> 00:01:06,800
+data to one-dimensional and
+数据到一维
+
+36
+00:01:06,920 --> 00:01:08,840
+just have one number measuring this length.
+只有一个数测量这个长度。
+
+37
+00:01:09,620 --> 00:01:11,080
+In case this example seems a
+这个例子似乎
+
+38
+00:01:11,150 --> 00:01:13,920
+bit contrived, this centimeter and
+有点做作,这里厘米
+
+39
+00:01:14,030 --> 00:01:15,850
+inches example is actually not that
+英寸的例子实际上不是那么
+
+40
+00:01:16,220 --> 00:01:17,140
+unrealistic, and not that different
+不切实际的,两者并没有什么不同
+
+41
+00:01:17,510 --> 00:01:18,870
+from things that I see happening in industry.
+与我在工业界看到的情况没什么不同。
+
+42
+00:01:19,970 --> 00:01:21,320
+If you have hundreds
+如果你有几百个
+
+43
+00:01:21,790 --> 00:01:23,160
+or thousands of features, it is
+或成千上万的特点,它是
+
+44
+00:01:23,240 --> 00:01:24,450
+often this easy to
+往往很容易
+
+45
+00:01:24,680 --> 00:01:26,580
+lose track of exactly what features you have.
+搞不清楚你到底有哪些特征。
+
+46
+00:01:26,930 --> 00:01:28,190
+And sometimes may have
+有时可能有
+
+47
+00:01:28,420 --> 00:01:29,650
+a few different engineering teams, maybe
+几个不同的工程团队,也许
+
+48
+00:01:30,110 --> 00:01:31,090
+one engineering team gives you
+一个工程队给你
+
+49
+00:01:31,200 --> 00:01:32,500
+two hundred features, a second
+二百个特征,第二
+
+50
+00:01:32,770 --> 00:01:34,000
+engineering team gives you another
+工程队给你另外
+
+51
+00:01:34,340 --> 00:01:35,420
+three hundred features, and a
+三百个的特征,
+
+52
+00:01:35,550 --> 00:01:36,640
+third engineering team gives you five
+第三工程队给你
+
+53
+00:01:36,940 --> 00:01:38,150
+hundred features so you have
+五百个特征
+
+54
+00:01:38,290 --> 00:01:39,220
+a thousand features all together,
+一千多个特征都在一起,
+
+55
+00:01:39,940 --> 00:01:40,910
+and it actually becomes hard to
+它实际上会变得非常困难
+
+56
+00:01:41,040 --> 00:01:42,820
+keep track of you know, exactly which features
+去跟踪你知道的那些特征
+
+57
+00:01:43,200 --> 00:01:44,540
+you got from which team, and
+你从那些工程队得到的。
+
+58
+00:01:44,860 --> 00:01:47,310
+it's actually not desirable to have highly redundant features like these.
+而且像这样高度冗余的特征其实并不理想。
+
+59
+00:01:47,530 --> 00:01:49,440
+And so if the
+所以如果,
+
+60
+00:01:50,090 --> 00:01:51,520
+length in centimeters were rounded
+厘米的长度被舍入到
+
+61
+00:01:51,940 --> 00:01:53,920
+off to the nearest centimeter and
+最接近的厘米和
+
+62
+00:01:54,060 --> 00:01:56,480
+the length in inches was rounded off to the nearest inch.
+英寸的长度被舍入到最接近的英寸。
+
+63
+00:01:57,070 --> 00:01:58,050
+Then, that's why these examples
+这就是为什么这些例子
+
+64
+00:01:58,720 --> 00:01:59,900
+don't lie perfectly on a
+没有完美地落在一条
+
+65
+00:02:00,100 --> 00:02:01,270
+straight line, because of, you know, round-off
+直线,因为,你知道,
+
+66
+00:02:01,740 --> 00:02:03,420
+error to the nearest centimeter or the nearest inch.
+舍入误差在最近厘米或最近的英寸。
+
+67
+00:02:04,260 --> 00:02:05,160
+And if we can reduce
+如果我们可以减少
+
+68
+00:02:05,610 --> 00:02:06,680
+the data to one dimension
+数据到一维
+
+69
+00:02:07,130 --> 00:02:10,320
+instead of two dimensions, that reduces the redundancy.
+取代二维,减少冗余。
+
+70
+00:02:11,590 --> 00:02:14,030
+For a different example, one that maybe seems a little less contrived.
+再举一个不同的例子 这个例子也许看起来没那么刻意。
+
+71
+00:02:14,590 --> 00:02:16,560
+For many years I've
+多年来我
+
+72
+00:02:16,920 --> 00:02:19,920
+been working with autonomous helicopter pilots.
+一直在研究直升飞机自动驾驶。
+
+73
+00:02:20,990 --> 00:02:22,610
+Or I've been working with pilots that fly helicopters.
+我一直与直升机飞行员一起。
+
+74
+00:02:23,950 --> 00:02:24,040
+And so.
+诸如此类。
+
+75
+00:02:25,080 --> 00:02:28,090
+If you were to measure--if you
+如果你想测量——如果你
+
+76
+00:02:28,250 --> 00:02:29,100
+were to, you know, do a survey
+想做,你知道,做一个调查
+
+77
+00:02:29,590 --> 00:02:30,500
+or do a test of these different
+或做这些不同飞行员的测试
+
+78
+00:02:30,770 --> 00:02:32,200
+pilots--you might have one
+——你可能有一个
+
+79
+00:02:32,440 --> 00:02:33,780
+feature, x1, which is maybe
+特征,X1,这也许是
+
+80
+00:02:34,050 --> 00:02:35,600
+the skill of these
+他们的技能
+
+81
+00:02:35,820 --> 00:02:38,190
+helicopter pilots, and maybe
+(直升机飞行员),也许
+
+82
+00:02:38,460 --> 00:02:41,810
+"x2" could be the pilot enjoyment.
+“X2”可能是飞行员的爱好。
+
+83
+00:02:42,700 --> 00:02:43,770
+That is, you know, how
+这是表示
+
+84
+00:02:43,870 --> 00:02:45,050
+much they enjoy flying, and maybe
+他们是否喜欢飞行,也许
+
+85
+00:02:45,280 --> 00:02:46,810
+these two features will be highly correlated. And
+这两个特征将高度相关。
+
+86
+00:02:48,310 --> 00:02:49,730
+what you really care about might
+你真正关心的可能
+
+87
+00:02:49,940 --> 00:02:52,530
+be this sort of
+是这样的
+
+88
+00:02:53,610 --> 00:02:55,120
+this sort of, this direction, a different feature that really
+类别,方向,不同的特征,决定
+
+89
+00:02:55,370 --> 00:02:57,190
+measures pilot aptitude.
+飞行员的能力。
+
+90
+00:03:00,450 --> 00:03:01,240
+And I'm making up the name
+“能力”这个名字是我编的
+
+91
+00:03:01,590 --> 00:03:03,220
+aptitude of course, but again, if
+当然 不过再说一遍 如果
+
+92
+00:03:03,320 --> 00:03:04,780
+you highly correlated features, maybe
+你高度相关的特征,也许
+
+93
+00:03:04,990 --> 00:03:06,500
+you really want to reduce the dimension.
+你真需要降低维数。
+
+94
+00:03:07,570 --> 00:03:08,760
+So, let me say a
+所以,让我说
+
+95
+00:03:09,040 --> 00:03:09,950
+little bit more about what it
+点什么
+
+96
+00:03:10,060 --> 00:03:11,390
+really means to reduce the
+真的意味着减少
+
+97
+00:03:11,520 --> 00:03:12,950
+dimension of the data from
+数据的维度,从
+
+98
+00:03:13,150 --> 00:03:14,400
+2 dimensions down from 2D
+从2维(2D)
+
+99
+00:03:14,600 --> 00:03:16,300
+to 1 dimensional or to 1D.
+到1维(1D)。
+
+100
+00:03:16,840 --> 00:03:18,660
+Let me color in
+让我把
+
+101
+00:03:18,830 --> 00:03:19,940
+these examples by using different colors.
+这些样本通过使用不同的颜色标注。
+
+102
+00:03:21,730 --> 00:03:22,890
+And in this case
+在这种情况下,
+
+103
+00:03:23,370 --> 00:03:24,740
+by reducing the dimension what
+的降维是什么。
+
+104
+00:03:25,010 --> 00:03:26,320
+I mean is that I would
+我的意思是,我会
+
+105
+00:03:26,540 --> 00:03:28,400
+like to find maybe this
+想找到这也许
+
+106
+00:03:28,660 --> 00:03:30,560
+line, this, you know, direction on
+线,方向等你知道的东西
+
+107
+00:03:30,710 --> 00:03:31,700
+which most of the data seems
+其中的绝大部分数据似乎
+
+108
+00:03:31,910 --> 00:03:33,150
+to lie and project all
+分布在其附近 然后将所有
+
+109
+00:03:33,380 --> 00:03:34,740
+the data onto that line which
+数据都投影到那条线上
+
+110
+00:03:34,910 --> 00:03:36,230
+is true, and by doing
+确实如此 并且这样一来
+
+111
+00:03:36,510 --> 00:03:37,430
+so, what I can do
+所以,我能做的
+
+112
+00:03:37,970 --> 00:03:39,420
+is just measure the
+只是测量
+
+113
+00:03:39,580 --> 00:03:41,480
+position of each of the examples on that line.
+每个样本在线上的位置。
+
+114
+00:03:42,010 --> 00:03:42,820
+And what I can do is come
+我能做的就是
+
+115
+00:03:43,100 --> 00:03:45,080
+up with a new feature, z1,
+建立新的特征,Z1,
+
+116
+00:03:46,830 --> 00:03:48,200
+and to specify the position
+和在线上指定的位置
+
+117
+00:03:48,730 --> 00:03:49,530
+on the line I need only
+在线上的位置 我只需要
+
+118
+00:03:49,890 --> 00:03:50,940
+one number, so it says
+一个数,所以说
+
+119
+00:03:51,200 --> 00:03:51,980
+z1 is a new feature
+Z1是一个新的特征
+
+120
+00:03:52,750 --> 00:03:54,630
+that specifies the location of
+指定位置
+
+121
+00:03:54,830 --> 00:03:57,610
+each of those points on this green line.
+这些点在这个绿色的线。
+
+122
+00:03:58,060 --> 00:03:59,300
+And what this means, is
+这意味着什么,是
+
+123
+00:03:59,400 --> 00:04:00,680
+that where as previously if i
+以前如果我
+
+124
+00:04:00,930 --> 00:04:02,540
+had an example x1, maybe
+有个样本X1,也许
+
+125
+00:04:03,430 --> 00:04:04,740
+this was my first example, x1.
+这是我的第一个例子,X1。
+
+126
+00:04:05,040 --> 00:04:06,480
+So in order to
+为了
+
+127
+00:04:06,820 --> 00:04:08,550
+represent x1 originally x1.
+代表X1原先的x1。
+
+128
+00:04:09,620 --> 00:04:10,760
+I needed a two dimensional number,
+我需要一个二维数,
+
+129
+00:04:11,570 --> 00:04:12,800
+or a two dimensional feature vector.
+或一个二维特征向量。
+
+130
+00:04:13,700 --> 00:04:14,920
+Instead now I can represent
+现在我可以代表
+
+131
+00:04:18,120 --> 00:04:20,330
+z1. I could
+z1。我可以
+
+132
+00:04:20,520 --> 00:04:22,170
+use just z1 to represent my first
+用Z1代表我的第一个
+
+133
+00:04:23,270 --> 00:04:25,380
+example, and that's going to be a real number.
+样本,这将是一个实数。
+
+134
+00:04:25,940 --> 00:04:29,260
+And similarly x2 you know, if x2
+同样X2你知道,如果x2
+
+135
+00:04:29,590 --> 00:04:31,400
+is my second example there,
+是我的第二个样本,
+
+136
+00:04:32,690 --> 00:04:35,110
+then previously, whereas this required
+然后以前,而这需要
+
+137
+00:04:35,830 --> 00:04:37,520
+two numbers to represent if I
+两个数字来表示如果我
+
+138
+00:04:37,720 --> 00:04:39,930
+instead compute the projection
+代替计算投影
+
+139
+00:04:40,930 --> 00:04:42,730
+of that black cross
+那黑色的十字架
+
+140
+00:04:43,130 --> 00:04:44,250
+onto the line.
+到线。
+
+141
+00:04:44,700 --> 00:04:45,980
+And now I only need one
+,现在我只需要一个
+
+142
+00:04:46,210 --> 00:04:47,350
+real number which is
+实数
+
+143
+00:04:47,550 --> 00:04:49,580
+z2 to represent the
+Z2代表
+
+144
+00:04:49,620 --> 00:04:51,230
+location of this point
+点的位置
+
+145
+00:04:51,790 --> 00:04:53,070
+z2 on the line.
+Z2 在这条线上。
+
+146
+00:04:54,300 --> 00:04:56,730
+And so on through my M examples.
+我的M个样本也一样。
+
+147
+00:04:57,790 --> 00:04:59,560
+So, just to summarize, if
+所以,只是总结,如果
+
+148
+00:04:59,810 --> 00:05:01,310
+we allow ourselves to approximate
+我们允许自己近似
+
+149
+00:05:02,340 --> 00:05:03,800
+the original data set by
+原始的数据集 通过
+
+150
+00:05:04,000 --> 00:05:05,270
+projecting all of my
+将我所有的
+
+151
+00:05:05,590 --> 00:05:07,690
+original examples onto this green
+原始的例子在这绿色
+
+152
+00:05:07,880 --> 00:05:10,260
+line over here, then I
+线在这里,然后我
+
+153
+00:05:10,360 --> 00:05:12,090
+need only one number, I
+只需要一个数,我
+
+154
+00:05:12,170 --> 00:05:13,700
+need only real number to
+只需要实数
+
+155
+00:05:13,820 --> 00:05:15,270
+specify the position of
+指定位置
+
+156
+00:05:15,370 --> 00:05:16,710
+a point on the line,
+线上的一个点的位置
+
+157
+00:05:17,080 --> 00:05:18,220
+and so what I can
+所以我可以
+
+158
+00:05:18,300 --> 00:05:19,730
+do is therefore use just
+这么做,只使用
+
+159
+00:05:20,070 --> 00:05:21,850
+one number to represent the
+一个数表示
+
+160
+00:05:21,930 --> 00:05:23,170
+location of each of
+每一个
+
+161
+00:05:23,280 --> 00:05:26,520
+my training examples after they've been projected onto that green line.
+我的训练例子的位置的,他们被投射到绿色线。
+
+162
+00:05:27,570 --> 00:05:29,060
+So this is an approximation to
+这是一个近似
+
+163
+00:05:29,210 --> 00:05:30,300
+the original training set, because
+是对原始训练集的近似 因为
+
+164
+00:05:30,570 --> 00:05:32,770
+I have projected all of my training examples onto a line.
+我投射我所有的训练样本上的线。
+
+165
+00:05:33,630 --> 00:05:34,790
+But
+但
+
+166
+00:05:35,130 --> 00:05:36,140
+now, I need to keep around
+现在,我需要保持
+
+167
+00:05:36,530 --> 00:05:39,800
+only one number for each of my examples.
+为我的每个实例设置唯一的数字。
+
+168
+00:05:41,220 --> 00:05:42,960
+And so this halves the memory
+所以这就将内存需求减半
+
+169
+00:05:43,340 --> 00:05:44,640
+requirement, or a space requirement,
+或者说空间需求
+
+170
+00:05:45,090 --> 00:05:47,760
+or what have you, for how to store my data.
+或者其他用于存储数据的需求。
+
+171
+00:05:49,100 --> 00:05:50,530
+And perhaps more interestingly, more
+也许更有趣的是,更多的
+
+172
+00:05:50,700 --> 00:05:51,940
+importantly, what we'll see
+更重要的是,我们会今后看到什么
+
+173
+00:05:52,200 --> 00:05:53,520
+later, in the later
+在以后的
+
+174
+00:05:53,780 --> 00:05:55,730
+video as well is that this
+视频也是这样
+
+175
+00:05:55,930 --> 00:05:56,940
+will allow us to make
+会使我们
+
+176
+00:05:57,200 --> 00:05:59,170
+our learning algorithms run more quickly as well.
+我们的学习算法,跑得更快。
+
+177
+00:05:59,480 --> 00:06:00,600
+And that is actually,
+实际上是,
+
+178
+00:06:00,920 --> 00:06:02,060
+perhaps, even the more interesting
+也许,甚至更有趣
+
+179
+00:06:02,140 --> 00:06:03,800
+application of this data compression
+应用这种数据压缩
+
+180
+00:06:04,580 --> 00:06:06,220
+rather than reducing the memory
+而不是降低内存
+
+181
+00:06:06,680 --> 00:06:08,620
+or disk space requirement for storing the data.
+或磁盘空间用于存储数据的要求。
+
+182
+00:06:10,250 --> 00:06:11,490
+On the previous slide we
+上一个幻灯片我们
+
+183
+00:06:11,580 --> 00:06:13,140
+showed an example of reducing
+显示的一个例子是减少
+
+184
+00:06:13,620 --> 00:06:15,060
+data from 2D to 1D.
+数据维度从2D到1D。
+
+185
+00:06:15,210 --> 00:06:16,290
+On this slide, I'm going
+在此幻灯片,我会
+
+186
+00:06:16,660 --> 00:06:18,010
+to show another example of reducing
+显示另一个减少
+
+187
+00:06:18,450 --> 00:06:21,080
+data from three dimensional 3D to two dimensional 2D.
+数据从三维到二维。
+
+188
+00:06:22,590 --> 00:06:23,360
+By the way, in the more typical
+顺便说一句,在更典型的
+
+189
+00:06:23,750 --> 00:06:25,570
+example of dimensionality reduction
+降维的例子
+
+190
+00:06:26,390 --> 00:06:27,790
+we might have a thousand dimensional
+我们可能有一千维
+
+191
+00:06:28,230 --> 00:06:30,330
+data or 1000D data that
+数据或者说1000D的数据
+
+192
+00:06:30,720 --> 00:06:31,880
+we might want to reduce to
+我们可能要减少
+
+193
+00:06:32,150 --> 00:06:34,080
+let's say a hundred dimensional or
+到我们说的一百维,
+
+194
+00:06:34,110 --> 00:06:35,590
+100D, but because of
+(100D),但是因为
+
+195
+00:06:35,700 --> 00:06:37,760
+the limitations of what I can plot on the slide.
+我能在幻灯片上画出来的东西有限。
+
+196
+00:06:38,460 --> 00:06:41,520
+I'm going to use examples of 3D to 2D, or 2D to 1D.
+我要使用三维到二维的例子,或二维到一维。
+
+197
+00:06:43,160 --> 00:06:45,830
+So, let's have a data set like that shown here.
+假设我们有一个如图所示的数据集。
+
+198
+00:06:46,050 --> 00:06:47,420
+And so, I would have a set of examples
+我会有一系列的样本
+
+199
+00:06:48,110 --> 00:06:49,430
+x(i) which are points
+X(i) 它们是
+
+200
+00:06:49,800 --> 00:06:51,790
+in r3. So, I have three dimension examples.
+R3空间中的点。所以,我有三维的样本。
+
+201
+00:06:52,740 --> 00:06:53,300
+I know it might be a little
+我知道这可能是一个小
+
+202
+00:06:53,690 --> 00:06:54,610
+bit hard to see this on the slide,
+有点难在这张幻灯片里看清,
+
+203
+00:06:54,920 --> 00:06:55,980
+but I'll show a 3D point
+不过我会显示一个三维点
+
+204
+00:06:56,310 --> 00:06:58,190
+cloud in a little bit.
+云 稍后我会展示一下。
+
+205
+00:06:59,050 --> 00:07:00,280
+And it might be hard to see
+它可能是很难在这里看清楚,
+
+206
+00:07:00,380 --> 00:07:01,970
+here, but all of this
+但是所有
+
+207
+00:07:02,230 --> 00:07:04,020
+data maybe lies roughly on
+的数据大致
+
+208
+00:07:04,130 --> 00:07:05,700
+the plane, like so.
+平铺在一个平面,像这样。
+
+209
+00:07:07,110 --> 00:07:08,130
+And so what we can do
+所以我们能做的就是
+
+210
+00:07:08,380 --> 00:07:09,970
+with dimensionality reduction, is take
+降维,是把
+
+211
+00:07:10,210 --> 00:07:11,960
+all of this data and
+所有数据,
+
+212
+00:07:12,110 --> 00:07:13,800
+project the data down onto
+投影到
+
+213
+00:07:14,630 --> 00:07:15,350
+a two dimensional plane.
+一个二维平面。
+
+214
+00:07:15,700 --> 00:07:16,670
+So, here what I've done is,
+所以,在这里我所做的是,
+
+215
+00:07:16,730 --> 00:07:18,060
+I've taken all the data and I've
+我已经把所有的数据和我
+
+216
+00:07:18,300 --> 00:07:19,250
+projected all of the data,
+将所有的数据,
+
+217
+00:07:19,770 --> 00:07:21,390
+so that it all lies on the plane.
+投影在平面上。
+
+218
+00:07:22,590 --> 00:07:23,910
+Now, finally, in order to
+现在,最后,为了
+
+219
+00:07:24,040 --> 00:07:25,580
+specify the location of a
+指定一个位置
+
+220
+00:07:25,750 --> 00:07:27,810
+point within a plane, we need two numbers, right?
+点在一个平面,我们需要两个数字,对吗?
+
+221
+00:07:28,000 --> 00:07:29,150
+We need to, maybe, specify the
+我们需要,也许,指定
+
+222
+00:07:29,290 --> 00:07:30,660
+location of a point along
+一个点沿着这个轴的位置
+
+223
+00:07:30,970 --> 00:07:32,370
+this axis, and then also
+这个轴,然后还
+
+224
+00:07:32,650 --> 00:07:35,090
+specify its location along that axis.
+指定它在轴上的位置。
+
+225
+00:07:35,730 --> 00:07:37,470
+So, we need two numbers, maybe called
+所以,我们需要两个数,或者说
+
+226
+00:07:37,850 --> 00:07:39,900
+z1 and z2 to specify
+Z1和Z2 来指定
+
+227
+00:07:40,600 --> 00:07:42,450
+the location of a point within a plane.
+的平面内的点的位置。
+
+228
+00:07:43,290 --> 00:07:44,730
+And so, what that means,
+所以,这意味着什么,
+
+229
+00:07:44,890 --> 00:07:45,910
+is that we can now represent
+是我们现在可以把
+
+230
+00:07:46,690 --> 00:07:48,310
+each example, each training example,
+每一个样本,每个训练样本,
+
+231
+00:07:48,740 --> 00:07:50,310
+using two numbers that
+使用两个数字来代表,
+
+232
+00:07:50,630 --> 00:07:52,950
+I've drawn here, z1, and z2.
+我已经在这里,Z1和Z2。
+
+233
+00:07:53,990 --> 00:07:55,890
+So, our data can be represented
+因此,我们的数据可以表示为
+
+234
+00:07:56,610 --> 00:07:59,130
+using vector z which are in r2.
+属于R2空间的向量z
+
+235
+00:08:00,580 --> 00:08:02,110
+And these subscript, z subscript
+这些下标,
+
+236
+00:08:02,350 --> 00:08:03,990
+1, z subscript 2, what
+z下标1,Z下标2,
+
+237
+00:08:04,560 --> 00:08:05,440
+I just mean by that is that my
+我的意思是,我
+
+238
+00:08:05,500 --> 00:08:07,520
+vectors here, z, you know, are two
+向量Z,在这里,你知道,是个
+
+239
+00:08:07,750 --> 00:08:09,680
+dimensional vectors, z1, z2.
+二维向量,Z1,Z2。
+
+240
+00:08:10,600 --> 00:08:11,580
+And so if I have some
+所以如果我有一些
+
+241
+00:08:11,790 --> 00:08:13,690
+particular examples, z(i), or
+的具体样本,Z(i),或
+
+242
+00:08:13,760 --> 00:08:15,700
+that's the two dimensional vector, z(i)1,
+是一个二维向量,Z(i)1,
+
+243
+00:08:16,350 --> 00:08:19,110
+z(i)2.
+Z(i)2。
+
+244
+00:08:20,580 --> 00:08:21,990
+And on the previous slide when
+在上一个幻灯片中,
+
+245
+00:08:22,230 --> 00:08:23,750
+I was reducing data to one
+我是减少数据的一个
+
+246
+00:08:23,950 --> 00:08:25,270
+dimensional data then I
+维数据然后我
+
+247
+00:08:25,360 --> 00:08:27,500
+had only z1, right?
+只有Z1,对吗?
+
+248
+00:08:27,760 --> 00:08:28,610
+And that is what a z1 subscript 1
+这就是Z1下标1
+
+249
+00:08:28,700 --> 00:08:29,830
+on the previous slide was,
+在上一个幻灯片里,
+
+250
+00:08:30,550 --> 00:08:31,720
+but here I have two dimensional data,
+不过我这里有两维数据,
+
+251
+00:08:32,100 --> 00:08:32,730
+so I have z1 and z2 as
+Z1和Z2做为
+
+252
+00:08:33,040 --> 00:08:34,940
+the two components of the data.
+数据的两个组件。
+
+253
+00:08:36,690 --> 00:08:37,830
+Now, let me just make sure
+现在,让我确定一下
+
+254
+00:08:38,020 --> 00:08:39,200
+that these figures make sense. So
+这些图是合理的。所以
+
+255
+00:08:39,290 --> 00:08:40,790
+let me just reshow these exact
+让我再展示一遍这些精确的
+
+256
+00:08:41,600 --> 00:08:45,080
+three figures again but with 3D plots.
+三幅图 不过这次用3D绘图。
+
+257
+00:08:45,540 --> 00:08:46,570
+So the process we went through was that
+所以过程我们经历的是
+
+258
+00:08:47,040 --> 00:08:48,110
+shown on the left is the original
+左边显示的是原始的
+
+259
+00:08:48,480 --> 00:08:49,520
+data set, in the middle the
+数据集,在中间
+
+260
+00:08:49,590 --> 00:08:50,540
+data set projects on the 2D,
+是投影到二维的数据集,
+
+261
+00:08:51,040 --> 00:08:52,140
+and on the right the 2D
+和右边的2D
+
+262
+00:08:52,820 --> 00:08:54,900
+data sets with z1 and z2 as the axis.
+Z1和Z2为轴的数据集,
+
+263
+00:08:55,780 --> 00:08:56,610
+Let's look at them a little
+让我们看他们再深入一点。
+
+264
+00:08:56,820 --> 00:08:57,960
+bit further. Here's my original
+这是我的原始
+
+265
+00:08:58,270 --> 00:08:59,210
+data set, shown on the
+数据集,显示在
+
+266
+00:08:59,410 --> 00:09:00,680
+left, and so I had started
+左边,所以我开始
+
+267
+00:09:01,380 --> 00:09:02,420
+off with a 3D point
+从一个三维的点
+
+268
+00:09:02,660 --> 00:09:04,000
+cloud like so, where the
+云开始 像这样 其中
+
+269
+00:09:04,360 --> 00:09:05,390
+axis are labeled x1,
+轴标记为X1,
+
+270
+00:09:05,570 --> 00:09:07,410
+x2, x3, and so there's a 3D
+X2,X3,所以这是一个三维
+
+271
+00:09:07,960 --> 00:09:08,970
+point but most of the data,
+点,但多数数据,
+
+272
+00:09:09,500 --> 00:09:10,750
+maybe roughly lies on some,
+也许大约位于,
+
+273
+00:09:10,850 --> 00:09:12,800
+you know, not too far from some 2D plane.
+你知道,不会太远离二维平面。
+
+274
+00:09:13,930 --> 00:09:14,950
+So, what we can
+所以,我们可以
+
+275
+00:09:15,040 --> 00:09:17,460
+do is take this data and here's my middle figure.
+做的是把这个数据,这是我的中间图。
+
+276
+00:09:17,800 --> 00:09:19,110
+I'm going to project it onto 2D.
+我将它投影到二维。
+
+277
+00:09:19,370 --> 00:09:20,790
+So, I've projected this data so
+所以,我已将这些数据投影
+
+278
+00:09:20,900 --> 00:09:23,220
+that all of it now lies on this 2D surface.
+使得所有数据现在都位于这个2D表面上。
+
+279
+00:09:23,750 --> 00:09:25,330
+As you can see all the data
+你可以看到所有的数据
+
+280
+00:09:26,190 --> 00:09:27,470
+lies on a plane, 'cause we've
+都位于一个平面上,因为我们
+
+281
+00:09:27,700 --> 00:09:30,520
+projected everything onto a
+把所有东西都投影到了一个
+
+282
+00:09:30,570 --> 00:09:31,490
+plane, and so what this means is that
+平面,所以这意味着
+
+283
+00:09:31,800 --> 00:09:33,190
+now I need only two numbers,
+现在我只需要两个数,
+
+284
+00:09:33,820 --> 00:09:35,090
+z1 and z2, to represent
+Z1和Z2,代表
+
+285
+00:09:35,620 --> 00:09:37,470
+the location of point on the plane.
+点在的平面上的位置。
+
+286
+00:09:40,530 --> 00:09:41,480
+And so that's the process that
+这样的过程
+
+287
+00:09:41,810 --> 00:09:42,990
+we can go through to reduce our
+我们可以通过减少我们的
+
+288
+00:09:43,500 --> 00:09:45,180
+data from three dimensional to
+数据从三维到
+
+289
+00:09:45,340 --> 00:09:48,520
+two dimensional. So that's
+二维。这样的
+
+290
+00:09:49,230 --> 00:09:50,850
+dimensionality reduction and how
+降维及
+
+291
+00:09:51,070 --> 00:09:52,740
+we can use it to compress our data.
+我们可以用它来压缩数据。
+
+292
+00:09:54,010 --> 00:09:55,400
+And as we'll see
+正如我们所看到的,
+
+293
+00:09:55,580 --> 00:09:56,970
+later this will allow us to
+最后,这将使我们能够
+
+294
+00:09:57,110 --> 00:09:58,020
+make some of our learning algorithms
+使我们的一些学习算法
+
+295
+00:09:58,580 --> 00:09:59,670
+run much faster as well, but
+运行得更快,但
+
+296
+00:09:59,740 --> 00:10:01,210
+we'll get to that only in a later video.
+我们会在以后的视频提到它。
+
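The 2D-to-1D compression described in this lecture, projecting each example x(i) onto a line and keeping a single number z(i), can be sketched in numpy as follows. Finding the direction with an SVD of the mean-centered data is one reasonable way to obtain the line drawn in the video; it is an assumption here, not a statement about how the course implements it.

import numpy as np

# Two highly redundant features: a length in cm and the same length in inches.
X = np.column_stack([np.linspace(0, 10, 50),
                     np.linspace(0, 10, 50) / 2.54])
X = X + 0.05 * np.random.randn(*X.shape)   # small round-off style noise

mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu)
u = Vt[0]                        # direction the data roughly lies along
z = (X - mu) @ u                 # one number per example: the compressed data
X_approx = mu + np.outer(z, u)   # projecting back gives the approximation on the line
print(z.shape, X_approx.shape)   # (50,) and (50, 2)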
diff --git a/srt/14 - 2 - Motivation II_ Visualization (6 min).srt b/srt/14 - 2 - Motivation II_ Visualization (6 min).srt
new file mode 100644
index 00000000..d9b07ac6
--- /dev/null
+++ b/srt/14 - 2 - Motivation II_ Visualization (6 min).srt
@@ -0,0 +1,755 @@
+1
+00:00:00,130 --> 00:00:01,140
+In the last video, we talked
+在上一个的视频,我们谈了(字幕翻译:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,470 --> 00:00:03,380
+about dimensionality reduction for
+关于降维
+
+3
+00:00:03,530 --> 00:00:05,090
+the purpose of compressing the data.
+压缩数据的目的。
+
+4
+00:00:05,830 --> 00:00:06,770
+In this video, I'd like
+这个视频,我想
+
+5
+00:00:06,910 --> 00:00:08,140
+to tell you about a second application
+告诉你第二个应用
+
+6
+00:00:08,870 --> 00:00:12,490
+of dimensionality reduction and that is to visualize the data.
+也就是用降维来实现数据可视化。
+
+7
+00:00:13,440 --> 00:00:14,210
+For a lot of machine learning
+很多机器学习
+
+8
+00:00:14,560 --> 00:00:15,890
+applications, it really helps
+应用程序,它真的很有帮助
+
+9
+00:00:16,220 --> 00:00:17,650
+us to develop effective learning
+我们发展有效的学习
+
+10
+00:00:17,990 --> 00:00:20,260
+algorithms, if we can understand our data better.
+算法,如果我们能更好地了解我们的数据。
+
+11
+00:00:20,610 --> 00:00:21,890
+If there is some way of visualizing
+如果有一些可视化方法
+
+12
+00:00:22,100 --> 00:00:23,790
+the data better, and so,
+数据更好,所以,
+
+13
+00:00:24,080 --> 00:00:25,810
+dimensionality reduction offers us,
+降维提供了我们,
+
+14
+00:00:25,990 --> 00:00:27,870
+often, another useful tool to do so.
+通常是另一个有用的工具。
+
+15
+00:00:28,700 --> 00:00:29,290
+Let's start with an example.
+让我们从一个例子开始。
+
+16
+00:00:30,840 --> 00:00:31,370
+Let's say we've collected a large
+我们说我们已经收集了大量
+
+17
+00:00:31,720 --> 00:00:33,190
+data set of many statistics
+包含许多统计数据
+
+18
+00:00:33,840 --> 00:00:35,730
+and facts about different countries around the world.
+和世界各国情况的数据集。
+
+19
+00:00:36,030 --> 00:00:37,190
+So, maybe the first feature, X1
+所以,也许第一个特征,X1
+
+20
+00:00:38,090 --> 00:00:39,530
+is the country's GDP, or the
+是该国的国内生产总值,或
+
+21
+00:00:39,720 --> 00:00:41,710
+Gross Domestic Product, and
+国内生产总值,和
+
+22
+00:00:41,850 --> 00:00:43,210
+X2 is a per capita, meaning
+X2是人均,意义
+
+23
+00:00:43,600 --> 00:00:45,770
+the per person GDP, X3
+人均GDP,X3
+
+24
+00:00:46,080 --> 00:00:48,340
+human development index, life
+人类发展指数,预期
+
+25
+00:00:48,530 --> 00:00:51,290
+expectancy, X5, X6 and so on.
+寿命,X5,X6等等。
+
+26
+00:00:51,560 --> 00:00:52,670
+And we may have a huge data set
+,我们可能有一个巨大的数据集
+
+27
+00:00:52,880 --> 00:00:54,080
+like this, where, you know,
+这样,在那里,你知道的,
+
+28
+00:00:54,290 --> 00:00:56,890
+maybe 50 features for
+也许50个特征
+
+29
+00:00:57,650 --> 00:00:59,660
+every country, and we have a huge set of countries.
+每一个国家,并且我们有很多国家。
+
+30
+00:01:01,310 --> 00:01:02,300
+So is there something
+所以有什么
+
+31
+00:01:02,810 --> 00:01:05,210
+we can do to try to understand our data better?
+我们能做些什么来更好地理解我们的数据?
+
+32
+00:01:05,490 --> 00:01:07,200
+I've given this huge table of numbers.
+我给了这么庞大的数据表。
+
+33
+00:01:07,850 --> 00:01:11,010
+How do you visualize this data?
+你要如何可视化这些数据?
+
+34
+00:01:11,510 --> 00:01:12,420
+If you have 50 features, it's
+如果你有50个特征,它的
+
+35
+00:01:12,600 --> 00:01:13,970
+very difficult to plot 50-dimensional
+很难绘制50维
+
+36
+00:01:15,620 --> 00:01:16,469
+data.
+数据。
+
+37
+00:01:16,470 --> 00:01:19,060
+What is a good way to examine this data?
+检查这个数据的一个很好的方法是什么?
+
+38
+00:01:20,750 --> 00:01:22,820
+Using dimensionality reduction, what
+采用降维,
+
+39
+00:01:22,960 --> 00:01:24,920
+we can do is, instead of
+我们能做的,而不是
+
+40
+00:01:25,200 --> 00:01:27,240
+having each country represented by
+有各个国家的代表
+
+41
+00:01:27,430 --> 00:01:30,220
+this featured vector, xi, which
+这一特征向量,而xi,
+
+42
+00:01:30,460 --> 00:01:33,140
+is 50-dimensional, so instead
+是50维的,而不是
+
+43
+00:01:33,410 --> 00:01:34,800
+of, say, having a country
+,说,有一个国家
+
+44
+00:01:35,330 --> 00:01:37,260
+like Canada, instead of
+比如加拿大,而不是
+
+45
+00:01:37,380 --> 00:01:38,880
+having 50 numbers to represent the features
+50个数字来代表的特征
+
+46
+00:01:39,320 --> 00:01:41,030
+of Canada, let's say we
+加拿大,让我们说我们
+
+47
+00:01:41,240 --> 00:01:42,350
+can come up with a different feature
+会有不同的特征
+
+48
+00:01:42,900 --> 00:01:44,930
+representation that is these
+表示这些
+
+49
+00:01:45,320 --> 00:01:47,650
+z vectors, that is in R2.
+z向量,即在R2。
+
+50
+00:01:49,590 --> 00:01:50,780
+If that's the case, if we
+如果是这样的话,如果我们
+
+51
+00:01:50,910 --> 00:01:51,930
+can have just a pair of
+只能有一对
+
+52
+00:01:52,230 --> 00:01:53,640
+numbers, z1 and z2 that
+号码,Z1和Z2,
+
+53
+00:01:53,790 --> 00:01:55,500
+somehow, summarizes my 50
+不知何故,概述了我的50个
+
+54
+00:01:55,640 --> 00:01:56,730
+numbers, maybe what we
+数字,也许我们
+
+55
+00:01:56,810 --> 00:01:57,880
+can do [xx] is to plot
+可以做[ XX ]来画
+
+56
+00:01:58,190 --> 00:01:59,750
+these countries in R2 and
+这些国家在R2和
+
+57
+00:01:59,970 --> 00:02:01,500
+use that to try to
+利用
+
+58
+00:02:01,590 --> 00:02:03,810
+understand the space in
+了解空间
+
+59
+00:02:03,950 --> 00:02:05,630
+[xx] of features of different
+[ XX ]的特征关于不同的
+
+60
+00:02:05,900 --> 00:02:08,250
+countries [xx] the better and
+国家[ XX ]更好
+
+61
+00:02:08,520 --> 00:02:10,690
+so, here, what you
+所以,在这里,你
+
+62
+00:02:10,780 --> 00:02:11,980
+can do is reduce the
+可以做的是减少
+
+63
+00:02:12,070 --> 00:02:14,630
+data from 50
+数据从50
+
+64
+00:02:14,850 --> 00:02:16,580
+D, from 50 dimensions
+D,从50个维度
+
+65
+00:02:17,470 --> 00:02:18,380
+to 2D, so you can
+二维的,所以你可以
+
+66
+00:02:18,740 --> 00:02:19,960
+plot this as a 2
+画作2维
+
+67
+00:02:20,170 --> 00:02:21,470
+dimensional plot, and, when
+图,而且,当
+
+68
+00:02:21,610 --> 00:02:23,060
+you do that, it turns out
+你这么做,原来
+
+69
+00:02:23,270 --> 00:02:24,110
+that, if you look at the
+如果你看看
+
+70
+00:02:24,280 --> 00:02:25,770
+output of the Dimensionality Reduction algorithms,
+降维算法的输出,
+
+71
+00:02:26,720 --> 00:02:28,650
+It usually doesn't ascribe a
+它通常不会赋予
+
+72
+00:02:28,920 --> 00:02:32,340
+physical meaning to these new features you want [xx] to.
+这些新特征物理意义,你需要 [XX ]。
+
+73
+00:02:32,710 --> 00:02:35,210
+It's often up to us to figure out, you know, roughly what these features mean.
+往往需要我们自己来弄清楚这些特征大致代表什么。
+
+74
+00:02:36,810 --> 00:02:39,440
+But if you plot those features, here is what you might find.
+但是,如果你把这些特征画出来,你可能会发现这样的结果。
+
+75
+00:02:39,750 --> 00:02:41,090
+So, here, every country
+所以,在这里,每一个国家
+
+76
+00:02:41,760 --> 00:02:43,060
+is represented by a point
+是由一个点表示
+
+77
+00:02:43,820 --> 00:02:44,640
+ZI, which is an R2
+zi,这是一个R2
+
+78
+00:02:44,990 --> 00:02:46,440
+and so each of those.
+而每一个这样的
+
+79
+00:02:46,790 --> 00:02:47,780
+Dots, and this figure
+点,在这幅图中
+
+80
+00:02:48,050 --> 00:02:48,980
+represents a country, and so,
+代表一个国家,所以,
+
+81
+00:02:49,200 --> 00:02:50,830
+here's Z1 and here's
+这里Z1和这里的
+
+82
+00:02:51,200 --> 00:02:53,380
+Z2, and [xx] [xx] of these.
+Z2,[xx] [xx]这些。
+
+83
+00:02:54,090 --> 00:02:55,310
+So, you might find,
+的话,你会发现,
+
+84
+00:02:55,680 --> 00:02:57,270
+for example, that the horizontal
+例如,这个水平
+
+85
+00:02:57,690 --> 00:02:59,240
+axis the Z1 axis
+轴Z1轴
+
+86
+00:03:00,270 --> 00:03:01,980
+corresponds roughly to the
+大约相当于
+
+87
+00:03:02,260 --> 00:03:05,150
+overall country size, or
+整体国家大小,或
+
+88
+00:03:05,230 --> 00:03:07,410
+the overall economic activity of a country.
+一个国家的整体经济活动。
+
+89
+00:03:07,800 --> 00:03:09,950
+So the overall GDP, overall
+所以整体GDP,整体
+
+90
+00:03:10,750 --> 00:03:13,490
+economic size of a country.
+经济国家的大小。
+
+91
+00:03:14,350 --> 00:03:15,860
+Whereas the vertical axis in our
+而在垂直轴
+
+92
+00:03:15,920 --> 00:03:18,250
+data might correspond to the
+数据可能对应于
+
+93
+00:03:18,390 --> 00:03:21,430
+per person GDP.
+人均GDP。
+
+94
+00:03:22,290 --> 00:03:23,900
+Or the per person well being,
+或每人幸福,
+
+95
+00:03:24,160 --> 00:03:30,730
+or the per person economic activity, and,
+或每个人的经济活动,并,
+
+96
+00:03:31,030 --> 00:03:32,370
+you might find that, given these
+你可能会发现,在这些
+
+97
+00:03:32,570 --> 00:03:33,540
+50 features, you know, these
+50个特征,你知道,这些
+
+98
+00:03:34,040 --> 00:03:35,160
+are really the 2 main dimensions
+确实是变化的两个主要维度
+
+99
+00:03:35,800 --> 00:03:37,760
+of the variation, and so, out
+因此,
+
+100
+00:03:38,170 --> 00:03:39,140
+here you may have a country
+在这里你可以有一个国家
+
+101
+00:03:39,820 --> 00:03:41,220
+like the U.S.A., which
+像美国,它
+
+102
+00:03:41,500 --> 00:03:43,370
+is a relatively large GDP,
+是一个比较大的国内生产总值,
+
+103
+00:03:43,690 --> 00:03:44,990
+you know, is a very
+你知道,是一个非常
+
+104
+00:03:45,270 --> 00:03:46,490
+large GDP and a relatively
+大的国内生产总值和相对
+
+105
+00:03:46,710 --> 00:03:48,760
+high per-person GDP as well.
+高人均GDP和。
+
+106
+00:03:49,470 --> 00:03:50,710
+Whereas here you might have
+而在这里你可能
+
+107
+00:03:51,410 --> 00:03:53,720
+a country like Singapore, which
+新加坡这样的国家,其
+
+108
+00:03:53,970 --> 00:03:55,040
+actually has a very
+其实有一个很
+
+109
+00:03:55,390 --> 00:03:56,760
+high per person GDP as well,
+高人均GDP和,
+
+110
+00:03:57,030 --> 00:03:58,010
+but because Singapore is a much
+但因为新加坡是一个多
+
+111
+00:03:58,100 --> 00:03:59,820
+smaller country the overall
+较小国家的整体
+
+112
+00:04:01,030 --> 00:04:02,230
+economy size of Singapore
+经济规模新加坡
+
+113
+00:04:03,460 --> 00:04:05,060
+is much smaller than the US.
+比美国小得多。
+
+114
+00:04:06,270 --> 00:04:08,140
+And, over here, you would
+,在这里,你会
+
+115
+00:04:08,290 --> 00:04:10,880
+have countries where individuals
+有一些国家,那里的人们
+
+116
+00:04:12,020 --> 00:04:13,320
+are unfortunately some are less
+不幸的是有些人生活得不太
+
+117
+00:04:13,430 --> 00:04:14,660
+well off, maybe shorter life expectancy,
+宽裕,也许预期寿命更短,
+
+118
+00:04:15,820 --> 00:04:17,000
+less health care, less economic
+医疗保健更少,经济
+
+119
+00:04:18,290 --> 00:04:19,370
+maturity that's why smaller
+成熟度更低,所以是较小
+
+120
+00:04:19,700 --> 00:04:21,950
+countries, whereas a point
+的国家,而这一点
+
+121
+00:04:22,280 --> 00:04:23,780
+like this will correspond to
+这样会对应
+
+122
+00:04:24,450 --> 00:04:26,000
+a country that has a
+一个国家有一个
+
+123
+00:04:26,160 --> 00:04:27,870
+fair, has a substantial amount of
+相当可观的,大量的
+
+124
+00:04:28,090 --> 00:04:29,540
+economic activity, but where individuals
+经济活动,但那里的人
+
+125
+00:04:30,520 --> 00:04:32,520
+tend to be somewhat less well off.
+往往生活得不那么宽裕。
+
+126
+00:04:32,600 --> 00:04:33,700
+So you might find that
+所以你会发现
+
+127
+00:04:33,840 --> 00:04:35,610
+the axes Z1 and Z2
+轴Z1和Z2
+
+128
+00:04:35,680 --> 00:04:37,140
+can help you to most succinctly
+可以帮助你以最简洁的
+
+129
+00:04:37,670 --> 00:04:39,010
+capture really what are the
+捕获真的是什么
+
+130
+00:04:39,120 --> 00:04:40,120
+two main dimensions of the variations
+两个主要维度的变化
+
+131
+00:04:41,360 --> 00:04:42,120
+amongst different countries.
+在不同的国家。
+
+132
+00:04:43,430 --> 00:04:44,910
+Such as the overall economic
+如总体经济
+
+133
+00:04:45,400 --> 00:04:46,850
+activity of the country projected
+活动,以
+
+134
+00:04:47,390 --> 00:04:48,800
+by the size of the
+国家整体经济
+
+135
+00:04:49,090 --> 00:04:50,770
+country's overall economy as well
+的规模来衡量
+
+136
+00:04:51,320 --> 00:04:53,440
+as the per-person individual
+以及每个个体
+
+137
+00:04:54,040 --> 00:04:55,290
+well-being, measured by per-person
+幸福,通过人均
+
+138
+00:04:56,960 --> 00:04:58,470
+GDP, per-person healthcare, and things like that.
+GDP,人均医疗保健,以及类似的东西。
+
+139
+00:05:00,930 --> 00:05:02,130
+So that's how you can
+所以你可以
+
+140
+00:05:02,290 --> 00:05:04,410
+use dimensionality reduction, in
+使用降维,
+
+141
+00:05:04,540 --> 00:05:06,230
+order to reduce data from
+为了减少数据
+
+142
+00:05:06,470 --> 00:05:07,860
+50 dimensions or whatever, down
+50维或什么的,下降到
+
+143
+00:05:08,150 --> 00:05:09,520
+to two dimensions, or maybe
+二维,或者
+
+144
+00:05:09,680 --> 00:05:11,270
+down to three dimensions, so that
+下降到三维,所以
+
+145
+00:05:11,380 --> 00:05:13,740
+you can plot it and understand your data better.
+你可以绘制和更好地了解您的数据。
+
+146
+00:05:14,840 --> 00:05:16,010
+In the next video, we'll start
+在接下来的视频中,我们将开始
+
+147
+00:05:16,440 --> 00:05:17,580
+to develop a specific algorithm,
+制定具体的算法,
+
+148
+00:05:18,200 --> 00:05:19,500
+called PCA, or Principal Component
+称为PCA,或主成分
+
+149
+00:05:20,010 --> 00:05:21,360
+Analysis, which will allow
+分析,这将允许
+
+150
+00:05:21,550 --> 00:05:22,630
+us to do this and also
+我们这样做也
+
+151
+00:05:23,820 --> 00:05:26,690
+do the earlier application I talked about of compressing the data.
+使用我谈到的较早的应用来压缩数据。
+
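A minimal sketch of the visualization idea above: compress a 50-dimensional table of country statistics down to two numbers (z1, z2) per country and scatter plot them. The data are made up, and using scikit-learn's PCA as the dimensionality-reduction step is an assumption for illustration; PCA itself is only introduced in the next video.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

n_countries, n_features = 30, 50
X = np.random.randn(n_countries, n_features)   # placeholder country statistics
Z = PCA(n_components=2).fit_transform(X)       # each row is (z1, z2)

plt.scatter(Z[:, 0], Z[:, 1])
plt.xlabel("z1 (e.g. overall economic activity)")
plt.ylabel("z2 (e.g. per-person well-being)")
plt.show()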
diff --git a/srt/14 - 3 - Principal Component Analysis Problem Formulation (9 min).srt b/srt/14 - 3 - Principal Component Analysis Problem Formulation (9 min).srt
new file mode 100644
index 00000000..c5f9a45d
--- /dev/null
+++ b/srt/14 - 3 - Principal Component Analysis Problem Formulation (9 min).srt
@@ -0,0 +1,1346 @@
+1
+00:00:00,090 --> 00:00:01,010
+For the problem of dimensionality
+对于降维问题来说
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,920 --> 00:00:03,420
+reduction, by far the
+目前
+
+3
+00:00:03,490 --> 00:00:04,620
+most popular, by far the
+最流行
+
+4
+00:00:04,690 --> 00:00:06,180
+most commonly used algorithm is
+最常用的算法是
+
+5
+00:00:06,390 --> 00:00:08,460
+something called principal components analysis or PCA.
+主成分分析方法(Principal Component Analysis, PCA)
+
+6
+00:00:10,200 --> 00:00:11,160
+In this video, I would
+在这段视频中
+
+7
+00:00:11,220 --> 00:00:12,610
+like to start talking about the
+我想首先开始讨论
+
+8
+00:00:12,740 --> 00:00:14,240
+problem formulation for PCA,
+PCA问题的公式描述
+
+9
+00:00:14,910 --> 00:00:16,090
+in other words let us try
+也就是说
+
+10
+00:00:16,260 --> 00:00:18,630
+to formulate precisely exactly what
+我们用公式准确地精确地描述
+
+11
+00:00:18,900 --> 00:00:19,980
+we would like PCA to do.
+我们想让 PCA 来做什么
+
+12
+00:00:20,670 --> 00:00:21,820
+Let's say we have a dataset like
+假设 我们有这样的一个数据集
+
+13
+00:00:22,020 --> 00:00:23,050
+this, so this is a dataset
+这个数据集含有
+
+14
+00:00:23,360 --> 00:00:24,710
+of example X in R2,
+二维实数R^2空间内的样本X
+
+15
+00:00:25,040 --> 00:00:26,140
+and let's say I want
+假设我想
+
+16
+00:00:26,470 --> 00:00:27,640
+to reduce the dimension of the
+对数据进行降维
+
+17
+00:00:27,810 --> 00:00:29,850
+data from two dimensional to one dimensional.
+从二维降到一维
+
+18
+00:00:31,170 --> 00:00:32,130
+In other words I would like to find
+也就是说,我想找到
+
+19
+00:00:32,690 --> 00:00:34,400
+a line onto which to project the data.
+一条直线,将数据投影到这条直线上
+
+20
+00:00:35,140 --> 00:00:37,680
+So what seems like a good line onto which to project the data?
+那怎么找到一条好的直线来投影这些数据呢?
+
+21
+00:00:38,730 --> 00:00:40,760
+A line like this might be a pretty good choice.
+这样的一条直线也许是个不错的选择
+
+22
+00:00:41,510 --> 00:00:42,790
+And the reason you think
+你认为
+
+23
+00:00:43,020 --> 00:00:43,990
+this might be a good choice is
+这是一个不错的选择的原因是
+
+24
+00:00:44,150 --> 00:00:45,420
+that if you look at
+如果你观察
+
+25
+00:00:46,020 --> 00:00:48,230
+where the projected versions of the points goes.
+投影到直线上的点的位置
+
+26
+00:00:48,530 --> 00:00:51,180
+I'm gonna take this point and project it down here and get that.
+我将这个点 投影到直线上 得到这个点
+
+27
+00:00:51,640 --> 00:00:53,500
+This point gets projected here, to
+这点被投影到这里
+
+28
+00:00:53,640 --> 00:00:55,220
+here, to here, to here
+这里 这里 以及这里
+
+29
+00:00:56,120 --> 00:00:57,360
+what we find is that the
+我们发现
+
+30
+00:00:57,420 --> 00:00:58,860
+distance between each point
+每个点到它们对应的
+
+31
+00:00:59,460 --> 00:01:02,520
+and the projected version is pretty small.
+投影到直线上的点之间的距离非常小
+
+32
+00:01:03,790 --> 00:01:06,490
+That is, these blue
+也就是说 这些蓝色的
+
+33
+00:01:06,690 --> 00:01:08,210
+line segments are pretty short.
+线段非常的短
+
+34
+00:01:09,270 --> 00:01:10,260
+So what PCA
+所以 正式的说
+
+35
+00:01:10,430 --> 00:01:11,730
+does, formally, is it tries
+PCA所作的就是
+
+36
+00:01:12,180 --> 00:01:14,320
+to find a lower-dimensional surface
+寻找一个低维的面
+
+37
+00:01:14,340 --> 00:01:15,250
+really a line in
+在这个例子中
+
+38
+00:01:15,330 --> 00:01:16,660
+this case, onto which to
+其实是一条直线
+
+39
+00:01:16,740 --> 00:01:18,260
+project the data, so that
+数据投射在上面 使得
+
+40
+00:01:18,520 --> 00:01:20,130
+the sum of squares of these
+这些蓝色小线段的平方和
+
+41
+00:01:20,360 --> 00:01:22,570
+little blue line segments is minimized.
+达到最小化
+
+42
+00:01:23,550 --> 00:01:24,780
+The length of those blue line
+这些蓝色线段的长度
+
+43
+00:01:25,020 --> 00:01:26,530
+segments, that's sometimes also called
+时常被叫做
+
+44
+00:01:27,100 --> 00:01:29,710
+the projection error, and so what PCA
+投影误差
+
+45
+00:01:29,750 --> 00:01:30,480
+does is it tries to find
+所以PCA所做的就是寻找
+
+46
+00:01:30,770 --> 00:01:31,840
+the surface onto which to
+一个投影平面
+
+47
+00:01:32,010 --> 00:01:33,350
+project the data so as to
+对数据进行投影
+
+48
+00:01:33,480 --> 00:01:35,050
+minimize that. As an
+使得这个能够最小化
+
+49
+00:01:35,090 --> 00:01:37,460
+aside, before applying PCA
+另外,在应用PCA之前
+
+50
+00:01:37,960 --> 00:01:39,750
+it's standard practice to
+常规的做法是
+
+51
+00:01:39,960 --> 00:01:41,300
+first perform mean normalization and
+先进行均值归一化和
+
+52
+00:01:41,820 --> 00:01:43,190
+feature scaling so that the
+特征规范化,使得
+
+53
+00:01:43,560 --> 00:01:44,760
+features, X1 and X2,
+特征X1和X2
+
+54
+00:01:44,880 --> 00:01:46,770
+should have zero mean and
+均值为0
+
+55
+00:01:46,880 --> 00:01:48,740
+should have comparable ranges of values.
+数值在可比较的范围之内
+
+56
+00:01:49,110 --> 00:01:50,320
+I've already done this for this
+在这个例子里,我已经这么做了
+
+57
+00:01:50,490 --> 00:01:51,590
+example, but I'll come
+但是
+
+58
+00:01:51,680 --> 00:01:52,990
+back to this later and talk more
+在后面,我将回过来,讨论更多的
+
+59
+00:01:53,190 --> 00:01:54,960
+about feature scaling and mean normalization in the context of PCA later.
+PCA背景下的特征规范化和均值归一化问题
+
+60
+00:01:58,600 --> 00:01:59,420
+Coming back to this example,
+回到这个例子
+
+61
+00:02:00,260 --> 00:02:01,470
+in contrast to the red
+对比
+
+62
+00:02:01,710 --> 00:02:03,300
+lines that I just drew here's
+我刚画好的红线
+
+63
+00:02:03,530 --> 00:02:05,970
+a different line onto which I could project my data.
+这是我对数据进行投影的一条不同的直线
+
+64
+00:02:06,810 --> 00:02:08,260
+This magenta line.
+这条品红色的线
+
+65
+00:02:08,520 --> 00:02:09,260
+And as you can see, you
+如你所见
+
+66
+00:02:09,370 --> 00:02:10,660
+know, this magenta line is a
+你知道,这条品红色直线
+
+67
+00:02:10,810 --> 00:02:13,920
+much worse direction onto which to project my data, right?
+是一个非常糟糕的方向来投影我的数据,对吧?
+
+68
+00:02:14,090 --> 00:02:15,020
+So if I were to project my
+所以,如果我将数据
+
+69
+00:02:15,120 --> 00:02:16,430
+data onto the magenta
+投影到这条品红色的直线上
+
+70
+00:02:16,730 --> 00:02:18,050
+line, like the other set of points like that.
+像我们刚才做的那样
+
+71
+00:02:19,140 --> 00:02:21,240
+And the projection errors, that
+这样,投影误差
+
+72
+00:02:21,420 --> 00:02:24,460
+is these blue line segments would be huge.
+就是这些蓝色的线段,将会很大
+
+73
+00:02:24,910 --> 00:02:25,930
+So these points have to
+所以,这些点将会
+
+74
+00:02:26,010 --> 00:02:28,170
+move a huge distance in
+移动很长一段距离
+
+75
+00:02:28,320 --> 00:02:29,840
+order to get onto
+才能投影到
+
+76
+00:02:30,360 --> 00:02:31,760
+the, in order to
+才能
+
+77
+00:02:31,930 --> 00:02:33,440
+get projected onto the magenta line.
+投影到这条品红色直线上
+
+78
+00:02:33,740 --> 00:02:35,390
+And so that's why PCA
+因此,这就是为什么PCA
+
+79
+00:02:36,010 --> 00:02:37,540
+principal component analysis would choose
+主成分分析方法会选择
+
+80
+00:02:37,860 --> 00:02:38,840
+something like the red line
+红色的这条直线
+
+81
+00:02:39,230 --> 00:02:41,410
+rather than like the magenta line down here.
+而不是品红色的这条直线
+
+82
+00:02:42,870 --> 00:02:45,280
+Let's write out the PCA problem a little more formally.
+我们正式一点地写出PCA问题
+
+83
+00:02:46,140 --> 00:02:47,660
+The goal of PCA if we
+PCA的目标是
+
+84
+00:02:47,810 --> 00:02:49,150
+want to reduce data from two-
+将数据从二维
+
+85
+00:02:49,360 --> 00:02:50,580
+dimensional to one-dimensional is
+降到一维
+
+86
+00:02:51,450 --> 00:02:52,160
+we're going to try to find
+我们将试着寻找
+
+87
+00:02:52,640 --> 00:02:54,590
+a vector, that is
+一个向量
+
+88
+00:02:54,970 --> 00:02:56,160
+a vector Ui
+向量Ui
+
+89
+00:02:57,150 --> 00:02:58,250
+which is going to be in Rn,
+属于Rn空间中的向量
+
+90
+00:02:58,780 --> 00:03:00,170
+so that would be in R2 in this case,
+在这个例子中是R2空间中的
+
+91
+00:03:01,130 --> 00:03:02,300
+going to find direction onto
+寻找一个对数据进行投影的方向
+
+92
+00:03:02,600 --> 00:03:04,990
+which to project the data so as to minimize the projection error.
+使得投影误差能够最小
+
+93
+00:03:05,400 --> 00:03:06,710
+So in this example I'm
+在这个例子里
+
+94
+00:03:07,190 --> 00:03:09,180
+hoping that PCA will find this
+希望PCA寻找到
+
+95
+00:03:09,380 --> 00:03:10,590
+vector, which I'm going
+向量,我将它叫做
+
+96
+00:03:10,720 --> 00:03:12,960
+to call U1, so that
+U1,所以
+
+97
+00:03:13,120 --> 00:03:14,340
+when I project the data onto
+当我把数据投影到
+
+98
+00:03:15,590 --> 00:03:17,620
+the line that I defined
+我定义的这条直线
+
+99
+00:03:18,170 --> 00:03:19,840
+by extending out this vector,
+通过延长这个向量得到的直线
+
+100
+00:03:20,370 --> 00:03:21,650
+I end up with pretty small
+最后我得到非常小的
+
+101
+00:03:22,100 --> 00:03:23,400
+reconstruction errors and reference
+重建误差
+
+102
+00:03:24,310 --> 00:03:25,220
+data looks like this.
+参考数据看上去是这样的
+
+103
+00:03:26,180 --> 00:03:26,640
+And by the way,
+此外
+
+104
+00:03:26,840 --> 00:03:28,310
+I should mention that whether PCA
+我应该指出的是,无论PCA
+
+105
+00:03:28,920 --> 00:03:32,150
+gives me U1 or negative U1, it doesn't matter.
+给出的是这个U1,还是负的U1,都没关系
+
+106
+00:03:32,650 --> 00:03:33,630
+So if it gives me a positive
+如果它给出的是正的向量
+
+107
+00:03:33,890 --> 00:03:35,530
+vector in this direction that's fine, if it gives me,
+在这个方向上,这没问题
+
+108
+00:03:35,950 --> 00:03:37,910
+sort of the opposite vector facing
+如果给出的是相反的向量
+
+109
+00:03:38,330 --> 00:03:40,160
+in the opposite direction, so that
+在相反的方向上
+
+110
+00:03:40,720 --> 00:03:43,150
+would be -U1, draw that in
+也就是-U1,用蓝色来画
+
+111
+00:03:43,300 --> 00:03:44,400
+blue instead, whether it gives me positive
+无论给的是正的
+
+112
+00:03:45,120 --> 00:03:46,310
+U1 negative U1,
+还是负的U1
+
+113
+00:03:46,440 --> 00:03:48,120
+it doesn't matter, because each of
+都没关系,因为
+
+114
+00:03:48,230 --> 00:03:50,030
+these vectors defines the
+每一个方向都定义了
+
+115
+00:03:50,110 --> 00:03:51,660
+same red line onto which
+相同的红色直线
+
+116
+00:03:51,870 --> 00:03:54,430
+I'm projecting my data. So this
+也就是我将投影的方向
+
+117
+00:03:54,610 --> 00:03:56,300
+is a case of reducing data
+这就是将
+
+118
+00:03:56,680 --> 00:03:58,120
+from 2 dimensional to 1 dimensional.
+2维数据降到1维的例子
+
+119
+00:03:58,920 --> 00:04:00,220
+In the more general case we
+更一般的情况是
+
+120
+00:04:00,350 --> 00:04:01,680
+have N dimensional data and
+我们有N维的数据
+
+121
+00:04:01,840 --> 00:04:03,790
+we want to reduce it K dimensions.
+想降到K维
+
+122
+00:04:04,970 --> 00:04:06,010
+In that case, we want to
+在这种情况下
+
+123
+00:04:06,160 --> 00:04:07,450
+find not just a single vector
+我们不仅仅只寻找单个的向量
+
+124
+00:04:07,940 --> 00:04:09,020
+onto which to project the data
+来对数据进行投影
+
+125
+00:04:09,320 --> 00:04:10,660
+but we want to find K dimensions
+我们想寻找K个方向
+
+126
+00:04:11,520 --> 00:04:12,420
+onto which to project the data.
+来对数据进行投影
+
+127
+00:04:13,290 --> 00:04:15,680
+So as to minimize this projection error.
+为了最小化投影误差
+
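+Written out in the lecture's notation, the objective just described is the following (a sketch; x_approx(i) denotes the orthogonal projection of training example x(i) onto the chosen line or k-dimensional surface):
+
+\[ \min \; \frac{1}{m}\sum_{i=1}^{m}\left\| x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\|^{2} \]
+
+For the 2D-to-1D case the minimization is over a single direction u^(1) in R^n; in the general case it is over the k directions u^(1), ..., u^(k) that span the projection surface.
+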
+128
+00:04:16,440 --> 00:04:17,100
+So here's an example.
+这是一个例子
+
+129
+00:04:17,480 --> 00:04:19,100
+If I have a 3D point
+如果我有3维数据点
+
+130
+00:04:19,390 --> 00:04:21,030
+cloud like this then maybe
+比如说像这样的
+
+131
+00:04:21,290 --> 00:04:22,620
+what I want to do is find
+我想要做的是
+
+132
+00:04:23,880 --> 00:04:26,120
+vectors, sorry find a pair of vectors,
+寻找向量,对不起,是寻找两个向量
+
+133
+00:04:27,020 --> 00:04:28,180
+and I'm gonna call these vectors
+我将这些向量叫做...
+
+134
+00:04:29,080 --> 00:04:30,530
+lets draw these in red, I'm going
+我们用红线画出来
+
+135
+00:04:30,710 --> 00:04:32,210
+to find a pair of vectors,
+我要寻找两个向量
+
+136
+00:04:32,580 --> 00:04:33,580
+extending for the origin here's
+从原点延伸出来
+
+137
+00:04:34,490 --> 00:04:37,280
+U1, and here's
+这是U1
+
+138
+00:04:37,580 --> 00:04:39,800
+my second vector U2,
+这是第二个向量U2
+
+139
+00:04:40,180 --> 00:04:42,110
+and together these two
+这两个向量一起
+
+140
+00:04:42,320 --> 00:04:43,850
+vectors define a plane,
+定义了一个平面
+
+141
+00:04:44,400 --> 00:04:45,590
+or they define a 2D surface,
+或者说,定义了一个2维平面
+
+142
+00:04:46,790 --> 00:04:47,900
+kind of like this, sort of,
+类似于这样的
+
+143
+00:04:48,270 --> 00:04:51,140
+2D surface, onto which I'm going to project my data.
+2维平面,我将把数据投影到上面
+
+144
+00:04:52,050 --> 00:04:52,900
+For those of you that are
+对于你们其中
+
+145
+00:04:53,080 --> 00:04:54,980
+familiar with linear algebra, for
+熟悉线性代数的人来说
+
+146
+00:04:55,170 --> 00:04:56,010
+those of you that are really experts
+对于你们其中真的
+
+147
+00:04:56,230 --> 00:04:57,380
+in linear algebra, the formal
+精通线性代数的人来说,
+
+148
+00:04:57,780 --> 00:04:58,820
+definition of this is that
+对这个正式的定义是
+
+149
+00:04:59,230 --> 00:05:00,500
+we're going to find a set of
+我们将寻找一组向量
+
+150
+00:05:00,610 --> 00:05:01,680
+vectors, U1 U2 maybe up
+U1,U2,也许
+
+151
+00:05:01,800 --> 00:05:03,370
+to Uk, and what
+一直到Uk
+
+152
+00:05:03,460 --> 00:05:04,490
+we're going to do is project
+我们将要做的是
+
+153
+00:05:04,980 --> 00:05:06,600
+the data onto the linear
+将数据投影到
+
+154
+00:05:06,830 --> 00:05:09,520
+subspace spanned by this set of k vectors.
+这k个向量张开的线性子空间上
+
+155
+00:05:10,520 --> 00:05:11,570
+But if you are not familiar
+但是如果你不熟悉
+
+156
+00:05:12,070 --> 00:05:13,200
+with linear algebra, just think
+线性代数,那就想成是
+
+157
+00:05:13,400 --> 00:05:14,790
+of it as finding k directions
+寻找k个方向
+
+158
+00:05:15,510 --> 00:05:18,380
+instead of just one direction, onto which to project the data.
+而不是只寻找一个方向,对数据进行投影
+
+159
+00:05:18,740 --> 00:05:19,950
+So, finding a k-dimensional surface,
+所以,寻找一个k维的平面
+
+160
+00:05:20,610 --> 00:05:21,560
+really finding a 2D plane
+在这里是寻找2维的平面
+
+161
+00:05:22,370 --> 00:05:23,870
+in this case, shown in this
+如图所示
+
+162
+00:05:24,040 --> 00:05:25,340
+figure, where we can
+这里我们用
+
+163
+00:05:26,800 --> 00:05:29,700
+define the position of the points in the plane using k directions.
+k个方向来定义平面中这些点的位置
+
+164
+00:05:30,410 --> 00:05:31,690
+That's why for PCA, we
+这就是为什么,对于PCA
+
+165
+00:05:31,950 --> 00:05:34,440
+want to find k vectors onto which to project the data.
+我们要寻找k个向量来对数据进行投影
+
+166
+00:05:35,030 --> 00:05:36,920
+And so, more formally, in
+因此,更正式一点的说
+
+167
+00:05:37,050 --> 00:05:38,430
+PCA what we want
+在PCA中,我们想做的就是
+
+168
+00:05:38,700 --> 00:05:40,400
+to do is find this way
+寻找到这种方式
+
+169
+00:05:40,590 --> 00:05:41,940
+to project the data so as
+对数据进行投影
+
+170
+00:05:42,040 --> 00:05:43,570
+to minimize the, sort of,
+进而最小化投影距离
+
+171
+00:05:43,850 --> 00:05:46,210
+projection distance, which is distance between points and projections.
+也就是数据点和投影后的点之间的距离
+
+172
+00:05:47,060 --> 00:05:48,060
+In this so 3D example,
+在这个3维的例子里
+
+173
+00:05:48,560 --> 00:05:50,100
+too, given a point we
+给定一个点
+
+174
+00:05:50,280 --> 00:05:51,450
+would take the point and project
+我们想将这个点
+
+175
+00:05:51,980 --> 00:05:53,950
+it onto this 2D surface.
+投影到2维平面上
+
+176
+00:05:55,560 --> 00:05:56,580
+When you're done with that,
+当你完成了那个
+
+177
+00:05:57,280 --> 00:05:58,690
+and so the projection error would
+因此投影误差就是
+
+178
+00:05:58,870 --> 00:06:00,830
+be, you know, the distance between the
+也就是
+
+179
+00:06:01,440 --> 00:06:03,160
+point and where it gets projected down.
+这点与投影到
+
+180
+00:06:03,970 --> 00:06:05,360
+to my 2D surface,
+2维平面之后的点之间的距离
+
+181
+00:06:05,880 --> 00:06:06,990
+and so what PCA does is it'll
+因此PCA做的是
+
+182
+00:06:07,070 --> 00:06:08,480
+try to find a line or
+寻找一条直线
+
+183
+00:06:08,620 --> 00:06:10,430
+plane or whatever onto which
+或者平面,诸如此类等等
+
+184
+00:06:10,660 --> 00:06:11,810
+to project the data, to try
+对数据进行投影
+
+185
+00:06:12,010 --> 00:06:14,160
+to minimize that squared projection,
+来最小化平方投影
+
+186
+00:06:15,100 --> 00:06:17,430
+that 90 degree, or that orthogonal projection error.
+90度的或者正交的投影误差
+
+187
+00:06:18,100 --> 00:06:19,240
+Finally, one question that I
+最后
+
+188
+00:06:19,280 --> 00:06:20,060
+sometimes get asked is how
+一个我有时会被问到的问题是
+
+189
+00:06:20,280 --> 00:06:22,100
+does PCA relate to
+PCA和线性回归有怎么样的关系?
+
+190
+00:06:22,350 --> 00:06:24,180
+linear regression, because when explaining
+因为当我解释PCA的时候
+
+191
+00:06:24,600 --> 00:06:25,780
+PCA I sometimes end up
+我有时候会以
+
+192
+00:06:26,190 --> 00:06:28,720
+drawing diagrams like these and that looks a little bit like linear regression.
+画这样的图表来结束,看上去有点像线性回归
+
+193
+00:06:30,790 --> 00:06:32,130
+It turns out PCA is not
+但是,事实是
+
+194
+00:06:32,370 --> 00:06:33,950
+linear regression, and despite
+PCA不是线性回归
+
+195
+00:06:34,350 --> 00:06:37,560
+some cosmetic similarity these are actually totally different algorithms.
+尽管看上去有一些相似性,但是它们确实是两种不同的算法
+
+196
+00:06:38,680 --> 00:06:39,680
+If we were doing linear regression
+如果我们做线性回归
+
+197
+00:06:40,770 --> 00:06:42,170
+what we would do would be, on
+我们 做的是
+
+198
+00:06:42,270 --> 00:06:42,940
+the left we would be trying
+看左边,我们想要
+
+199
+00:06:43,230 --> 00:06:44,400
+to predict the value of some
+在给定某个输入特征x的情况下
+
+200
+00:06:44,540 --> 00:06:45,830
+variable y given some input
+预测某个变量y的数值
+
+201
+00:06:46,120 --> 00:06:47,330
+features x. And so linear
+因此,对于线性回归
+
+202
+00:06:47,570 --> 00:06:48,760
+regression, what we're doing
+我们想做的是
+
+203
+00:06:49,150 --> 00:06:50,350
+is we're fitting a straight line
+拟合一条直线
+
+204
+00:06:51,900 --> 00:06:52,970
+so as to minimize the squared
+来最小化
+
+205
+00:06:53,390 --> 00:06:56,160
+error between a point and the straight line.
+点和直线之间的平方误差
+
+206
+00:06:56,360 --> 00:06:57,270
+And so what we'd be minimizing
+所以我们要最小化的是
+
+207
+00:06:57,900 --> 00:07:00,320
+would be the squared magnitude of these blue lines.
+这些蓝线幅值的平方
+
+208
+00:07:00,790 --> 00:07:02,240
+And notice I'm drawing these
+注意我画的这些
+
+209
+00:07:02,550 --> 00:07:04,650
+blue lines vertically, that they
+蓝色的垂直线
+
+210
+00:07:05,150 --> 00:07:06,500
+are the vertical distance between
+这是垂直距离
+
+211
+00:07:06,520 --> 00:07:07,700
+a point and the value
+它是某个点
+
+212
+00:07:08,090 --> 00:07:10,470
+predicted by the hypothesis, whereas in
+与通过假设的得到的其预测值之间的距离
+
+213
+00:07:10,510 --> 00:07:13,100
+contrast, in PCA, what it
+与此相反
+
+214
+00:07:13,190 --> 00:07:14,170
+does is it tries to
+PCA要做的是
+
+215
+00:07:14,320 --> 00:07:16,890
+minimize the magnitude of these blue lines,
+最小化这些蓝色直线的幅值
+
+216
+00:07:17,460 --> 00:07:19,550
+which are drawn at an angle,
+倾斜地画出来的
+
+217
+00:07:19,980 --> 00:07:21,590
+these are really the shortest orthogonal
+这实际上是最短的
+
+218
+00:07:22,090 --> 00:07:23,900
+distances, the shortest distance between
+正交距离,也就是
+
+219
+00:07:24,050 --> 00:07:26,620
+the point X and this
+点X
+
+220
+00:07:27,000 --> 00:07:28,320
+red line, and this
+跟红色直线之间的最短距离
+
+221
+00:07:28,530 --> 00:07:29,870
+gives very different effects, depending
+这是一种非常不同的效果
+
+222
+00:07:30,600 --> 00:07:32,050
+on the data set.
+取决于数据集
+
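+The contrast just drawn can be summarized by the two objectives side by side (a sketch using the course's earlier notation, where h_theta is the linear regression hypothesis):
+
+\[ \text{Linear regression:}\quad \min_{\theta}\ \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^{2} \quad\text{(vertical distances to a distinguished output } y\text{)} \]
+
+\[ \text{PCA:}\quad \min\ \frac{1}{m}\sum_{i=1}^{m}\left\| x^{(i)} - x_{\mathrm{approx}}^{(i)} \right\|^{2} \quad\text{(orthogonal projection distances; no distinguished } y\text{)} \]
+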
+223
+00:07:32,400 --> 00:07:34,610
+And more generally,
+更一般地说,
+
+224
+00:07:34,760 --> 00:07:35,890
+when you're doing
+当你做
+
+225
+00:07:36,150 --> 00:07:37,740
+linear regression there is this
+线性回归的时候,有一个
+
+226
+00:07:38,160 --> 00:07:39,810
+distinguished variable y that
+特别的变量y
+
+227
+00:07:40,000 --> 00:07:41,130
+we're trying to predict, all that
+是我们将要预测的
+
+228
+00:07:41,560 --> 00:07:43,610
+linear regression is about is taking all the values
+线性回归所要做的就是
+
+229
+00:07:44,060 --> 00:07:45,060
+of X and try to use
+用X的所有的值来
+
+230
+00:07:45,260 --> 00:07:46,930
+that to predict Y. Whereas
+预测y
+
+231
+00:07:47,210 --> 00:07:48,920
+in PCA, there is no
+然而在PCA中
+
+232
+00:07:49,230 --> 00:07:50,200
+distinguished or there is
+没有这么一个特别的或者
+
+233
+00:07:50,400 --> 00:07:51,900
+no special variable Y that
+特殊的变量y
+
+234
+00:07:52,040 --> 00:07:52,770
+we're trying to predict, and instead
+是我们要预测的
+
+235
+00:07:53,230 --> 00:07:54,100
+we have a list of features
+我们所拥有的是
+
+236
+00:07:54,740 --> 00:07:56,130
+x1, x2, and so on
+特征x1,x2,等
+
+237
+00:07:56,280 --> 00:07:57,830
+up to xN, and all of
+一直到xN
+
+238
+00:07:57,940 --> 00:07:59,460
+these features are treated equally.
+所有的这些特征都是被同样地对待
+
+239
+00:08:00,360 --> 00:08:01,560
+So, no one of them is special.
+因此,它们中没有一个是特殊的
+
+240
+00:08:02,980 --> 00:08:05,180
+As one last example, if I
+最后一个例子中
+
+241
+00:08:05,400 --> 00:08:07,220
+have three-dimensional data, and
+如果我有3维数据
+
+242
+00:08:07,390 --> 00:08:08,660
+I want to reduce data from
+我要将这些数据
+
+243
+00:08:08,820 --> 00:08:10,110
+3D to 2D, so maybe
+从3维降到2维
+
+244
+00:08:10,380 --> 00:08:11,630
+I want to find two directions,
+我就要找到2个方向
+
+245
+00:08:12,780 --> 00:08:14,110
+you know, u1, and u2,
+也就是u1和u2
+
+246
+00:08:14,920 --> 00:08:16,030
+onto which to project my data,
+将数据投影到它们上面
+
+247
+00:08:16,960 --> 00:08:17,840
+then what I have is I
+然后我得到的是
+
+248
+00:08:18,390 --> 00:08:20,190
+have three features, x1, x2,
+我有3个特征,x1,x2
+
+249
+00:08:20,860 --> 00:08:22,410
+x3, and all of these are treated alike.
+x3,所有的这些都是被同样地对待
+
+250
+00:08:22,780 --> 00:08:24,100
+All of these are treated symmetrically
+这些都是被均等地对待
+
+251
+00:08:25,020 --> 00:08:26,240
+and there is no special variable
+没有特殊的变量
+
+252
+00:08:26,740 --> 00:08:27,740
+y that I'm trying to predict.
+y,需要被预测
+
+253
+00:08:28,870 --> 00:08:30,320
+And so PCA is not
+因此,PCA
+
+254
+00:08:30,650 --> 00:08:33,210
+linear regression, and even
+不是线性回归
+
+255
+00:08:34,020 --> 00:08:35,870
+though at some cosmetic level they
+尽管一定程度的相似性
+
+256
+00:08:36,040 --> 00:08:37,260
+might look related, these are
+使得它们看上去是有关联的
+
+257
+00:08:37,600 --> 00:08:41,580
+actually very different algorithms. So,
+但它们实际上非常不同的算法
+
+258
+00:08:41,810 --> 00:08:43,360
+hopefully you now understand what
+因此,希望你们能理解
+
+259
+00:08:43,630 --> 00:08:44,960
+PCA is doing: it's trying
+PCA是做什么的
+
+260
+00:08:45,220 --> 00:08:46,520
+to find a lower dimensional
+它是寻找到一个低维的平面
+
+261
+00:08:47,130 --> 00:08:48,290
+surface onto which to project
+对数据进行投影
+
+262
+00:08:48,680 --> 00:08:50,230
+the data so as to
+以便
+
+263
+00:08:50,450 --> 00:08:52,420
+minimize this squared projection error,
+最小化投影误差的平方
+
+264
+00:08:52,650 --> 00:08:54,140
+to minimize the squared distance between
+最小化每个点
+
+265
+00:08:54,390 --> 00:08:56,660
+each point and the location of where it gets projected.
+与投影后的对应点之间的距离的平方值
+
+266
+00:08:57,800 --> 00:08:59,040
+In the next video we'll start
+在下一段视频中
+
+267
+00:08:59,340 --> 00:09:00,490
+to talk about how to actually
+我们将开始讨论
+
+268
+00:09:00,900 --> 00:09:02,350
+find this lower dimensional surface
+如何真正地找到这个低维平面
+
+269
+00:09:03,210 --> 00:09:04,470
+onto which to project the data.
+来对数据进行投影
+
diff --git a/srt/14 - 4 - Principal Component Analysis Algorithm (15 min).srt b/srt/14 - 4 - Principal Component Analysis Algorithm (15 min).srt
new file mode 100644
index 00000000..98cee69b
--- /dev/null
+++ b/srt/14 - 4 - Principal Component Analysis Algorithm (15 min).srt
@@ -0,0 +1,2132 @@
+1
+00:00:00,340 --> 00:00:01,410
+In this video I'd like
+在这个视频中,我将(字幕翻译:中国海洋大学,孙中卫)
+
+2
+00:00:01,550 --> 00:00:03,020
+to tell you about the principle
+介绍
+
+3
+00:00:03,340 --> 00:00:04,570
+components analysis algorithm.
+主成分分析算法
+
+4
+00:00:05,600 --> 00:00:06,560
+And by the end of this
+在视频的最后,
+
+5
+00:00:06,710 --> 00:00:09,200
+video you know to implement PCA for yourself.
+你应该能独自使用主成分分析算法。
+
+6
+00:00:10,170 --> 00:00:12,540
+And use it reduce the dimension of your data.
+同时,能够使用PCA对自己的数据进行降维。
+
+7
+00:00:13,100 --> 00:00:14,690
+Before applying PCA, there is
+在使用PCA之前,
+
+8
+00:00:14,800 --> 00:00:17,760
+a data pre-processing step which you should always do.
+将首先进行数据预处理。
+
+9
+00:00:18,510 --> 00:00:20,220
+Given the training set of
+给定训练样本的集合,
+
+10
+00:00:20,520 --> 00:00:22,290
+examples, it is important to
+很重要的一点是
+
+11
+00:00:22,600 --> 00:00:24,070
+always perform mean normalization,
+要总是执行均值标准化。
+
+12
+00:00:25,330 --> 00:00:26,140
+and then depending on your data,
+依据你的数据,
+
+13
+00:00:26,840 --> 00:00:28,540
+maybe perform feature scaling as well.
+大概也需要进行特征的缩放。
+
+14
+00:00:29,620 --> 00:00:30,950
+this is very similar to the
+这是很相似的,
+
+15
+00:00:31,650 --> 00:00:33,250
+mean normalization and feature scaling
+在均值标准化过程与特征缩放的过程之间
+
+16
+00:00:34,080 --> 00:00:36,580
+process that we have for supervised learning.
+在我们已学习的监督学习中。
+
+17
+00:00:36,910 --> 00:00:38,240
+In fact it's exactly the
+实际上,确实是
+
+18
+00:00:38,390 --> 00:00:40,160
+same procedure except that we're
+相同过程,除了我们现在
+
+19
+00:00:40,310 --> 00:00:41,790
+doing it now to our unlabeled
+对未标记数据做的,
+
+20
+00:00:42,930 --> 00:00:43,670
+data, X1 through Xm.
+X1到Xm。
+
+21
+00:00:44,180 --> 00:00:45,530
+So for mean normalization we
+对于均值标准化,
+
+22
+00:00:45,720 --> 00:00:47,080
+first compute the mean of
+我们首先计算
+
+23
+00:00:47,390 --> 00:00:49,070
+each feature and then
+每个特征的均值,
+
+24
+00:00:49,340 --> 00:00:50,900
+we replace each feature, X,
+我们取代每个特征X用
+
+25
+00:00:51,150 --> 00:00:52,680
+with X minus its mean,
+X减去它的均值。
+
+26
+00:00:52,810 --> 00:00:54,120
+and so this makes each feature
+这将使每个特征
+
+27
+00:00:54,520 --> 00:00:57,450
+now have exactly zero mean
+都恰好具有零均值。
+
+28
+00:00:58,690 --> 00:01:00,690
+The different features have very different scales.
+不同的特征有非常不同的缩放。
+
+29
+00:01:01,540 --> 00:01:03,050
+So for example, if x1
+例如,x1
+
+30
+00:01:03,080 --> 00:01:04,060
+is the size of a house, and
+是房子的尺寸,
+
+31
+00:01:04,100 --> 00:01:05,390
+x2 is the number of bedrooms, to
+x2是卧室的数量,
+
+32
+00:01:05,580 --> 00:01:07,370
+use our earlier example, we
+对于我们先前的例子,
+
+33
+00:01:07,480 --> 00:01:08,680
+then also scale each feature
+我们也可以缩放每个特征
+
+34
+00:01:09,130 --> 00:01:10,540
+to have a comparable range of values.
+使其取值在可比较的范围内。
+
+35
+00:01:10,980 --> 00:01:12,490
+And so, similar to what
+相似于
+
+36
+00:01:12,680 --> 00:01:13,860
+we had with supervised learning,
+我们之前的监督学习,
+
+37
+00:01:14,060 --> 00:01:16,200
+we would take x superscript i, subscript
+我们会取x的上标i、下标
+
+38
+00:01:16,680 --> 00:01:17,620
+j, that's the jth feature
+j,也就是第j个特征。
+
+39
+00:01:23,250 --> 00:01:25,530
+and so we would
+所以我们能够
+
+40
+00:01:25,890 --> 00:01:27,610
+subtract off the mean,
+减去均值。
+
+41
+00:01:28,370 --> 00:01:29,520
+now that's what we have on top, and then divide by sj.
+这就是上面的分子部分,然后再除以sj。
+
+42
+00:01:29,610 --> 00:01:30,020
+Here, sj is some measure of the range of values of feature j. So, it could be the max minus
+在这里,sj是衡量特征j取值范围的某个量。所以,它可以是最大值减去
+
+43
+00:01:30,080 --> 00:01:31,310
+min value, or more commonly,
+最小值,或者更常用的是,
+
+44
+00:01:31,890 --> 00:01:33,540
+it is the standard deviation of
+即特征j的标准差。
+
+45
+00:01:33,640 --> 00:01:35,520
+feature j. Having done
+
+46
+00:01:36,230 --> 00:01:39,480
+this sort of data pre-processing, here's what the PCA algorithm does.
+做了数据预处理流程后,这是PCA算法做的。
+
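+Before the algorithm itself, here is a minimal Octave sketch of the pre-processing step just described, assuming the unlabeled examples x(1) through x(m) are stored as the rows of an m-by-n matrix X (the same row layout used later in this video for the vectorized covariance computation):
+
+mu = mean(X);                           % 1 x n row vector of the feature means mu_j
+X_norm = bsxfun(@minus, X, mu);         % mean normalization: x_j := x_j - mu_j
+s = std(X_norm);                        % s_j: standard deviation of feature j
+                                        % (max(X) - min(X) would also work as s_j)
+X_norm = bsxfun(@rdivide, X_norm, s);   % optional feature scaling, for comparable ranges
+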
+47
+00:01:40,620 --> 00:01:41,630
+We saw from the previous video
+我们能从先前的视频中看到
+
+48
+00:01:41,960 --> 00:01:43,050
+that what PCA does is, it
+PCA所做的,
+
+49
+00:01:43,170 --> 00:01:44,520
+tries to find a lower
+它尝试着找到一个
+
+50
+00:01:44,790 --> 00:01:46,080
+dimensional sub-space onto which to
+低维子空间,
+
+51
+00:01:46,170 --> 00:01:47,500
+project the data, so as
+进行数据处理,
+
+52
+00:01:47,650 --> 00:01:49,780
+to minimize the squared projection
+只为了最小化平方投影
+
+53
+00:01:50,540 --> 00:01:51,660
+errors, sum of the
+误差,
+
+54
+00:01:51,740 --> 00:01:53,080
+squared projection errors, as the
+最小化平方和投影误差,
+
+55
+00:01:53,420 --> 00:01:54,800
+square of the length of
+随着这些蓝线所表示的
+
+56
+00:01:54,870 --> 00:01:56,790
+those blue lines that and so
+的平方,
+
+57
+00:01:57,110 --> 00:01:58,510
+what we wanted to do specifically
+我们想做的是
+
+58
+00:01:59,210 --> 00:02:02,730
+is find a vector, u1, which
+找到一个向量,u1,
+
+59
+00:02:03,280 --> 00:02:04,750
+specifies that direction or
+指定这个方向或者
+
+60
+00:02:05,040 --> 00:02:06,630
+in the 2D case we want
+在2维中,我们
+
+61
+00:02:06,880 --> 00:02:08,760
+to find two vectors, u1 and
+想找到2个向量,u1和
+
+62
+00:02:10,640 --> 00:02:12,980
+u2, to define this surface
+u2,来定义这个表面,
+
+63
+00:02:13,590 --> 00:02:14,610
+onto which to project the data.
+用于投射数据。
+
+64
+00:02:16,620 --> 00:02:17,920
+So, just as a
+正如
+
+65
+00:02:18,040 --> 00:02:19,160
+quick reminder of what reducing
+一个快速的提醒对于减少
+
+66
+00:02:19,730 --> 00:02:20,820
+the dimension of the data means,
+数据均值的维数。
+
+67
+00:02:21,490 --> 00:02:22,430
+for this example on the
+对左边的例子,
+
+68
+00:02:22,470 --> 00:02:23,560
+left we were given
+我们给予
+
+69
+00:02:23,680 --> 00:02:26,010
+the examples xI, which are in r2.
+样本xi,它们属于R2空间。
+
+70
+00:02:26,300 --> 00:02:28,390
+And what we
+我们
+
+71
+00:02:28,660 --> 00:02:29,500
+like to do is find
+想找到
+
+72
+00:02:29,970 --> 00:02:32,400
+a set of numbers z-i in
+一组数zi,它们属于
+
+73
+00:02:33,000 --> 00:02:34,950
+R, to represent our data.
+实数空间R,用来表示我们的数据。
+
+74
+00:02:36,000 --> 00:02:37,820
+So that's what reduction from 2D to 1D means.
+这就是从2维降到1维的含义。
+
+75
+00:02:39,020 --> 00:02:41,450
+So specifically by projecting
+具体来说,就是
+
+76
+00:02:42,710 --> 00:02:44,080
+data onto this red line there.
+把数据投射到红线上。
+
+77
+00:02:44,800 --> 00:02:46,320
+We need only one number to
+我们仅需一个数字
+
+78
+00:02:46,450 --> 00:02:48,340
+specify the position of the points on the line.
+来表示在线上的点的位置。
+
+79
+00:02:48,590 --> 00:02:49,380
+So i'm going to call that number
+所以我们叫这个数字为
+
+80
+00:02:50,700 --> 00:02:51,830
+z or z1.
+z或z1.
+
+81
+00:02:52,020 --> 00:02:54,850
+Z here is a real number, so that's like a one dimensional vector.
+这里的Z是一个实数,相当于一个一维向量。
+
+82
+00:02:55,380 --> 00:02:56,650
+So z1 just refers to
+所以z1仅涉及
+
+83
+00:02:56,690 --> 00:02:58,080
+the first component of this,
+这个向量的第一个分量,
+
+84
+00:02:58,280 --> 00:03:00,430
+you know, one by one matrix, or this one dimensional vector.
+也就是这个1×1矩阵,或者说这个一维向量。
+
+85
+00:03:01,670 --> 00:03:03,170
+And so we need only
+所以我们仅需要
+
+86
+00:03:03,490 --> 00:03:05,590
+one number to specify the position of a point.
+一个数字来指明一个点的位置。
+
+87
+00:03:06,330 --> 00:03:07,940
+So if this example
+所以如果这个例子
+
+88
+00:03:08,460 --> 00:03:09,510
+here was my example
+是我的一个例子X1,
+
+89
+00:03:10,610 --> 00:03:13,160
+X1, then maybe that gets mapped here.
+大概得到相应的映射。
+
+90
+00:03:13,900 --> 00:03:15,450
+And if this example was X2
+如果这个例子是x2
+
+91
+00:03:15,680 --> 00:03:17,250
+maybe that example gets mapped
+大概例子也得到映射。
+
+92
+00:03:17,530 --> 00:03:18,790
+And so this point
+所以对应的点
+
+93
+00:03:19,060 --> 00:03:20,400
+here will be Z1
+将是z1
+
+94
+00:03:20,840 --> 00:03:21,920
+and this point here will be
+这个点对应的将是
+
+95
+00:03:22,080 --> 00:03:24,240
+Z2, and similarly we
+z2,相似的
+
+96
+00:03:24,620 --> 00:03:26,410
+would have those other points
+我们将有其他的点对于这些,
+
+97
+00:03:26,840 --> 00:03:30,230
+for These, maybe X3,
+大概是x3,
+
+98
+00:03:30,510 --> 00:03:32,550
+X4, X5 get mapped to Z1, Z2, Z3.
+x4,x5,得到映射z1,z2,和z3.
+
+99
+00:03:34,360 --> 00:03:35,940
+So What PCA has
+所以PCA要做的是
+
+100
+00:03:36,050 --> 00:03:36,830
+to do is we need to
+我们需要
+
+101
+00:03:36,930 --> 00:03:38,920
+come up with a way to compute two things.
+想出一个方法计算两个事情。
+
+102
+00:03:39,310 --> 00:03:40,710
+One is to compute these vectors,
+一个是计算这些向量
+
+103
+00:03:41,830 --> 00:03:44,970
+u1, and in this case u1 and u2.
+例如u1,和在哪个事件中的u1和u2.
+
+104
+00:03:45,230 --> 00:03:46,880
+And the other is
+另一个是
+
+105
+00:03:47,130 --> 00:03:48,140
+how do we compute these numbers,
+如何来计算这些数字
+
+106
+00:03:49,360 --> 00:03:51,200
+Z. So on the
+Z.所以
+
+107
+00:03:51,430 --> 00:03:53,910
+example on the left we're reducing the data from 2D to 1D.
+在左边的例子汇总我们降维数据从2维到1维。
+
+108
+00:03:55,290 --> 00:03:56,100
+In the example on the right,
+在右边的例子中,
+
+109
+00:03:56,510 --> 00:03:58,100
+we would be reducing data from
+我们能降维数据从
+
+110
+00:03:58,450 --> 00:04:00,600
+3 dimensional as in
+三维
+
+111
+00:04:00,710 --> 00:04:04,840
+r3, to zi, which is now two dimensional.
+到zi,它是一个二维数据。
+
+112
+00:04:05,390 --> 00:04:07,790
+So these z vectors would now be two dimensional.
+所以这些z向量现在将是二维的。
+
+113
+00:04:08,450 --> 00:04:09,590
+So it would be z1
+所以它将是z1
+
+114
+00:04:10,150 --> 00:04:11,410
+z2 like so, and so
+z2这样的,所以
+
+115
+00:04:11,640 --> 00:04:12,940
+we need to give away to compute
+我们需要有方法计算
+
+116
+00:04:13,670 --> 00:04:15,410
+these new representations, the z1
+这些新的模型代表,
+
+117
+00:04:15,570 --> 00:04:17,350
+and z2 of the data as well.
+这些z1和z2的数据也一样。
+
+118
+00:04:18,280 --> 00:04:20,350
+So how do you compute all of these quantities?
+所以你如何来计算所有的参数那?
+
+119
+00:04:20,520 --> 00:04:21,520
+It turns out that a mathematical
+已经证明一个数学的
+
+120
+00:04:22,490 --> 00:04:23,660
+derivation, also the mathematical
+推导,也就是数学的
+
+121
+00:04:24,300 --> 00:04:26,020
+proof, for what is
+证明对于
+
+122
+00:04:26,090 --> 00:04:27,970
+the right value U1, U2, Z1,
+U1,U2,Z1,Z2等是正确的。
+
+123
+00:04:28,290 --> 00:04:29,480
+Z2, and so on.
+
+124
+00:04:29,690 --> 00:04:31,230
+That mathematical proof is very
+这个数学证明是非常
+
+125
+00:04:31,480 --> 00:04:32,890
+complicated and beyond the
+复杂的,超出了
+
+126
+00:04:32,950 --> 00:04:34,620
+scope of the course.
+课程的范围。
+
+127
+00:04:35,280 --> 00:04:37,290
+But once you've done that, it
+但是一旦完成了这些推导,
+
+128
+00:04:37,590 --> 00:04:38,590
+turns out that the procedure
+证明这个过程
+
+129
+00:04:39,350 --> 00:04:40,570
+to actually find the value
+实际能够找到
+
+130
+00:04:41,200 --> 00:04:42,210
+of u1 that you want
+你想要的u1值,
+
+131
+00:04:42,950 --> 00:04:43,950
+is not that hard, even though
+这是不困难的,尽管
+
+132
+00:04:44,180 --> 00:04:45,640
+so that the mathematical proof that
+这个数学证明
+
+133
+00:04:45,840 --> 00:04:46,940
+this value is the correct
+这个值是正确的。
+
+134
+00:04:47,260 --> 00:04:48,450
+value is somewhat more
+这个证明要更复杂一些,
+
+135
+00:04:48,700 --> 00:04:49,960
+involved and more than I want to get into.
+超出了我想在这里深入的范围。
+
+136
+00:04:50,880 --> 00:04:52,070
+But let me just describe the
+但让我描述
+
+137
+00:04:52,480 --> 00:04:53,830
+specific procedure that you
+具体的过程,
+
+138
+00:04:53,960 --> 00:04:55,250
+have to implement in order
+你想执行为了
+
+139
+00:04:55,440 --> 00:04:56,450
+to compute all of these
+计算所有的
+
+140
+00:04:56,570 --> 00:04:57,840
+things, the vectors, u1, u2,
+参数,向量u1,u2
+
+141
+00:04:58,910 --> 00:05:00,980
+the vector z. Here's the procedure.
+向量z,这是一个过程。
+
+142
+00:05:02,070 --> 00:05:02,970
+Let's say we want to reduce
+我们想把
+
+143
+00:05:03,170 --> 00:05:04,220
+the data to n dimensions
+n维的数据降到
+
+144
+00:05:04,840 --> 00:05:05,760
+to k dimension What we're
+k维,
+
+145
+00:05:06,760 --> 00:05:07,640
+going to do is first
+我们首先要做的
+
+146
+00:05:07,900 --> 00:05:09,400
+compute something called the
+是计算
+
+147
+00:05:09,830 --> 00:05:11,140
+covariance matrix, and the covariance
+协方差,这个协方差
+
+148
+00:05:11,700 --> 00:05:13,620
+matrix is commonly denoted by
+通常用
+
+149
+00:05:13,820 --> 00:05:15,050
+this Greek alphabet which is
+希腊字母表中
+
+150
+00:05:15,190 --> 00:05:16,880
+the capital Greek alphabet sigma.
+的 sigma表示。
+
+151
+00:05:18,000 --> 00:05:19,210
+It's a bit unfortunate that the
+不幸的是
+
+152
+00:05:19,310 --> 00:05:21,080
+Greek alphabet sigma looks exactly
+这个希腊字母sigma看起来
+
+153
+00:05:21,760 --> 00:05:22,710
+like the summation symbols.
+像一个求和标记。
+
+154
+00:05:23,210 --> 00:05:24,620
+So this is the
+所以这里的
+
+155
+00:05:24,700 --> 00:05:26,220
+Greek alphabet Sigma is used
+希腊字母Sigma被用来
+
+156
+00:05:26,420 --> 00:05:29,540
+to denote a matrix and this here is a summation symbol.
+标记一个矩阵,在那里是一个求和标记。
+
+157
+00:05:30,510 --> 00:05:32,330
+So hopefully in these slides
+所以希望在幻灯片
+
+158
+00:05:32,680 --> 00:05:34,190
+there won't be ambiguity about which
+不要有模糊对于
+
+159
+00:05:34,410 --> 00:05:36,340
+is Sigma Matrix, the
+Sigma 矩阵,
+
+160
+00:05:36,520 --> 00:05:37,850
+matrix, which is a
+另一个是
+
+161
+00:05:38,090 --> 00:05:39,620
+summation symbol, and hopefully
+求和标记,希望
+
+162
+00:05:39,940 --> 00:05:41,460
+it will be clear from context when
+它能被区分从内容上,
+
+163
+00:05:41,820 --> 00:05:43,510
+I'm using each one.
+当我们使用每一个的时候。
+
+164
+00:05:43,740 --> 00:05:44,790
+How do you compute this matrix
+如何计算这个矩阵
+
+165
+00:05:45,530 --> 00:05:46,550
+let's say we want to
+比如说我们想
+
+166
+00:05:47,135 --> 00:05:47,640
+store it in an octave variable
+存储它到一个octave变量中,
+
+167
+00:05:48,120 --> 00:05:49,970
+called Sigma.
+这个变量叫Sigma。
+
+168
+00:05:50,840 --> 00:05:51,890
+What we need to do is
+我们需要做什么
+
+169
+00:05:52,030 --> 00:05:53,660
+compute something called the
+来计算被叫做
+
+170
+00:05:54,130 --> 00:05:56,190
+eigenvectors of the matrix sigma.
+矩阵sigma的特征向量。
+
+171
+00:05:57,560 --> 00:05:58,450
+And an octave, the way you
+对于一个octave,你的方法
+
+172
+00:05:58,590 --> 00:05:59,820
+do that is you use this
+是用这个
+
+173
+00:05:59,970 --> 00:06:01,020
+command, [U, S, V] equals
+命令:[U, S, V] 等于
+
+174
+00:06:01,350 --> 00:06:02,600
+svd of Sigma.
+svd(Sigma)。
+
+175
+00:06:03,650 --> 00:06:06,090
+SVD, by the way, stands for singular value decomposition.
+SVD代表奇异值分解,
+
+176
+00:06:08,520 --> 00:06:10,590
+This is a Much
+这是一个更
+
+177
+00:06:10,790 --> 00:06:12,660
+more advanced singular value decomposition routine.
+加高级的奇异值分解程序。
+
+178
+00:06:14,450 --> 00:06:15,560
+It is much more advanced linear
+它也是更加高级的线性
+
+179
+00:06:15,800 --> 00:06:16,950
+algebra than you actually need
+代数比你实际需要
+
+180
+00:06:16,950 --> 00:06:18,770
+to know but now It turns out
+知道的但结果是
+
+181
+00:06:18,950 --> 00:06:20,250
+that when sigma is equal
+sigma等同
+
+182
+00:06:20,480 --> 00:06:21,800
+to this matrix, there are
+这个矩阵时,有几种
+
+183
+00:06:21,880 --> 00:06:23,420
+a few ways to compute these
+方法可以计算这些
+
+184
+00:06:23,610 --> 00:06:25,810
+eigenvectors and if you
+特征向量,而且如果你
+
+185
+00:06:25,960 --> 00:06:27,350
+are an expert in linear algebra
+是一个专家在线性代数上,
+
+186
+00:06:27,700 --> 00:06:28,610
+and if you've heard of eigenvectors
+而且如果你以前听说过特征向量
+
+187
+00:06:28,860 --> 00:06:30,170
+before, you may know
+那么你可能知道
+
+188
+00:06:30,350 --> 00:06:31,660
+that there is another Octave function
+Octave中还有另一个函数
+
+189
+00:06:31,990 --> 00:06:33,420
+called eig, which can
+叫做eig,它也
+
+190
+00:06:33,520 --> 00:06:35,030
+also be used to compute the same thing.
+被用来计算相同的事。
+
+191
+00:06:35,950 --> 00:06:36,980
+and It turns out that the
+已经证明这个
+
+192
+00:06:37,370 --> 00:06:39,180
+SVD function and the
+SVD函数和
+
+193
+00:06:39,290 --> 00:06:40,310
+eig function will give
+这个eig函数会给
+
+194
+00:06:40,370 --> 00:06:42,170
+you the same vectors, although SVD
+你相同的向量,但SVD
+
+195
+00:06:42,840 --> 00:06:44,210
+is a little more numerically stable.
+将更加具有数据稳定性。
+
+196
+00:06:44,540 --> 00:06:45,890
+So I tend to use SVD, although
+所以我趋向于用SVD,尽管
+
+197
+00:06:46,140 --> 00:06:47,040
+I have a few friends that use
+我有几个朋友用
+
+198
+00:06:47,280 --> 00:06:48,720
+the eig function to do
+eig函数来做
+
+199
+00:06:48,920 --> 00:06:50,050
+this as well but when you
+这些,但是当你
+
+200
+00:06:50,130 --> 00:06:51,270
+apply this to a covariance matrix
+把它应用到一个协方差矩阵
+
+201
+00:06:51,750 --> 00:06:52,960
+sigma it gives you the same thing.
+sigma,给你的结果是相同。
+
+202
+00:06:53,870 --> 00:06:55,070
+This is because the covariance matrix
+这是因为协方差矩阵
+
+203
+00:06:55,500 --> 00:06:57,250
+always satisfies a mathematical
+总是满足一个数学
+
+204
+00:06:57,940 --> 00:07:00,560
+Property called symmetric positive definite
+属性叫正定矩阵。
+
+205
+00:07:01,360 --> 00:07:02,140
+You really don't need to know
+你不需要知道
+
+206
+00:07:02,280 --> 00:07:03,890
+what that means, but the SVD
+他的含义,但这个SVD
+
+207
+00:07:05,340 --> 00:07:07,090
+and eig functions are different functions, but
+和eig是两个不同的函数,但是
+
+208
+00:07:07,400 --> 00:07:08,670
+when they are applied to a
+当他们被运用到
+
+208
+00:07:08,780 --> 00:07:10,410
+covariance matrix which can
+协方差矩阵时这被
+
+209
+00:07:10,550 --> 00:07:12,080
+be proved to always satisfy this
+证明总是满足
+
+210
+00:07:13,190 --> 00:07:15,220
+mathematical property; they'll always give you the same thing.
+这个数学属性;这两个函数给予相同的结果。
+
+211
+00:07:16,580 --> 00:07:19,180
+Okay, that was probably much more linear algebra than you needed to know.
+好的,这大概将是更高深的线性代数比我们知道的。
+
+212
+00:07:19,260 --> 00:07:22,380
+In case none of that made sense, don't worry about it.
+如果没有明白的话,不用担心。
+
+213
+00:07:22,560 --> 00:07:23,490
+All you need to know is that
+你需要知道的
+
+214
+00:07:24,130 --> 00:07:27,840
+this system command you
+这个系统命令,你能够
+
+215
+00:07:28,140 --> 00:07:29,690
+should implement in Octave.
+用它在Octave。
+
+216
+00:07:30,080 --> 00:07:30,550
+And if you're implementing this in a
+而如果你是在
+
+217
+00:07:30,710 --> 00:07:32,120
+different language than Octave or MATLAB,
+Octave或MATLAB之外的其他语言中实现,
+
+218
+00:07:32,710 --> 00:07:33,790
+what you should do is find
+我们应该做的是找到
+
+219
+00:07:34,190 --> 00:07:35,860
+the numerical linear algebra library
+数值线性代数库
+
+220
+00:07:36,730 --> 00:07:37,960
+that can compute the SVD
+能够计算 SVD
+
+221
+00:07:38,230 --> 00:07:40,460
+or singular value decomposition, and
+或奇异值分解,
+
+222
+00:07:40,970 --> 00:07:42,680
+there are many such libraries for
+有许多这样的库
+
+223
+00:07:43,570 --> 00:07:45,060
+probably all of the major programming languages.
+对大多数的主要编程语言。
+
+224
+00:07:45,300 --> 00:07:46,920
+People can use that to
+人们能够用这些来
+
+225
+00:07:47,050 --> 00:07:48,920
+compute the matrices u,
+计算矩阵u,
+
+226
+00:07:49,200 --> 00:07:52,770
+s, and d of the covariance matrix sigma.
+s,和协方差矩阵sigma d.
+
+227
+00:07:53,340 --> 00:07:54,490
+So just to fill
+仅为了满足
+
+228
+00:07:54,620 --> 00:07:56,090
+in some more details, this covariance
+一些细节,这个协方差
+
+229
+00:07:56,660 --> 00:07:58,080
+matrix sigma will be
+矩阵sigma将
+
+230
+00:07:58,250 --> 00:08:01,480
+an n by n matrix.
+是 N—N的矩阵。
+
+231
+00:08:02,250 --> 00:08:03,240
+And one way to see that
+一个明白的方法是
+
+232
+00:08:03,510 --> 00:08:04,220
+is if you look at the definition
+你能够看到定义
+
+233
+00:08:05,250 --> 00:08:06,280
+this is an n by 1
+一个N-1的
+
+234
+00:08:06,660 --> 00:08:08,680
+vector and this
+向量,
+
+235
+00:08:08,920 --> 00:08:10,830
+here I transpose is
+另一个是
+
+236
+00:08:11,010 --> 00:08:13,260
+1 by N so the
+1-N的,所以
+
+237
+00:08:13,380 --> 00:08:14,480
+product of these two things
+这两个的结果
+
+238
+00:08:15,150 --> 00:08:15,800
+is going to be an N
+是一个N-N
+
+239
+00:08:16,570 --> 00:08:17,530
+by N matrix.
+的矩阵。
+
+240
+00:08:19,100 --> 00:08:22,130
+N by 1 times 1 by N, so
+N×1乘以1×N,所以
+
+241
+00:08:22,280 --> 00:08:22,840
+there's an NxN matrix and when
+一个N-N的矩阵,当
+
+242
+00:08:22,910 --> 00:08:23,710
+we add up all of these you still
+我们合计这些后,你还是
+
+243
+00:08:23,840 --> 00:08:26,140
+have an NxN matrix.
+有一个N-N的矩阵。
+
+244
+00:08:27,600 --> 00:08:29,920
+And what the SVD outputs three
+SVD输出的是三
+
+245
+00:08:30,500 --> 00:08:32,580
+matrices, u, s, and
+个矩阵,u,s,和
+
+246
+00:08:32,830 --> 00:08:35,070
+v. The thing you really need out of the SVD is the u matrix.
+v. 你真的需要的SVD输出是u矩阵。
+
+247
+00:08:36,230 --> 00:08:40,160
+The u matrix will also be a NxN matrix.
+这个 u矩阵也将是一个N-N的矩阵。
+
+248
+00:08:41,510 --> 00:08:42,280
+And if we look at the
+我们看到的
+
+249
+00:08:42,350 --> 00:08:43,260
+columns of the U
+u矩阵的列
+
+250
+00:08:43,480 --> 00:08:45,330
+matrix it turns
+被证明:
+
+251
+00:08:45,630 --> 00:08:47,210
+out that the columns
+u矩阵
+
+252
+00:08:48,570 --> 00:08:50,180
+of the U matrix will be
+的列将是
+
+253
+00:08:50,350 --> 00:08:53,860
+exactly those vectors, u1,
+这些向量, u1,
+
+254
+00:08:54,260 --> 00:08:56,290
+u2 and so on.
+u2等。
+
+255
+00:08:57,640 --> 00:08:59,330
+So u, will be matrix.
+所以u ,将是矩阵。
+
+256
+00:09:00,910 --> 00:09:01,830
+And if we want to reduce
+如果我们想减
+
+257
+00:09:02,230 --> 00:09:03,200
+the data from n dimensions
+少数据的维数从n维
+
+258
+00:09:03,800 --> 00:09:05,380
+down to k dimensions, then what
+到k维。
+
+259
+00:09:05,490 --> 00:09:07,950
+we need to do is take the first k vectors.
+我们需要做的是提取前k个向量。
+
+260
+00:09:09,800 --> 00:09:12,670
+that gives us u1 up
+这将给我们u1
+
+261
+00:09:12,860 --> 00:09:14,700
+to uK which gives
+到uK的向量,给了我们
+
+262
+00:09:14,780 --> 00:09:16,930
+us the K direction onto which
+K个方向,我们想把数据
+
+263
+00:09:17,200 --> 00:09:19,770
+we want to project the data.
+投射的方向。
+
+264
+00:09:20,090 --> 00:09:21,640
+the rest of the procedure from
+这个剩余的过程是
+
+265
+00:09:22,410 --> 00:09:24,170
+this SVD numerical linear
+这个SVD数值线性
+
+266
+00:09:24,490 --> 00:09:25,580
+algebra routine we get this
+代数的程序,我们得到
+
+267
+00:09:25,840 --> 00:09:27,140
+matrix u. We'll call
+矩阵u. 我们称
+
+268
+00:09:27,530 --> 00:09:29,080
+these columns u1-uN.
+这些列为u1-uN。
+
+269
+00:09:30,580 --> 00:09:31,620
+So, just to wrap up the
+所以,为了结束这个
+
+270
+00:09:31,830 --> 00:09:32,520
+description of the rest of
+剩余过程
+
+271
+00:09:32,540 --> 00:09:34,550
+the procedure, from the SVD
+的描述,来自于SVD
+
+272
+00:09:35,320 --> 00:09:36,940
+numerical linear algebra routine we
+数值线性代数程序,我们
+
+273
+00:09:37,240 --> 00:09:38,650
+get these matrices u, s,
+得到这些矩阵u,s,
+
+274
+00:09:38,830 --> 00:09:41,320
+and d. we're going
+和d.我们将
+
+275
+00:09:41,900 --> 00:09:44,460
+to use the first K columns
+用这个矩阵的前
+
+276
+00:09:45,050 --> 00:09:46,310
+of this matrix to get u1-uK.
+K个列获取u1-uK的向量。
+
+277
+00:09:48,710 --> 00:09:49,460
+Now the other thing we need
+现在我们需要做的其它事情
+
+278
+00:09:49,700 --> 00:09:53,730
+to do is take my original
+是获取我们的原始
+
+279
+00:09:54,110 --> 00:09:55,430
+data set, X which is
+数据集,X这是一个
+
+280
+00:09:55,630 --> 00:09:57,080
+an RN And find a
+RN值域。找到
+
+281
+00:09:57,250 --> 00:09:59,210
+lower dimensional representation Z, which
+一个低维的表示Z,它
+
+282
+00:09:59,420 --> 00:10:01,280
+is in R-k, for this data.
+属于Rk空间。
+
+283
+00:10:01,570 --> 00:10:02,800
+So the way we're
+所以,我们
+
+284
+00:10:02,900 --> 00:10:03,930
+going to do that is
+将要做的是
+
+285
+00:10:04,180 --> 00:10:06,690
+take the first K Columns of the U matrix.
+获取u矩阵的前K列。
+
+286
+00:10:08,330 --> 00:10:09,790
+Construct this matrix.
+构建这个矩阵。
+
+287
+00:10:11,060 --> 00:10:13,040
+Stack up U1, U2 and
+累计U1,U2等
+
+288
+00:10:14,170 --> 00:10:16,690
+so on up to U K in columns.
+直到UK在列上。
+
+289
+00:10:17,350 --> 00:10:19,120
+It's really basically taking, you know,
+你知道我们本质上是
+
+290
+00:10:19,280 --> 00:10:20,350
+this part of the matrix, the
+获取矩阵的这部分,
+
+291
+00:10:20,530 --> 00:10:22,260
+first K columns of this matrix.
+这个矩阵的前K列。
+
+292
+00:10:23,420 --> 00:10:25,370
+And so this is
+所以这将获取
+
+293
+00:10:25,600 --> 00:10:26,920
+going to be an N
+一个N-K
+
+294
+00:10:27,200 --> 00:10:28,580
+by K matrix.
+的矩阵。
+
+295
+00:10:29,500 --> 00:10:30,690
+I'm going to give this matrix a name.
+我将给予这个矩阵一个名字,
+
+296
+00:10:30,880 --> 00:10:32,200
+I'm going to call this matrix
+我将叫这个矩阵为
+
+297
+00:10:32,930 --> 00:10:35,760
+U, subscript "reduce," sort
+U,下标"reduce",这是一类,
+
+298
+00:10:36,090 --> 00:10:38,620
+of a reduced version of the U matrix maybe.
+U矩阵被降维的版本。
+
+299
+00:10:39,140 --> 00:10:41,250
+I'm going to use it to reduce the dimension of my data.
+我将用它对我的数据降维。
+
+300
+00:10:43,040 --> 00:10:43,950
+And the way I'm going to compute Z is going
+我将计算Z,将让
+
+301
+00:10:44,250 --> 00:10:45,960
+to let Z be equal to this
+Z等于这个
+
+302
+00:10:46,220 --> 00:10:49,570
+U reduce matrix transpose times
+U的降维矩阵转置相乘
+
+303
+00:10:50,010 --> 00:10:52,030
+X. Or alternatively, you know,
+X.或者,你知道
+
+304
+00:10:52,510 --> 00:10:53,860
+to write down what this transpose means.
+写下这个转置的意思。
+
+305
+00:10:54,630 --> 00:10:55,910
+When I take this transpose of
+当我获取这个
+
+306
+00:10:56,010 --> 00:10:57,920
+this U matrix, what I'm
+u矩阵的转置时,我将
+
+307
+00:10:58,010 --> 00:11:00,680
+going to end up with is these vectors now in rows.
+在行上结束这些向量。
+
+308
+00:11:00,950 --> 00:11:04,540
+I have U1 transpose down to UK transpose.
+我有U1转置到UK转置。
+
+309
+00:11:07,060 --> 00:11:08,860
+Then take that times X,
+然后乘以X,
+
+310
+00:11:09,700 --> 00:11:10,740
+and that's how I get
+这将让我
+
+311
+00:11:10,920 --> 00:11:12,670
+my vector Z. Just to
+得到向量Z.为了
+
+312
+00:11:12,740 --> 00:11:14,280
+make sure that these dimensions make sense,
+确定这些维数有意义,
+
+313
+00:11:15,370 --> 00:11:16,380
+this matrix here is going
+这个矩阵将是
+
+314
+00:11:16,560 --> 00:11:17,450
+to be k by n
+K-N的,
+
+315
+00:11:18,270 --> 00:11:19,350
+and x here is going
+x是
+
+316
+00:11:19,420 --> 00:11:20,530
+to be n by 1
+N-1,
+
+317
+00:11:20,750 --> 00:11:21,810
+and so the product
+所以这个结果
+
+318
+00:11:22,320 --> 00:11:24,330
+here will be k by 1.
+将是K-1的。
+
+319
+00:11:24,820 --> 00:11:27,920
+And so z is k
+所以z是k
+
+320
+00:11:28,790 --> 00:11:29,810
+dimensional, is a k
+维的,是一个k
+
+321
+00:11:30,010 --> 00:11:31,230
+dimensional vector, which is exactly
+维向量,这是
+
+322
+00:11:32,000 --> 00:11:33,180
+what we wanted.
+我们想要的。
+
+323
+00:11:33,550 --> 00:11:34,640
+And of course these x's here right, can
+当然,这个x是正确的,
+
+324
+00:11:34,890 --> 00:11:36,010
+be Examples in our
+将是例子在我们的
+
+325
+00:11:36,100 --> 00:11:36,970
+training set can be examples
+训练集,也是例子
+
+326
+00:11:37,540 --> 00:11:38,750
+in our cross validation set, can be
+在我们的交叉验证集中,
+
+327
+00:11:38,980 --> 00:11:40,330
+examples in our test set, and
+也是例子在我们的测试集中,
+
+328
+00:11:40,500 --> 00:11:41,590
+for example if you know,
+例如,如果你知道,
+
+329
+00:11:41,930 --> 00:11:43,830
+I wanted to take training example i,
+我想得到训练集i,
+
+330
+00:11:44,260 --> 00:11:45,910
+I can write this as xi
+我能够写这些作为xi,
+
+331
+00:11:47,270 --> 00:11:48,430
+XI and that's what will
+XI 和,将给我们
+
+332
+00:11:48,510 --> 00:11:50,080
+give me ZI over there.
+ZI是什么。
+
+333
+00:11:50,940 --> 00:11:53,140
+So, to summarize, here's the
+所以,总而言之,这是
+
+334
+00:11:53,460 --> 00:11:54,820
+PCA algorithm on one slide.
+幻灯片上的PCA算法。
+
+335
+00:11:56,250 --> 00:11:58,200
+After mean normalization, to ensure
+进行均值归一化后,为确保
+
+336
+00:11:58,420 --> 00:11:59,230
+that every feature is zero mean
+每一个特征都是零均值的,
+
+337
+00:11:59,610 --> 00:12:01,420
+and optional feature scaling which You
+任选特征缩放,你
+
+338
+00:12:02,280 --> 00:12:03,780
+really should do feature scaling if
+能够做特征的缩放,如果
+
+339
+00:12:03,890 --> 00:12:05,820
+your features take on very different ranges of values.
+你的特征能够呈现不同范围的值。
+
+340
+00:12:06,620 --> 00:12:08,610
+After this pre-processing we compute
+预处理完成之后,我们计算
+
+341
+00:12:09,130 --> 00:12:12,010
+the covariance matrix Sigma like
+协方差矩阵Sigma,像
+
+342
+00:12:12,240 --> 00:12:14,070
+so by the
+这个方法,
+
+343
+00:12:14,210 --> 00:12:16,340
+way if your data is
+如果你的数据
+
+344
+00:12:16,610 --> 00:12:17,780
+given as a matrix
+是被给予作为一个矩阵,
+
+345
+00:12:18,030 --> 00:12:18,960
+like this, if you have your
+像这样,如果你的
+
+346
+00:12:19,230 --> 00:12:22,580
+data Given in rows like this.
+数据被给与在行中像这样。
+
+347
+00:12:22,780 --> 00:12:24,370
+If you have a matrix X
+如果你有一个矩阵X,
+
+348
+00:12:24,960 --> 00:12:26,190
+which is your training set
+也就是你的训练集
+
+349
+00:12:27,030 --> 00:12:28,830
+written in rows where x1
+被写在行上是x1转置
+
+350
+00:12:29,210 --> 00:12:30,400
+transpose down to xm transpose,
+到xm的转置。
+
+351
+00:12:31,530 --> 00:12:32,700
+this covariance matrix sigma actually has
+这个协方差矩阵sigma 实际上有
+
+352
+00:12:33,020 --> 00:12:36,040
+a nice vectorizing implementation.
+一个非常好的向量化实现。
+
+353
+00:12:37,390 --> 00:12:38,980
+You can implement in octave,
+你能够执行在octave。
+
+354
+00:12:39,440 --> 00:12:41,130
+you can even run sigma equals 1
+你能够甚至运行sigma等于
+
+355
+00:12:41,670 --> 00:12:45,250
+over m, times x,
+1/m乘以x的转置,
+
+356
+00:12:45,550 --> 00:12:46,440
+which is this matrix up here,
+这是这个矩阵放在哪。
+
+357
+00:12:47,250 --> 00:12:50,770
+transpose times x and
+乘以x,
+
+358
+00:12:50,980 --> 00:12:53,320
+this simple expression, that's
+这个简单的表达,这个
+
+359
+00:12:53,570 --> 00:12:55,070
+the vectorize implementation of how
+向量化的实现,是如何计算
+
+360
+00:12:55,220 --> 00:12:56,910
+to compute the matrix sigma.
+矩阵sigma。
+
+361
+00:12:58,020 --> 00:12:59,020
+I'm not going to prove that today.
+今天,我将不证明它。
+
+362
+00:12:59,160 --> 00:13:00,600
+This is the correct vectorization whether you
+正如你所知道的,这是一个正确的向量化,
+
+363
+00:13:00,740 --> 00:13:02,460
+want, you can either numerically test
+你或者单独的数值测试
+
+364
+00:13:02,870 --> 00:13:03,900
+this on yourself by trying out an
+它通过尝试octave。
+
+365
+00:13:03,980 --> 00:13:05,100
+octave and making sure that
+或者确信
+
+366
+00:13:05,840 --> 00:13:06,890
+both this and this implementations
+在它和它的实施中
+
+367
+00:13:06,920 --> 00:13:10,050
+give the same answers or you Can try to prove it yourself mathematically.
+有相同的答案,或者你能够通过数学证明他。
+
+368
+00:13:11,250 --> 00:13:12,330
+Either way but this is the
+无论怎么样,这是
+
+369
+00:13:12,430 --> 00:13:14,580
+correct vectorizing implementation for computing Sigma. Next
+计算Sigma的正确向量化实现。接下来
+
+370
+00:13:16,480 --> 00:13:17,570
+we can apply the SVD
+我们能够运用SVD
+
+371
+00:13:17,920 --> 00:13:19,050
+routine to get u, s,
+程序得到u,s,
+
+372
+00:13:19,250 --> 00:13:20,840
+and d. And then we
+和d.我们
+
+373
+00:13:21,100 --> 00:13:22,720
+grab the first k
+抓取
+
+374
+00:13:23,050 --> 00:13:24,450
+columns of the u
+矩阵u的前k列,
+
+375
+00:13:24,660 --> 00:13:26,550
+matrix you reduce and
+你所要降维到的,
+
+376
+00:13:26,650 --> 00:13:28,540
+finally this defines how
+最终这个定义
+
+377
+00:13:28,740 --> 00:13:29,980
+we go from a feature
+我们如何得到从一个特征
+
+378
+00:13:30,290 --> 00:13:31,600
+vector x to this
+向量x到这
+
+379
+00:13:31,850 --> 00:13:34,340
+reduce dimension representation z. And
+个降维的代表z.
+
+380
+00:13:34,540 --> 00:13:35,760
+similar to K-means,
+与K均值算法类似,
+
+381
+00:13:36,090 --> 00:13:37,860
+if you apply PCA, the way
+如果你要应用PCA,
+
+382
+00:13:38,030 --> 00:13:40,300
+you'd apply this is with vectors x in Rn.
+应该是作用于Rn空间中的向量x。
+
+383
+00:13:41,100 --> 00:13:43,990
+So, this is not done with x0 equals 1.
+所以,这里不使用x0=1这个约定。
+
+384
+00:13:44,200 --> 00:13:46,080
+So that was
+这就是
+
+385
+00:13:46,990 --> 00:13:48,680
+the PCA algorithm.
+PCA算法。
+
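+Collecting the steps above, a minimal Octave sketch of the whole procedure, assuming X is an m-by-n matrix of mean-normalized (and, if necessary, feature-scaled) training examples stored as rows, and k is the target dimension:
+
+m = size(X, 1);                 % number of training examples
+Sigma = (1 / m) * (X' * X);     % n x n covariance matrix (the vectorized form from the slide)
+[U, S, V] = svd(Sigma);         % columns of U are the vectors u1, ..., un
+Ureduce = U(:, 1:k);            % n x k matrix holding the first k columns u1, ..., uk
+z = Ureduce' * x;               % k x 1 representation of a single n x 1 example x
+% For the whole training set at once, an equivalent form is Z = X * Ureduce (m x k).
+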
+386
+00:13:50,120 --> 00:13:51,380
+One thing I didn't do is
+有一件事我没有做,就是
+
+387
+00:13:51,590 --> 00:13:53,190
+give a mathematical proof that
+给予一个数学的证明,
+
+388
+00:13:53,520 --> 00:13:54,600
+this procedure actually gives
+即这个过程确实
+
+389
+00:13:54,970 --> 00:13:56,560
+the projection of the data onto
+给予数据的投射
+
+390
+00:13:57,230 --> 00:13:58,730
+the K dimensional subspace
+到k维子空间
+
+391
+00:13:58,870 --> 00:14:00,620
+ that actually
+这实际
+
+392
+00:14:02,170 --> 00:14:04,800
+minimizes the square projection error Proof
+是最小化投射平方的误差证明
+
+393
+00:14:05,110 --> 00:14:07,170
+of that is beyond the scope of this course.
+超出了我们课程的学习范围。
+
+394
+00:14:07,700 --> 00:14:09,110
+Fortunately the PCA algorithm
+幸运的是这个PCA算法
+
+395
+00:14:09,470 --> 00:14:10,940
+can be implemented in not
+可以用不太多的
+
+396
+00:14:11,320 --> 00:14:12,510
+too many lines of code.
+几行代码来实现。
+
+397
+00:14:13,190 --> 00:14:14,510
+and if you implement this in
+如果你执行它在
+
+398
+00:14:14,640 --> 00:14:16,120
+octave or algorithm, you
+太多或者算法中,你
+
+399
+00:14:16,520 --> 00:14:17,590
+actually get a very effective
+实际能够得到一个非常有效
+
+400
+00:14:18,110 --> 00:14:19,710
+dimensionality reduction algorithm.
+的降维算法。
+
+401
+00:14:22,430 --> 00:14:23,850
+So, that was the PCA algorithm.
+所以,这就是PCA算法。
+
+402
+00:14:25,010 --> 00:14:26,290
+One thing I didn't do was
+有一件事我没有做,就是
+
+403
+00:14:26,840 --> 00:14:28,420
+give a mathematical proof that
+给予一个数学上的证明,
+
+404
+00:14:29,170 --> 00:14:30,360
+the U1 and U2 and so
+U1和U2等,
+
+405
+00:14:30,720 --> 00:14:31,630
+on and the Z and so
+和Z等。
+
+406
+00:14:31,770 --> 00:14:32,830
+on you get out of this
+你得出的这个过程
+
+407
+00:14:32,980 --> 00:14:34,330
+procedure is really the
+是真实
+
+408
+00:14:34,680 --> 00:14:35,870
+choices that would minimize
+的选择,确实能够最小化
+
+409
+00:14:36,500 --> 00:14:37,800
+these squared projection error.
+平方投影误差。
+
+410
+00:14:38,140 --> 00:14:39,350
+Right, remember we said What
+记住我说的,
+
+411
+00:14:39,610 --> 00:14:40,660
+PCA tries to do is try
+PCA所做的是尝试找到
+
+412
+00:14:40,960 --> 00:14:42,170
+to find a surface or line
+一个面或线
+
+413
+00:14:42,570 --> 00:14:43,690
+onto which to project the data
+把数据投影到这个面或线上,
+
+414
+00:14:44,280 --> 00:14:46,340
+so as to minimize to square projection error.
+以便于最小化平方投影误差。
+
+415
+00:14:46,700 --> 00:14:48,610
+So I didn't prove that this
+我将不去证明它,
+
+416
+00:14:49,140 --> 00:14:50,680
+that, and the mathematical proof
+这个数学的证明
+
+417
+00:14:50,970 --> 00:14:52,520
+of that is beyond the scope of this course.
+超出了我们课程的范围。
+
+418
+00:14:53,170 --> 00:14:55,550
+But fortunately the PCA algorithm can
+但幸运的是这个PCA算法能够
+
+419
+00:14:55,730 --> 00:14:58,890
+be implemented in not too many lines of octave code.
+被执行不需要太多行octave代码。
+
+420
+00:14:59,350 --> 00:15:00,740
+And if you implement this,
+如果你执行它 ,
+
+421
+00:15:01,430 --> 00:15:02,560
+this is actually what will
+这实际上是我们将做的
+
+422
+00:15:02,770 --> 00:15:03,730
+work, or this will work well,
+或这将做的很好,
+
+423
+00:15:04,710 --> 00:15:05,940
+and if you implement this algorithm,
+如果你执行这个算法。
+
+424
+00:15:06,500 --> 00:15:09,220
+you get a very effective dimensionality reduction algorithm.
+你将得到一个非常有效的降维算法。
+
+425
+00:15:09,780 --> 00:15:10,650
+That does do the right thing
+这个做的一个正确的事情是
+
+426
+00:15:11,050 --> 00:15:13,460
+of minimizing this square projection error.
+最小化平方投影误差。
+
diff --git a/srt/14 - 5 - Choosing the Number of Principal Components (11 min).srt b/srt/14 - 5 - Choosing the Number of Principal Components (11 min).srt
new file mode 100644
index 00000000..ceb311fd
--- /dev/null
+++ b/srt/14 - 5 - Choosing the Number of Principal Components (11 min).srt
@@ -0,0 +1,1564 @@
+1
+00:00:00,090 --> 00:00:01,560
+In the PCA algorithm we take
+在PCA算法中,我们将(字幕整理:中国海洋大学,黄海广,haiguang2000@qq.com)
+
+2
+00:00:01,980 --> 00:00:03,530
+N dimensional features and reduce
+N维特征减少
+
+3
+00:00:03,970 --> 00:00:06,260
+them to some K dimensional feature representation.
+为某k维特征。
+
+4
+00:00:07,620 --> 00:00:09,090
+This number K is a parameter
+这个数字K是PCA算法
+
+5
+00:00:09,820 --> 00:00:10,800
+of the PCA algorithm.
+的一个参数。
+
+6
+00:00:11,810 --> 00:00:13,240
+This number K is also called
+这个数K也被称为
+
+7
+00:00:13,620 --> 00:00:15,080
+the number of principal components
+主成分的数量
+
+8
+00:00:15,830 --> 00:00:17,480
+or the number of principal components that we've retained.
+或者说,我们保留的主成分的数量。
+
+9
+00:00:18,530 --> 00:00:19,640
+And in this video I'd like
+在这个视频中,我想
+
+10
+00:00:19,810 --> 00:00:20,850
+to give you some guidelines,
+给你们一些指引,
+
+11
+00:00:21,730 --> 00:00:23,090
+tell you about how people
+告诉你们一般情况下人们
+
+12
+00:00:23,430 --> 00:00:24,490
+tend to think about how to
+如何考虑
+
+13
+00:00:24,610 --> 00:00:26,740
+choose this parameter K for PCA.
+选取这个参数K。
+
+14
+00:00:28,650 --> 00:00:29,670
+In order to choose k,
+为了选择K,
+
+15
+00:00:30,110 --> 00:00:30,990
+that is to choose the number
+即选择这个主成分的
+
+16
+00:00:31,360 --> 00:00:34,110
+of principal components, here are a couple of useful concepts.
+数字,这里有几个有用的概念。
+
+17
+00:00:36,430 --> 00:00:37,520
+What PCA tries to do
+PCA试图
+
+18
+00:00:37,760 --> 00:00:38,760
+is it tries to minimize
+是尝试最小化
+
+19
+00:00:40,070 --> 00:00:41,510
+the average squared projection error.
+投影误差平方的平均值。
+
+20
+00:00:42,030 --> 00:00:43,200
+So it tries to minimize
+因此,它试图最小化
+
+21
+00:00:43,430 --> 00:00:45,480
+this quantity, which I'm writing down,
+这个数量,就是我正在写的这个,
+
+22
+00:00:46,410 --> 00:00:47,980
+which is the difference between the
+它是
+
+23
+00:00:48,150 --> 00:00:50,010
+original data X and the
+原始数据X和
+
+24
+00:00:50,690 --> 00:00:53,470
+projected version, X-approx-i, which
+投影后的值x-approx-i(上个视频中定义的),
+
+25
+00:00:53,620 --> 00:00:54,930
+was defined last video, so
+所以
+
+26
+00:00:55,020 --> 00:00:55,900
+it tries to minimize the squared
+它试图尽量减少
+
+27
+00:00:56,160 --> 00:00:57,360
+distance between x and its projection
+x和x在低维面的投影
+
+28
+00:00:58,330 --> 00:00:59,750
+onto that lower dimensional surface.
+距离的差的平方
+
+29
+00:01:01,220 --> 00:01:02,990
+So that's the average square projection error.
+所以这是投影误差平方的平均值。
+
+30
+00:01:03,990 --> 00:01:05,320
+Also let me define the
+同时我也将定义
+
+31
+00:01:05,440 --> 00:01:07,020
+total variation in the
+数据的总方差
+
+32
+00:01:07,100 --> 00:01:08,730
+data to be the
+为样本X的平方和
+
+33
+00:01:09,020 --> 00:01:11,730
+average length squared of
+的平均值
+
+34
+00:01:12,140 --> 00:01:14,130
+these examples Xi
+
+35
+00:01:14,450 --> 00:01:16,010
+so the total variation in the
+所以数据的
+
+36
+00:01:16,260 --> 00:01:17,930
+data is the average of
+总方差就是
+
+37
+00:01:18,070 --> 00:01:19,250
+my training sets of the
+我的训练集中
+
+38
+00:01:19,370 --> 00:01:21,640
+length of each of my training examples.
+每个训练样本的平均长度
+
+39
+00:01:22,190 --> 00:01:23,690
+And this one says, "On average, how
+这个说明了:“平均来说,
+
+40
+00:01:23,940 --> 00:01:24,850
+far are my training examples
+我的训练样本
+
+41
+00:01:25,690 --> 00:01:27,960
+from the vector, from just being all zeros?"
+距离全零向量的距离“
+
+42
+00:01:28,770 --> 00:01:30,460
+How far is, how far
+或者说,我的训练样本
+
+43
+00:01:30,820 --> 00:01:32,820
+on average are my training examples from the origin?
+距离原点有多远?
+
+44
+00:01:33,510 --> 00:01:34,450
+When we're trying to choose k, a
+当我们试图选择K,
+
+45
+00:01:35,870 --> 00:01:37,210
+pretty common rule of thumb
+一个常见的
+
+46
+00:01:37,400 --> 00:01:38,620
+for choosing k is to choose
+经验方法是
+
+47
+00:01:38,800 --> 00:01:40,290
+the smaller values so that
+选择较小的值,
+
+48
+00:01:40,980 --> 00:01:43,810
+the ratio between these is less than 0.01.
+使得这两者之间的比值小于0.01。
+
+49
+00:01:44,550 --> 00:01:45,540
+So in other words,
+换句话说,
+
+50
+00:01:46,340 --> 00:01:47,370
+a pretty common way to
+一个很常见的方式,
+
+51
+00:01:47,510 --> 00:01:48,460
+think about how we choose k
+来选择K
+
+52
+00:01:48,800 --> 00:01:51,180
+is we want the average squared projection error.
+是我们想让平均投影误差平方
+
+53
+00:01:51,580 --> 00:01:54,700
+That is the average distance
+即x和其投影的
+
+54
+00:01:55,240 --> 00:01:56,340
+between x and its projections
+平均距离
+
+55
+00:01:57,570 --> 00:02:00,330
+divided by the total variation of the data.
+除以数据的总方差(
+
+56
+00:02:00,800 --> 00:02:01,870
+That is how much the data varies.
+数据的波动程度)。
+
+57
+00:02:02,940 --> 00:02:04,060
+We want this ratio to be
+我们希望这个比例
+
+58
+00:02:04,240 --> 00:02:06,760
+less than, let's say, 0.01.
+比如说,小于0.01
+
+59
+00:02:06,830 --> 00:02:09,450
+Or to be less than 1%, which is another way of thinking about it.
+或小于1%,这是另一种表述方式。
+
+60
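+Written out, the rule of thumb just described is to choose the smallest value of k for which
+
+    \frac{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)} - x_{\mathrm{approx}}^{(i)}\right\|^{2}}{\frac{1}{m}\sum_{i=1}^{m}\left\|x^{(i)}\right\|^{2}} \le 0.01,
+
+that is, the average squared projection error divided by the total variation in the data is at most 1% ("99% of the variance is retained"); 0.05 and 0.10 are the analogous thresholds for 95% and 90%.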
+00:02:10,860 --> 00:02:11,940
+And the way most people think
+而且大多数人
+
+61
+00:02:12,150 --> 00:02:13,640
+about choosing K is rather
+选择K
+
+62
+00:02:13,860 --> 00:02:15,660
+than choosing K directly the
+并不是像多数人谈论的那样
+
+63
+00:02:15,890 --> 00:02:17,110
+way most people talk about
+直接进行选择
+
+64
+00:02:17,480 --> 00:02:18,940
+it is as what this
+这是因为这个数
+
+65
+00:02:19,160 --> 00:02:20,630
+number is, whether it is 0.01
+无论是0.01
+
+66
+00:02:20,740 --> 00:02:23,330
+or some other number.
+或一些其它数字。
+
+67
+00:02:23,720 --> 00:02:25,320
+And if it is 0.01, another way
+如果是0.01,另一种方式,
+
+68
+00:02:25,490 --> 00:02:27,020
+to say this to use the
+用PCA的语言来表述,
+
+69
+00:02:27,270 --> 00:02:30,120
+language of PCA is that 99% of the variance is retained.
+就是99%的方差性会被保留。
+
+70
+00:02:32,060 --> 00:02:33,480
+I don't really want to, don't
+不必担心
+
+71
+00:02:33,850 --> 00:02:34,810
+worry about what this phrase
+这个短语
+
+72
+00:02:35,140 --> 00:02:36,920
+really means technically but this
+的真正含义,这个
+
+73
+00:02:37,830 --> 00:02:39,170
+phrase "99% of variance is retained" just means
+短语“99%的方差会被保留”只是说
+
+74
+00:02:39,420 --> 00:02:41,710
+that this quantity on the left is less than 0.01.
+左侧的这个量小于0.01
+
+75
+00:02:42,340 --> 00:02:43,910
+And so, if you
+所以,如果你
+
+76
+00:02:44,930 --> 00:02:46,510
+are using PCA and if you want
+使用主成分分析,如果你想
+
+77
+00:02:46,630 --> 00:02:47,730
+to tell someone, you know,
+告诉别人,
+
+78
+00:02:48,220 --> 00:02:49,860
+how many principle components you've
+你保留了多少主成分
+
+79
+00:02:49,980 --> 00:02:51,080
+retained it would be
+这样去表述会
+
+80
+00:02:51,140 --> 00:02:52,360
+more common to say well, I
+比较好,我
+
+81
+00:02:52,450 --> 00:02:55,360
+chose k so that 99% of the variance was retained.
+选择k使得99%的方差被保留。
+
+82
+00:02:55,990 --> 00:02:56,960
+And that's kind of a useful thing
+这是一个需要知道的有用的
+
+83
+00:02:57,660 --> 00:02:58,530
+to know, it means that you
+东西,这意味着你
+
+84
+00:02:58,620 --> 00:02:59,820
+know, the average squared projection
+知道,平均投影误差平方
+
+85
+00:03:00,760 --> 00:03:01,720
+error divided by the total
+除以数据总方差,
+
+86
+00:03:01,900 --> 00:03:03,260
+variation that was at most 1%.
+结果不会超过1%。
+
+87
+00:03:03,820 --> 00:03:04,770
+That's kind of an insightful
+这样的表述方式是很有见地
+
+88
+00:03:05,270 --> 00:03:06,790
+thing to think about, whereas if
+的,否则
+
+89
+00:03:06,920 --> 00:03:08,420
+you tell someone that, "Well I
+你告诉别人,“嗯,我
+
+90
+00:03:09,170 --> 00:03:10,710
+had 100 principal
+使用了100个主成分
+
+91
+00:03:10,890 --> 00:03:12,030
+components" or "k was equal
+或”在1000维的数据中
+
+92
+00:03:12,720 --> 00:03:13,850
+to 100 in a thousand dimensional
+k等于100“
+
+93
+00:03:14,220 --> 00:03:15,350
+data" it's a little
+这对于
+
+94
+00:03:15,420 --> 00:03:16,600
+hard for people to interpret
+人们来说
+
+95
+00:03:19,100 --> 00:03:19,100
+that.
+不太好理解
+
+96
+00:03:19,320 --> 00:03:22,220
+So this number 0.01 is what people often use.
+因此这个数字0.01是人们经常使用的东西。
+
+97
+00:03:23,070 --> 00:03:25,380
+Other common values is 0.05,
+其他常见的值是0.05,
+
+98
+00:03:26,840 --> 00:03:27,810
+and so this would be 5%,
+因此,这将是5%,
+
+99
+00:03:27,990 --> 00:03:28,870
+and if you do that then
+如果你这样做,那么
+
+100
+00:03:29,210 --> 00:03:30,390
+you go and say well 95%
+你会说95%的
+
+101
+00:03:30,740 --> 00:03:32,320
+of the variance is
+方差
+
+102
+00:03:32,480 --> 00:03:34,280
+retained and, you know
+会保留,你知道
+
+103
+00:03:34,700 --> 00:03:36,710
+other numbers maybe 90% of the variance is
+其它数字也许90%的方差被
+
+104
+00:03:37,980 --> 00:03:40,030
+retained, maybe as low as 85%.
+保留,也许低至85%。
+
+105
+00:03:40,150 --> 00:03:42,410
+So 90% would correspond to say
+因此,90%将对应于说
+
+106
+00:03:44,340 --> 00:03:46,950
+0.10, kinda 10%.
+0.10,或10%。
+
+107
+00:03:47,250 --> 00:03:49,160
+And so range of values
+所以,值的范围从
+
+108
+00:03:49,900 --> 00:03:50,770
+from, you know, 90, 95,
+90,95,
+
+109
+00:03:50,870 --> 00:03:51,470
+99, maybe as low as 85% of
+99,也许低至85%
+
+110
+00:03:51,500 --> 00:03:55,100
+the variance retained would be a fairly typical range of values.
+都是一些具有代表性的范围。
+
+111
+00:03:55,780 --> 00:03:56,900
+Maybe 95 to 99
+也许95到99
+
+112
+00:03:57,690 --> 00:03:58,810
+is really the most
+是真正最
+
+113
+00:03:59,020 --> 00:04:02,080
+common range of values that people use.
+常被人们使用的取值范围。
+
+114
+00:04:02,130 --> 00:04:02,950
+For many data sets you'd be
+对于许多数据集你会
+
+115
+00:04:03,010 --> 00:04:04,320
+surprised, in order to retain
+惊讶,为了保留
+
+116
+00:04:04,790 --> 00:04:06,590
+99% of the variance, you can
+99%的方差,你可以
+
+117
+00:04:06,790 --> 00:04:08,160
+often reduce the dimension of
+往往减少数据维数
+
+118
+00:04:08,200 --> 00:04:11,810
+the data significantly and still retain most of the variance.
+但仍保留大部分的方差。
+
+119
+00:04:12,440 --> 00:04:13,410
+Because for most real life
+因为对于真实世界的数据
+
+120
+00:04:13,560 --> 00:04:15,210
+data sets, many features are
+来说,许多特征都
+
+121
+00:04:15,280 --> 00:04:17,060
+just highly correlated, and so
+是高度相关的,因此
+
+122
+00:04:17,310 --> 00:04:17,940
+it turns out to be possible
+结果证明:
+
+123
+00:04:18,490 --> 00:04:19,540
+to compress the data a
+对数据进行
+
+124
+00:04:19,610 --> 00:04:20,990
+lot and still retain you
+很多压缩,仍然可以保留
+
+125
+00:04:21,360 --> 00:04:22,310
+know 99% of the variance
+99%的方差
+
+126
+00:04:22,530 --> 00:04:26,260
+or 95% of the variance. So how do you implement this?
+或95%的方差。那么,你如何实现呢?
+
+127
+00:04:26,810 --> 00:04:28,610
+Well, here's one algorithm that you might use.
+那么,这里提供一个你可能会用到的算法。
+
+128
+00:04:28,890 --> 00:04:30,360
+You may start off, if you
+你可以开始了,如果你
+
+129
+00:04:30,540 --> 00:04:31,360
+want to choose the value of
+要选择K的值,
+
+130
+00:04:31,470 --> 00:04:33,510
+k, we might start off with k equals 1.
+我们开始可以给K赋值1
+
+131
+00:04:33,550 --> 00:04:34,670
+And then we run through PCA.
+然后我们运行PCA算法
+
+132
+00:04:35,350 --> 00:04:36,440
+You know, so we compute U
+然后我们计算U
+
+133
+00:04:36,570 --> 00:04:38,880
+reduce, compute z1, z2, up to zm.
+reduce,计算z1,z2,一直到zm。
+
+134
+00:04:39,520 --> 00:04:40,790
+Compute all of those x1 approx
+计算所有这些X1-approx
+
+135
+00:04:41,090 --> 00:04:42,540
+approx and so on up to xm approx
+到 XM-approx
+
+136
+00:04:43,200 --> 00:04:45,110
+and then we check if 99% of the variance is retained.
+然后我们检查,是否保留了99%的方差。
+
+137
+00:04:47,140 --> 00:04:48,890
+Then we're good and we use k equals 1.
+如果保留了,那么我们让K等于1。
+
+138
+00:04:49,020 --> 00:04:51,960
+But if it isn't then what we'll do we'll next try K equals 2.
+但如果不是,那么我们将做什么,我们接下来会尝试让K等于2。
+
+139
+00:04:52,620 --> 00:04:53,810
+And then we'll again
+然后我们将再次
+
+140
+00:04:54,200 --> 00:04:56,070
+run through this entire procedure and
+运行PCA算法,
+
+141
+00:04:56,170 --> 00:04:57,770
+check, you know is this expression satisfied.
+检查一下,是否满足表达式。
+
+142
+00:04:58,470 --> 00:05:00,980
+Is this less than 0.01. And if not then we do this again.
+是否小于0.01。如果没有的话,我们重复这个步骤。
+
+143
+00:05:01,220 --> 00:05:03,070
+Let's try k equals 3,
+让我们试着让k等于3,
+
+144
+00:05:03,310 --> 00:05:04,910
+then try k equals 4,
+然后尝试k等于4,
+
+145
+00:05:04,970 --> 00:05:06,250
+and so on until maybe
+依此类推,直至也许
+
+146
+00:05:06,630 --> 00:05:07,730
+we get up to k equals
+我们让k等于
+
+147
+00:05:08,070 --> 00:05:09,040
+17 and we find 99% of
+17,我们发现99%的
+
+148
+00:05:09,090 --> 00:05:13,060
+the variance is retained, and then
+方差被保留了,然后
+
+149
+00:05:14,120 --> 00:05:15,110
+we use k equals 17, right?
+我们使得k等于17,对吗?
+
+150
+00:05:15,570 --> 00:05:17,160
+That is one way
+这是一种方式
+
+151
+00:05:17,240 --> 00:05:18,790
+to choose the smallest value
+来选择最小的K
+
+152
+00:05:19,130 --> 00:05:20,920
+of k, so that and 99% of the variance is retained.
+从而使99%的方差被保留。
+
+153
+00:05:22,380 --> 00:05:23,440
+But as you can imagine,
+但正如你可以想像的,
+
+154
+00:05:23,550 --> 00:05:25,140
+this procedure seems horribly inefficient
+这个过程看上去非常低效
+
+155
+00:05:26,210 --> 00:05:28,120
+we're trying k equals one, k equals two, we're doing all these calculations.
+我们尝试让k等于1,k等于2,我们正在做的所有这些计算。
+
+156
+00:05:29,580 --> 00:05:30,540
+Fortunately when you implement
+幸运的是,当你实现
+
+157
+00:05:31,130 --> 00:05:33,510
+PCA it actually, in
+PCA时,实际上,在
+
+158
+00:05:33,960 --> 00:05:35,530
+this step, it actually gives us
+这一步, 它实际上给了我们
+
+159
+00:05:35,910 --> 00:05:37,080
+a quantity that makes it
+一个量,使得它
+
+160
+00:05:37,320 --> 00:05:40,160
+much easier to compute these things as well.
+我们可以更容易地计算这些东西。
+
+161
+00:05:41,110 --> 00:05:42,160
+Specifically when you're calling
+特别是当你调用
+
+162
+00:05:42,820 --> 00:05:44,120
+SVD to get these
+SVD得到这些
+
+163
+00:05:44,340 --> 00:05:45,550
+matrices U, S, and V,
+矩阵U、S和V,
+
+164
+00:05:45,610 --> 00:05:46,780
+when you're calling svd on the
+当你在协方差矩阵Σ上
+
+165
+00:05:47,040 --> 00:05:48,560
+covariance matrix sigma, it also
+调用svd,它也会
+
+166
+00:05:48,860 --> 00:05:49,780
+gives us back this matrix
+给我们返回矩阵
+
+167
+00:05:50,300 --> 00:05:52,170
+S and what
+S
+
+168
+00:05:52,360 --> 00:05:53,430
+S is, is going to
+S实际上将会是
+
+169
+00:05:53,520 --> 00:05:56,790
+be a square matrix an N by N matrix in fact,
+一个N*N的方阵。
+
+170
+00:05:57,640 --> 00:05:58,090
+that is
+就是说
+
+171
+00:05:58,290 --> 00:05:58,290
+diagonal.
+一个对角矩阵。
+
+172
+00:05:58,830 --> 00:06:00,380
+So its diagonal entries S one
+因此它的对角线元素S11,
+
+173
+00:06:00,540 --> 00:06:01,640
+one, s two two, s
+S22,S33,
+
+174
+00:06:01,980 --> 00:06:03,240
+three three down to s
+直到
+
+175
+00:06:03,590 --> 00:06:05,130
+n n are going to
+Snn都将
+
+176
+00:06:05,260 --> 00:06:07,010
+be the only non-zero elements of
+是这个矩阵的唯一的非零元素
+
+177
+00:06:07,130 --> 00:06:08,880
+this matrix, and everything off
+并且对角线之外
+
+178
+00:06:09,060 --> 00:06:11,470
+the diagonals is going to be zero.
+的所有元素都将是0。
+
+179
+00:06:11,590 --> 00:06:11,590
+Okay?
+OK?
+
+180
+00:06:11,670 --> 00:06:12,530
+So those big O's that I'm drawing,
+所以,我正在画的这些大O,
+
+181
+00:06:13,340 --> 00:06:14,260
+by that what I mean is
+我想表达的意思是
+
+182
+00:06:14,740 --> 00:06:16,330
+that everything off the diagonal
+在这个矩阵中
+
+183
+00:06:17,130 --> 00:06:18,220
+of this matrix all of those
+对角线之外的所有
+
+184
+00:06:18,480 --> 00:06:20,310
+entries there are going to be zeros.
+元素都是0
+
+185
+00:06:22,300 --> 00:06:23,790
+And so, what is possible
+所以,可以证明的是
+
+186
+00:06:24,190 --> 00:06:25,250
+to show, and I won't prove
+但我在这里不给出证明
+
+187
+00:06:25,480 --> 00:06:26,380
+this here, and it turns out
+但事实证明
+
+188
+00:06:26,620 --> 00:06:27,880
+that for a given value of
+对于一个给定的K
+
+189
+00:06:27,980 --> 00:06:29,920
+k, this quantity
+这个量
+
+190
+00:06:30,590 --> 00:06:37,820
+over here can be computed much more simply.
+的计算将会比较简单
+
+191
+00:06:38,800 --> 00:06:40,310
+And that quantity can be computed
+而且这个量可以这样计算:
+
+192
+00:06:41,000 --> 00:06:42,900
+as one minus sum from
+1减去
+
+193
+00:06:43,130 --> 00:06:44,400
+i equals 1 through
+从1到k
+
+194
+00:06:44,610 --> 00:06:47,960
+K of Sii divided by
+的sii的和
+
+195
+00:06:48,640 --> 00:06:50,050
+sum from I equals 1
+除以从1到n的
+
+196
+00:06:50,170 --> 00:06:52,010
+through N of Sii.
+的Sii的和。
+
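+In symbols, the quantity just described is
+
+    1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \le 0.01,
+    \qquad\text{or equivalently}\qquad
+    \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{n} S_{ii}} \ge 0.99,
+
+where S11 through Snn are the diagonal entries of the matrix S returned by svd on the covariance matrix.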
+197
+00:06:53,360 --> 00:06:54,820
+So just to say that in words, or
+所以,用语言来描述,或者
+
+198
+00:06:55,000 --> 00:06:56,170
+just to take another
+从另一个角度
+
+199
+00:06:56,450 --> 00:06:57,330
+view of how to explain that,
+来进行解释的话
+
+200
+00:06:57,960 --> 00:06:59,370
+if K equals 3 let's say.
+如果K等于3,比如
+
+201
+00:07:00,810 --> 00:07:01,970
+What we're going to do to
+我们将要做的是
+
+202
+00:07:02,080 --> 00:07:03,200
+compute the numerator is sum
+计算分子的和
+
+203
+00:07:03,340 --> 00:07:04,680
+from one-- I equals 1
+从i=1到
+
+204
+00:07:04,820 --> 00:07:05,830
+through 3 of Sii, so just
+3 的sii的值,所以仅仅
+
+205
+00:07:06,240 --> 00:07:08,170
+compute the sum of these first three elements.
+计算前三个元素的和。
+
+206
+00:07:09,280 --> 00:07:09,710
+So that's the numerator.
+这就是分子。
+
+207
+00:07:10,980 --> 00:07:12,880
+And then for the denominator, well that's
+然后对于分母,
+
+208
+00:07:13,090 --> 00:07:14,970
+the sum of all of these diagonal entries.
+计算对角项的和。
+
+209
+00:07:16,210 --> 00:07:17,470
+And one minus the ratio of
+然后1减去这个比值
+
+210
+00:07:17,660 --> 00:07:19,080
+that, that gives me this
+那给了我这个量
+
+211
+00:07:19,300 --> 00:07:21,330
+quantity over here, that I've
+在这里,我用
+
+212
+00:07:21,650 --> 00:07:23,440
+circled in blue.
+蓝色圆圈标注的。
+
+213
+00:07:23,650 --> 00:07:24,380
+And so, what we can do
+所以,我们需要做的,
+
+214
+00:07:24,650 --> 00:07:26,000
+is just test if this
+只是测试,这个值
+
+215
+00:07:26,430 --> 00:07:29,330
+is less than or equal to 0.01.
+是否小于或等于0.01。
+
+216
+00:07:29,370 --> 00:07:30,460
+Or equivalently, we can test
+或者等价地,我们可以测试
+
+217
+00:07:30,830 --> 00:07:31,960
+if the sum from
+1到k的
+
+218
+00:07:32,180 --> 00:07:33,010
+i equals 1 through k, s-i-i
+sii的和,
+
+219
+00:07:33,970 --> 00:07:35,180
+divided by sum from i
+除以
+
+220
+00:07:35,320 --> 00:07:37,090
+equals 1 through n, s-i-i
+1到n的sii的和,
+
+221
+00:07:37,650 --> 00:07:38,580
+if this is greater than or
+如果是大于或
+
+222
+00:07:38,770 --> 00:07:40,600
+equal to 0.99, if you
+等于0.99,如果你
+
+223
+00:07:40,720 --> 00:07:42,920
+want to be sure that 99% of the variance is retained.
+要确保99%的方差被保留。
+
+224
+00:07:44,770 --> 00:07:45,650
+And so what you can do
+那么你可以这样做:
+
+225
+00:07:45,940 --> 00:07:48,360
+is just slowly increase k,
+只是慢慢地增加K,
+
+226
+00:07:48,770 --> 00:07:49,820
+set k equals one, set k equals
+让k=1,让k=2
+
+227
+00:07:50,100 --> 00:07:51,290
+two, set k equals three and so
+k=3,以此类推
+
+228
+00:07:52,140 --> 00:07:53,240
+on, and just test this quantity
+然后检查哪个是
+
+229
+00:07:54,720 --> 00:07:56,120
+to see what is the
+使得99%的方差被保留的
+
+230
+00:07:56,350 --> 00:07:58,820
+smallest value of k that ensures that 99% of the variance is retained.
+的最小值。
+
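+A minimal Octave sketch of this selection procedure is shown below. It assumes X is the mean-normalized data matrix (m examples by n features); the 0.99 threshold corresponds to retaining 99% of the variance, and the variable names are illustrative.
+
+m = size(X, 1);
+Sigma = (1/m) * X' * X;            % covariance matrix
+[U, S, V] = svd(Sigma);            % call svd only once
+s = diag(S);                       % the diagonal entries S11, ..., Snn
+for k = 1:length(s)
+  if sum(s(1:k)) / sum(s) >= 0.99  % fraction of variance retained
+    break;                         % k is now the smallest such value
+  end
+end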
+231
+00:08:00,600 --> 00:08:01,810
+And if you do
+如果你这样做
+
+232
+00:08:02,000 --> 00:08:02,790
+this, then you need to call
+那么你需要调用
+
+233
+00:08:03,170 --> 00:08:04,660
+the SVD function only once.
+一次SVD。
+
+234
+00:08:04,970 --> 00:08:05,830
+Because that gives you the
+因为这可以给你
+
+235
+00:08:06,010 --> 00:08:07,060
+S matrix and once you
+矩阵S,一旦你
+
+236
+00:08:07,090 --> 00:08:08,350
+have the S matrix, you can
+获得矩阵S,你可以
+
+237
+00:08:08,490 --> 00:08:09,540
+then just keep on doing
+继续进行这种
+
+238
+00:08:09,770 --> 00:08:11,370
+this calculation by increasing
+计算通过增加
+
+239
+00:08:11,930 --> 00:08:12,910
+the value of K in the
+分子中
+
+240
+00:08:13,070 --> 00:08:14,450
+numerator and so you
+K的值,而且你
+
+241
+00:08:14,560 --> 00:08:16,290
+don't need keep to calling SVD over
+不需要反复调用SVD
+
+242
+00:08:16,540 --> 00:08:18,620
+and over again to test out the different values of K.
+来测试出K的不同值
+
+243
+00:08:18,910 --> 00:08:20,030
+So this procedure is much more
+所以此过程是更
+
+244
+00:08:20,150 --> 00:08:21,700
+efficient, and this can
+高效的,这可以
+
+245
+00:08:21,910 --> 00:08:24,020
+allow you to select the
+允许你选择
+
+246
+00:08:24,090 --> 00:08:25,890
+value of K without needing
+k的值而不用
+
+247
+00:08:26,260 --> 00:08:27,620
+to run PCA from scratch
+从头到尾
+
+248
+00:08:28,030 --> 00:08:30,650
+over and over. You just run SVD once, this
+重复运行PCA。只需计算一次奇异值分解,这
+
+249
+00:08:30,850 --> 00:08:32,350
+gives you all of these diagonal numbers,
+给你所有这些对角线的数字,
+
+250
+00:08:32,780 --> 00:08:35,090
+all of these numbers S11, S22 down to SNN,
+所有这些数字S11,S22一直到Snn,
+
+251
+00:08:35,780 --> 00:08:36,820
+and then you can
+然后你就可以
+
+252
+00:08:36,920 --> 00:08:38,440
+just you know, vary K
+你知道的,变化K
+
+253
+00:08:38,730 --> 00:08:40,740
+in this expression to find
+在这个表达式中来查找
+
+254
+00:08:41,010 --> 00:08:42,250
+the smallest value of K, so
+k的最小值,使得
+
+255
+00:08:43,140 --> 00:08:44,030
+that 99% of the variance is retained.
+99%的方差被保留。
+
+256
+00:08:44,850 --> 00:08:45,870
+So to summarize, the way
+总结一下,
+
+257
+00:08:46,050 --> 00:08:47,850
+that I often use, the
+我经常使用的方法,
+
+258
+00:08:47,970 --> 00:08:49,050
+way that I often choose K
+我一般选择K的方法,
+
+259
+00:08:49,420 --> 00:08:50,790
+when I am using PCA for compression
+当我使用PCA进行压缩时,
+
+260
+00:08:51,120 --> 00:08:52,590
+is I would call SVD once
+我会在协方差矩阵上进行一次
+
+261
+00:08:52,950 --> 00:08:54,480
+on the covariance matrix, and then
+SVD,然后
+
+262
+00:08:54,540 --> 00:08:55,750
+I would use this formula and
+我会用这个公式,
+
+263
+00:08:56,030 --> 00:08:57,930
+pick the smallest value of
+挑选k的最小值
+
+264
+00:08:58,020 --> 00:09:00,390
+K for which this expression is satisfied.
+使得k满足这个公式。
+
+265
+00:09:01,580 --> 00:09:02,560
+And by the way, even if you
+顺便说一句,即使你
+
+266
+00:09:02,650 --> 00:09:03,850
+were to pick some different value
+是挑选一些不同的k
+
+267
+00:09:04,180 --> 00:09:04,960
+of K, even if you were
+值,即使你
+
+268
+00:09:05,000 --> 00:09:05,920
+to pick the value of K
+手动选择K值,
+
+269
+00:09:06,090 --> 00:09:07,250
+manually, you know maybe you
+你知道,也许你
+
+270
+00:09:07,300 --> 00:09:08,620
+have a thousand dimensional data
+有一千个维度数据
+
+271
+00:09:09,540 --> 00:09:11,590
+and I just want to choose K equals one hundred.
+我只是想选择K=100。
+
+272
+00:09:12,430 --> 00:09:13,450
+Then, if you want to explain
+然后,如果你想解释
+
+273
+00:09:13,690 --> 00:09:14,760
+to others what you just did,
+给别人你刚才做了什么,
+
+274
+00:09:15,230 --> 00:09:17,070
+a good way to explain the performance
+一个向他们
+
+275
+00:09:17,750 --> 00:09:18,910
+of your implementation of PCA to
+解释你的PCA算法性能的好方法是
+
+276
+00:09:19,220 --> 00:09:20,300
+them, is actually to take
+其实是取
+
+277
+00:09:20,540 --> 00:09:21,670
+this quantity and compute what
+这个量,并计算结果
+
+278
+00:09:21,890 --> 00:09:23,000
+this is, and that will
+这将
+
+279
+00:09:23,110 --> 00:09:25,770
+tell you what was the percentage of variance retained.
+告诉你方差被保留的百分比。
+
+280
+00:09:26,300 --> 00:09:28,070
+And if you report that number, then,
+如果你报告这个数字,那么,
+
+281
+00:09:28,340 --> 00:09:29,720
+you know, people that are familiar
+你知道,熟悉PCA的人
+
+282
+00:09:30,100 --> 00:09:31,610
+with PCA, and people can
+他们可以
+
+283
+00:09:31,880 --> 00:09:33,020
+use this to get a
+通过这个来
+
+284
+00:09:33,080 --> 00:09:34,560
+good understanding of how well
+很好的理解
+
+285
+00:09:34,900 --> 00:09:37,340
+your hundred dimensional representation is
+你的100维表示是如何
+
+286
+00:09:37,690 --> 00:09:39,270
+approximating your original data
+逼近原始数据的
+
+287
+00:09:39,580 --> 00:09:41,300
+set, because there's 99% of variance retained.
+因为有99%的方差保留。
+
+288
+00:09:41,990 --> 00:09:44,140
+That's really a measure of your
+这其实是对你的重构误差平方的
+
+289
+00:09:44,360 --> 00:09:45,860
+squared reconstruction error, that
+好的度量,那个比值
+
+290
+00:09:46,240 --> 00:09:47,870
+ratio being 0.01, just
+为0.01,正好
+
+291
+00:09:48,430 --> 00:09:49,940
+gives people a good intuitive
+给人一种很好的直观的
+
+292
+00:09:50,430 --> 00:09:51,820
+sense of whether your implementation
+感受,是否你的PCA
+
+293
+00:09:52,580 --> 00:09:53,840
+of PCA is finding a
+找到了一个
+
+294
+00:09:54,000 --> 00:09:56,530
+good approximation of your original data set.
+对原始数据集的良好近似。
+
+295
+00:09:58,440 --> 00:09:59,600
+So hopefully, that gives you
+所以希望,那给你
+
+296
+00:09:59,800 --> 00:10:01,260
+an efficient procedure for choosing
+一个高效的步骤去选择
+
+297
+00:10:01,850 --> 00:10:02,800
+the number K. For choosing
+K,将你的数据维数
+
+298
+00:10:03,260 --> 00:10:04,940
+what dimension to reduce your
+进行缩减,
+
+299
+00:10:05,160 --> 00:10:06,630
+data to, and if
+并且如果
+
+300
+00:10:06,750 --> 00:10:07,830
+you apply PCA to very
+你将PCA应用到
+
+301
+00:10:07,970 --> 00:10:09,740
+high dimensional data sets, you know, to like
+高维数据集,你知道,比方说
+
+302
+00:10:09,990 --> 00:10:11,570
+a thousand dimensional data, very often,
+1000维的数据,这很常见,
+
+303
+00:10:11,980 --> 00:10:13,340
+just because data sets tend
+只是因为数据集往往
+
+304
+00:10:13,530 --> 00:10:14,720
+to have highly correlated
+有高度相关
+
+305
+00:10:15,070 --> 00:10:16,140
+features, this is just a
+的特征,这仅仅是一个
+
+306
+00:10:16,280 --> 00:10:17,190
+property of most of the data sets you see,
+你看到的大部分数据集的一个属性,
+
+307
+00:10:18,440 --> 00:10:19,420
+you often find that PCA
+你经常会发现PCA
+
+308
+00:10:20,040 --> 00:10:21,610
+will be able to retain ninety nine
+将可保留99
+
+309
+00:10:21,840 --> 00:10:22,940
+per cent of the variance or say,
+的方差或者说,
+
+310
+00:10:23,110 --> 00:10:24,440
+ninety five ninety nine, some
+95,99,一些
+
+311
+00:10:24,720 --> 00:10:25,910
+high fraction of the variance,
+高比例的方差,
+
+312
+00:10:26,360 --> 00:10:27,580
+even while compressing the data
+即使是对数据进行了
+
+313
+00:10:28,560 --> 00:10:29,720
+by a very large factor.
+很大倍数的压缩。
+
diff --git a/srt/14 - 6 - Reconstruction from Compressed Representation (4 min).srt b/srt/14 - 6 - Reconstruction from Compressed Representation (4 min).srt
new file mode 100644
index 00000000..1c2f5736
--- /dev/null
+++ b/srt/14 - 6 - Reconstruction from Compressed Representation (4 min).srt
@@ -0,0 +1,565 @@
+1
+00:00:00,120 --> 00:00:01,020
+In some of the earlier videos,
+在以前的视频中,(字幕翻译:中国海洋大学,黄海广,haiguang2000@qq.com)
+
+2
+00:00:01,690 --> 00:00:03,300
+I was talking about PCA as
+我谈论PCA
+
+3
+00:00:03,410 --> 00:00:05,270
+a compression algorithm where you
+作为压缩算法。
+
+4
+00:00:05,330 --> 00:00:06,760
+may have say, a thousand dimensional
+在那里你可能需要把1000维
+
+5
+00:00:07,270 --> 00:00:08,760
+data and compress it
+的数据压缩
+
+6
+00:00:09,100 --> 00:00:10,850
+to a hundred dimensional feature back
+一百维特征,
+
+7
+00:00:11,010 --> 00:00:12,360
+there, or have three dimensional
+或具有三维
+
+8
+00:00:12,810 --> 00:00:14,980
+data and compress it to a two dimensional representation.
+数据压缩成二维的表示。
+
+9
+00:00:16,360 --> 00:00:17,430
+So, if this is a
+所以,如果这是一个
+
+10
+00:00:17,620 --> 00:00:19,040
+compression algorithm, there should
+压缩算法,应该
+
+11
+00:00:19,360 --> 00:00:20,440
+be a way to go back from
+能回到
+
+12
+00:00:20,660 --> 00:00:22,930
+this compressed representation, back to
+这个压缩表示,回
+
+13
+00:00:23,030 --> 00:00:25,560
+an approximation of your original high dimensional data.
+到你原有的高维数据的一种近似。
+
+14
+00:00:26,340 --> 00:00:28,070
+So, given z(i), which maybe
+所以,给定的Z(i),这可能
+
+15
+00:00:28,780 --> 00:00:30,250
+a hundred dimensional, how do
+一百维,怎么办
+
+16
+00:00:30,320 --> 00:00:31,710
+you go back to your original
+你回到你原来的
+
+17
+00:00:32,050 --> 00:00:34,720
+representation x(i), which was maybe a thousand dimensional?
+表示x(i),这可能是一千维的数组?
+
+18
+00:00:35,760 --> 00:00:36,820
+In this video, I'd like to
+这个视频,我想
+
+19
+00:00:36,930 --> 00:00:40,350
+describe how to do that.
+描述如何去做。
+
+20
+00:00:40,500 --> 00:00:43,620
+In the PCA algorithm, we may have an example like this.
+在PCA算法,我们可能有一个这样的样本。
+
+21
+00:00:43,940 --> 00:00:45,670
+So maybe that's my example x1
+也许那是我的样本x1,
+
+22
+00:00:45,910 --> 00:00:47,810
+and maybe that's my example x2.
+也许那是我的样本X2。
+
+23
+00:00:48,110 --> 00:00:49,340
+And what we do
+我们做的是,
+
+24
+00:00:49,570 --> 00:00:51,010
+is, we take these examples and
+我们把这些样本
+
+25
+00:00:51,120 --> 00:00:54,160
+we project them onto this one dimensional surface.
+投影到这个一维的面上。
+
+26
+00:00:55,150 --> 00:00:56,280
+And then now we need
+然后现在我们需要
+
+27
+00:00:56,450 --> 00:00:57,750
+to use only a real number,
+只使用一个实数,
+
+28
+00:00:58,350 --> 00:01:00,500
+say z1, to specify the
+比如Z1,指定
+
+29
+00:01:00,600 --> 00:01:01,950
+location of these points after
+这些点的位置后
+
+30
+00:01:02,300 --> 00:01:04,640
+they've been projected onto this one dimensional surface. So
+它们被投影到这个一维的面上之后。所以
+
+31
+00:01:04,890 --> 00:01:06,930
+, given a point
+,给定一个点
+
+32
+00:01:07,730 --> 00:01:08,730
+like this, given a point z1,
+这样,给定一个点Z1,
+
+33
+00:01:08,980 --> 00:01:10,840
+how can we go back to
+我们怎么能回去
+
+34
+00:01:11,000 --> 00:01:12,580
+this original two-dimensional space?
+这个原始的二维空间?
+
+35
+00:01:13,290 --> 00:01:15,380
+And in particular, given the
+特别是,给定
+
+36
+00:01:15,510 --> 00:01:16,510
+point z, which is an
+Z点,这是一个
+
+37
+00:01:16,660 --> 00:01:17,840
+r, can we map
+实数,我们能否把它映射
+
+38
+00:01:18,160 --> 00:01:19,660
+this back to some approximate
+回到某个近似的
+
+39
+00:01:20,440 --> 00:01:22,060
+representation x in R2
+表示x(x属于R2),
+
+40
+00:01:22,370 --> 00:01:24,970
+of whatever the original value of the data was?
+以近似数据原来的值?
+
+41
+00:01:26,520 --> 00:01:28,090
+So, whereas z equals U
+而z等于
+
+42
+00:01:28,400 --> 00:01:29,570
+reduce transpose x, if you
+Ureduce的转置乘以x,如果你
+
+43
+00:01:29,680 --> 00:01:30,640
+want to go in the opposite
+想去相反的
+
+44
+00:01:30,930 --> 00:01:33,620
+direction, the equation for
+方向,方程
+
+45
+00:01:33,780 --> 00:01:35,150
+that is, we're going
+变为,
+
+46
+00:01:35,290 --> 00:01:38,220
+to write x approx equals
+x approx 等于
+
+47
+00:01:40,470 --> 00:01:43,570
+U reduce times z.
+U reduce乘以 z
+
+48
+00:01:44,020 --> 00:01:44,880
+Again, just to check the dimensions,
+只是为了检查维度,
+
+49
+00:01:45,950 --> 00:01:47,760
+here U reduce is
+这里U reduce 为
+
+50
+00:01:47,970 --> 00:01:48,990
+going to be an n by k
+N*K
+
+51
+00:01:49,680 --> 00:01:51,260
+dimensional matrix, z is
+维的矩阵,z
+
+52
+00:01:51,370 --> 00:01:53,270
+going to be a k by 1 dimensional vector.
+是一个k乘1维的向量。
+
+53
+00:01:54,030 --> 00:01:56,280
+So, we multiply these out and that's going to be n by one.
+所以,把它们相乘,结果将是n乘1维的。
+
+54
+00:01:56,720 --> 00:01:58,270
+So x approx is going
+所以x approx
+
+55
+00:01:58,450 --> 00:01:59,990
+to be an n dimensional vector.
+是一个N维向量。
+
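+As a small Octave sketch of this step, assuming Ureduce (n by k) and z (k by 1) came from the PCA procedure of the earlier video:
+
+x_approx = Ureduce * z;      % n-by-1 approximation of the original x
+% For a whole data set stored with examples as rows (Z is m by k):
+X_approx = Z * Ureduce';     % m-by-n matrix of reconstructed examples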
+56
+00:02:00,310 --> 00:02:01,320
+And so the intent of PCA,
+这就是PCA的意图,
+
+57
+00:02:01,390 --> 00:02:03,320
+that is, the square projection error
+也就是说平方投影误差
+
+58
+00:02:03,620 --> 00:02:04,510
+is not too big, is that
+不太大,
+
+59
+00:02:04,730 --> 00:02:06,050
+this x approx will be
+x approx将会
+
+60
+00:02:06,500 --> 00:02:08,640
+close to whatever was
+接近
+
+61
+00:02:08,960 --> 00:02:10,090
+the original value of x
+x的初始值
+
+62
+00:02:10,270 --> 00:02:13,140
+that you had used to derive z in the first place.
+也就是你最初用来计算出z的那个x。
+
+63
+00:02:14,080 --> 00:02:16,630
+To show a picture of what this looks like, this is what it looks like.
+用图来展示的话,它看起来就是这个样子。
+
+64
+00:02:16,870 --> 00:02:17,820
+What you get back of this
+你从这个过程中得到的
+
+65
+00:02:17,970 --> 00:02:19,640
+procedure are points that lie
+是一些落在
+
+66
+00:02:19,920 --> 00:02:22,860
+on the projection of that onto the green line.
+原数据到绿色直线的投影上的点。
+
+67
+00:02:23,500 --> 00:02:24,580
+So to take our early example,
+所以采取我们早期的样本,
+
+68
+00:02:24,920 --> 00:02:26,400
+if we started off with
+如果我们开始
+
+69
+00:02:26,610 --> 00:02:28,570
+this value of x1, and got
+这个值x1,并得到了
+
+70
+00:02:28,850 --> 00:02:29,690
+this z1, if you plug
+Z1,如果你用
+
+71
+00:02:30,310 --> 00:02:32,760
+z1 through this formula to get
+Z1通过这个公式得到
+
+72
+00:02:33,440 --> 00:02:35,510
+x1 approx, then this
+x1 approx,那么这个
+
+73
+00:02:35,730 --> 00:02:37,040
+point here, that will be
+点在这里,那将是
+
+74
+00:02:37,590 --> 00:02:40,110
+x1 approx, which is
+x1 approx,这
+
+75
+00:02:40,260 --> 00:02:41,990
+going to be r2 and so.
+将变为R2等。
+
+76
+00:02:42,780 --> 00:02:44,060
+And similarly, if you
+同样地,如果你
+
+77
+00:02:44,180 --> 00:02:45,640
+do the same procedure, this will
+做同样的程序,这将
+
+78
+00:02:45,760 --> 00:02:47,840
+be x2 approx.
+是x2 approx。
+
+79
+00:02:49,640 --> 00:02:50,630
+And you know, that's a pretty
+如你所见,这是一个相当
+
+80
+00:02:50,780 --> 00:02:53,160
+decent approximation to the original data.
+不错的对原始数据的近似。
+
+81
+00:02:53,670 --> 00:02:54,870
+So, that's how you
+所以,这就是你
+
+82
+00:02:55,060 --> 00:02:56,190
+go back from your low dimensional
+从低维
+
+83
+00:02:56,630 --> 00:02:58,350
+representation z back to
+表示Z回到
+
+84
+00:02:58,700 --> 00:03:00,720
+an uncompressed representation of
+未压缩的表示
+
+85
+00:03:00,760 --> 00:03:01,990
+the data we get back an
+这样我们就得回了一个
+
+86
+00:03:02,240 --> 00:03:03,480
+approximation to your original
+对你原始
+
+87
+00:03:03,690 --> 00:03:05,400
+data x, and we
+数据x的近似,接着我们
+
+88
+00:03:05,500 --> 00:03:07,210
+also call this process reconstruction
+也把这个过程称为重建
+
+89
+00:03:08,220 --> 00:03:08,900
+of the original data.
+原始数据。
+
+90
+00:03:09,530 --> 00:03:10,950
+When we think of trying to reconstruct
+也就是当我们试图
+
+91
+00:03:11,310 --> 00:03:13,630
+the original value of x from the compressed representation.
+从压缩的表示中重建出x的原始值的时候。
+
+92
+00:03:16,770 --> 00:03:18,370
+So, given an unlabeled data
+所以,给定未标记的数据
+
+93
+00:03:18,610 --> 00:03:19,850
+set, you now know how to
+集,您现在知道如何
+
+94
+00:03:19,990 --> 00:03:21,710
+apply PCA and take
+应用PCA,并将
+
+95
+00:03:21,970 --> 00:03:23,800
+your high dimensional features x and
+你的高维特征X和
+
+96
+00:03:24,130 --> 00:03:25,440
+map it to this
+映射到这
+
+97
+00:03:25,560 --> 00:03:27,200
+lower dimensional representation z, and
+的低维表示Z,和
+
+98
+00:03:27,400 --> 00:03:28,630
+from this video, hopefully you now
+这个视频,希望你现在
+
+99
+00:03:28,910 --> 00:03:29,670
+also know how to take
+也知道如何采取
+
+100
+00:03:30,260 --> 00:03:31,690
+these low representation z and
+这些低维表示Z
+
+101
+00:03:31,860 --> 00:03:32,810
+map them back to an approximation
+映射回到一个近似
+
+102
+00:03:33,700 --> 00:03:35,780
+of your original high dimensional data.
+你原有的高维数据。
+
+103
+00:03:37,300 --> 00:03:38,180
+Now that you know how to
+现在你知道如何
+
+104
+00:03:38,460 --> 00:03:40,280
+implement and apply PCA, what
+实现并应用PCA,
+
+105
+00:03:40,470 --> 00:03:41,290
+we will like to do next is to
+我们将要做的事是
+
+106
+00:03:41,390 --> 00:03:42,250
+talk about some of the
+谈论
+
+107
+00:03:42,290 --> 00:03:43,460
+mechanics of how to
+关于如何在实践中
+
+108
+00:03:43,990 --> 00:03:45,240
+actually use PCA well,
+用好PCA的一些细节,
+
+109
+00:03:45,530 --> 00:03:46,670
+and in particular, in the next
+特别是,在接下来的
+
+110
+00:03:46,890 --> 00:03:47,610
+video, I like to talk
+视频中,我想谈一谈
+
+111
+00:03:48,090 --> 00:03:49,730
+about how to choose K, which is,
+关于如何选择K,这是,
+
+112
+00:03:49,910 --> 00:03:51,140
+how to choose the dimension
+如何选择维度
+
+113
+00:03:51,560 --> 00:03:53,570
+of this reduced representation vector z.
+也就是这个降维后的表示向量z的维度。
+
diff --git a/srt/14 - 7 - Advice for Applying PCA (13 min).srt b/srt/14 - 7 - Advice for Applying PCA (13 min).srt
new file mode 100644
index 00000000..e9af7bc7
--- /dev/null
+++ b/srt/14 - 7 - Advice for Applying PCA (13 min).srt
@@ -0,0 +1,1910 @@
+1
+00:00:00,090 --> 00:00:01,450
+In an earlier video, I had
+在早期的视频中(字幕翻译:中国海洋大学,仇志金)
+
+2
+00:00:01,610 --> 00:00:02,710
+said that PCA can be
+我曾经说过,PCA在某些情况下可以
+
+3
+00:00:02,840 --> 00:00:05,410
+sometimes used to speed up the running time of a learning algorithm.
+加快学习算法的执行效率
+
+4
+00:00:07,070 --> 00:00:08,140
+In this video, I'd like
+在该视频中
+
+5
+00:00:08,370 --> 00:00:09,520
+to explain how to actually
+我要去讲解实际情况下如何去做
+
+6
+00:00:09,820 --> 00:00:11,270
+do that, and also say
+同时也
+
+7
+00:00:11,460 --> 00:00:12,900
+some, just try to give
+尝试着给一些建议
+
+8
+00:00:12,990 --> 00:00:14,550
+some advice about how to apply PCA.
+如何去应用PCA算法
+
+9
+00:00:17,110 --> 00:00:19,630
+Here's how you can use PCA to speed up a learning algorithm,
+下面讲解如何使用PCA来加速学习算法,
+
+10
+00:00:20,270 --> 00:00:21,940
+and this supervised learning algorithm
+对监督学习算法
+
+11
+00:00:22,270 --> 00:00:23,630
+speed up is actually the most
+进行加速,实际上是
+
+12
+00:00:23,870 --> 00:00:25,870
+common use that I
+我个人对PCA
+
+13
+00:00:26,530 --> 00:00:27,720
+personally make of PCA.
+最常见的用法。
+
+14
+00:00:28,710 --> 00:00:29,640
+Let's say you have a supervised
+比如说,你有一个
+
+15
+00:00:30,300 --> 00:00:31,660
+learning problem, note this is
+监督学习问题,值得注意的是
+
+16
+00:00:31,810 --> 00:00:33,380
+a supervised learning problem with inputs
+这是一个监督学习问题,其输入为
+
+17
+00:00:33,690 --> 00:00:35,510
+X and labels Y, and
+X和标签Y,
+
+18
+00:00:35,810 --> 00:00:37,330
+let's say that your examples
+如果说,在你的例子中
+
+19
+00:00:37,830 --> 00:00:39,140
+xi are very high dimensional.
+xi是有非常高的维度
+
+20
+00:00:39,840 --> 00:00:41,670
+So, lets say that your examples, xi are
+比如说
+
+21
+00:00:41,800 --> 00:00:44,000
+10,000 dimensional feature vectors.
+xi是10,000维的特征向量
+
+22
+00:00:45,510 --> 00:00:46,550
+One example of that, would
+比如说
+
+23
+00:00:46,700 --> 00:00:48,130
+be, if you were doing some computer
+你要解决一些关于
+
+24
+00:00:48,540 --> 00:00:50,390
+vision problem, where you have
+计算机视觉方面的问题,
+
+25
+00:00:50,650 --> 00:00:52,410
+a 100x100 images, and so
+你有一个100*100的图片
+
+26
+00:00:52,780 --> 00:00:55,550
+if you have 100x100, that's 10000
+如果是100*100,则就会有
+
+27
+00:00:55,850 --> 00:00:57,520
+pixels, and so if xi are,
+10000个像素点,所以xi
+
+28
+00:00:57,780 --> 00:00:59,240
+you know, feature vectors
+你知道,特征向量
+
+29
+00:00:59,760 --> 00:01:01,670
+that contain your 10000 pixel
+包含10,000个像素
+
+30
+00:01:02,470 --> 00:01:03,580
+intensity values, then
+亮度值,
+
+31
+00:01:04,410 --> 00:01:05,580
+you have 10000 dimensional feature vectors.
+那么你就有了10,000维的特征向量。
+
+32
+00:01:06,880 --> 00:01:08,530
+So with very high-dimensional
+对于像这种的高维
+
+33
+00:01:09,300 --> 00:01:10,890
+feature vectors like this, running a
+特征向量,
+
+34
+00:01:11,320 --> 00:01:12,860
+learning algorithm can be slow, right?
+运行学习算法时将变得非常慢,对吧?
+
+35
+00:01:13,030 --> 00:01:14,300
+Just, if you feed 10,000 dimensional
+如果你要使用10,000维的
+
+36
+00:01:14,790 --> 00:01:16,980
+feature vectors into logistic regression,
+特征向量输入到逻辑回归,
+
+37
+00:01:17,570 --> 00:01:19,780
+or a neural network, or support vector machine or what have you,
+或神经网络,或支持向量机,等等,
+
+38
+00:01:20,450 --> 00:01:22,000
+just because that's a lot of data,
+因为数据量太大
+
+39
+00:01:22,200 --> 00:01:23,060
+that's 10,000 numbers,
+也就是10,000个数字,
+
+40
+00:01:24,130 --> 00:01:25,970
+it can make your learning algorithm run more slowly.
+将会造成你的学习算法运行非常的慢
+
+41
+00:01:27,170 --> 00:01:28,530
+Fortunately with PCA we'll be
+幸运的是,使用PCA
+
+42
+00:01:28,680 --> 00:01:29,810
+able to reduce the dimension of
+可以减少要处理
+
+43
+00:01:30,060 --> 00:01:31,050
+this data and so make
+数据的维度
+
+44
+00:01:31,180 --> 00:01:32,410
+our algorithms run more
+从而使得算法更加高效
+
+45
+00:01:32,920 --> 00:01:34,440
+efficiently. Here's how you
+至于如何去做
+
+46
+00:01:34,590 --> 00:01:35,780
+do that. We are going
+我们首先
+
+47
+00:01:35,980 --> 00:01:37,030
+first check our labeled
+观察已经被标记的训练集
+
+48
+00:01:37,400 --> 00:01:39,520
+training set and extract just
+和只抽取输入的参数
+
+49
+00:01:39,800 --> 00:01:41,830
+the inputs, we're just going to extract the X's
+我们只抽取出X
+
+50
+00:01:42,730 --> 00:01:45,130
+and temporarily put aside the Y's.
+把Y先临时放一边
+
+51
+00:01:45,860 --> 00:01:46,750
+So this will now give us
+所以我们目前得到
+
+52
+00:01:47,090 --> 00:01:49,150
+an unlabelled training set x1
+一个无标签的训练集x1
+
+53
+00:01:49,400 --> 00:01:51,000
+through xm which are maybe
+到xm,这些数据集可能是
+
+54
+00:01:51,240 --> 00:01:53,600
+there's a ten thousand dimensional data,
+一万维的数据
+
+55
+00:01:53,940 --> 00:01:55,800
+ten thousand dimensional examples we have.
+在我们一万维的样本中
+
+56
+00:01:55,870 --> 00:01:57,230
+So just extract the input vectors
+仅仅抽取出了输入的向量
+
+57
+00:01:58,370 --> 00:01:58,930
+x1 through xm.
+x1 到 xm
+
+58
+00:02:00,650 --> 00:02:01,810
+Then we're going to apply PCA
+此时,我们继续使用PCA
+
+59
+00:02:02,700 --> 00:02:03,740
+and this will give me a
+这会给我一个
+
+60
+00:02:03,980 --> 00:02:06,100
+reduced dimension representation of the
+数据的降维表示,
+
+61
+00:02:06,410 --> 00:02:08,010
+data, so instead of
+因此,不再是
+
+62
+00:02:08,260 --> 00:02:09,540
+10,000 dimensional feature vectors I now
+10,000维的特征向量
+
+63
+00:02:09,780 --> 00:02:11,880
+have maybe one thousand dimensional feature vectors.
+可能我只需要一千维的特征向量
+
+64
+00:02:12,330 --> 00:02:13,500
+So that's like a 10x savings.
+这相当于节省了10倍的空间。
+
+65
+00:02:15,110 --> 00:02:17,260
+So this gives me, if you will, a new training set.
+所以这样就给我们,如果你想要的话,一个新的训练集
+
+66
+00:02:17,910 --> 00:02:19,430
+So whereas previously I might
+之前我可能
+
+67
+00:02:19,620 --> 00:02:21,180
+have had an example x1, y1,
+有一个样本x1, y1
+
+68
+00:02:21,490 --> 00:02:24,340
+my first training input, is now represented by z1.
+我之前的训练集的输入,现在用z1来取代
+
+69
+00:02:24,580 --> 00:02:25,800
+And so we'll have a
+因此,我们将要
+
+70
+00:02:26,050 --> 00:02:27,010
+new sort of training example,
+使用新的方式对训练集进行表达
+
+71
+00:02:28,210 --> 00:02:29,240
+which is Z1 paired with y1.
+Z1 与 y1 进行对应
+
+72
+00:02:30,700 --> 00:02:33,170
+And similarly Z2, Y2, and so on, up to ZM, YM.
+相似的Z2, Y2等等,直到 ZM, YM
+
+73
+00:02:33,770 --> 00:02:35,300
+Because my training examples are
+因为我的训练样本是
+
+74
+00:02:35,460 --> 00:02:36,980
+now represented with this much
+新的方式表示
+
+75
+00:02:37,480 --> 00:02:41,040
+lower dimensional representation Z1, Z2, up to ZM.
+该方式是低维的,Z1, Z2, 到 ZM
+
+76
+00:02:41,310 --> 00:02:42,340
+Finally, I can take this
+最终,我可以使用
+
+77
+00:02:43,650 --> 00:02:45,060
+reduced dimension training set and
+这个低维的训练集
+
+78
+00:02:45,240 --> 00:02:46,540
+feed it to a learning algorithm maybe
+输入到一个学习算法中,可能是
+
+79
+00:02:46,640 --> 00:02:47,900
+a neural network, maybe logistic
+神经网络,也可能是逻辑
+
+80
+00:02:48,280 --> 00:02:49,450
+regression, and I can
+回归,然后我可以
+
+81
+00:02:49,750 --> 00:02:51,990
+learn the hypothesis H, that
+学习假设函数H
+
+82
+00:02:52,230 --> 00:02:53,830
+takes this input, these low-dimensional
+根据输入,使用低维
+
+83
+00:02:54,330 --> 00:02:56,230
+representations Z and tries to make predictions.
+的Z,尝试去进行预测
+
+84
+00:02:57,890 --> 00:02:59,030
+So if I were using logistic
+所以,如果我使用逻辑回归
+
+85
+00:02:59,460 --> 00:03:00,880
+regression for example, I would
+我将要
+
+86
+00:03:01,060 --> 00:03:02,760
+train a hypothesis that outputs, you know,
+训练一个获得输出的假设函数,你知道
+
+87
+00:03:03,080 --> 00:03:04,020
+one over one plus E to
+1除以1加上e的
+
+88
+00:03:04,180 --> 00:03:06,020
+the negative-theta transpose
+负的theta的转置
+
+89
+00:03:07,620 --> 00:03:10,150
+Z, that
+乘以z次方,这样
+
+90
+00:03:10,610 --> 00:03:11,530
+takes as input one of these
+它以这样的一个z向量作为输入,
+
+91
+00:03:11,960 --> 00:03:13,660
+z vectors, and tries to make a prediction.
+就可以尝试去进行预测
+
+92
+00:03:15,260 --> 00:03:16,310
+And finally, if you have
+最终,如果你
+
+93
+00:03:16,630 --> 00:03:17,800
+a new example, maybe a new
+有一个新的样本,可能是一个
+
+94
+00:03:17,920 --> 00:03:20,060
+test example X. What
+新的测试样本X,
+
+95
+00:03:20,220 --> 00:03:21,340
+you do is you would
+你要做的是
+
+96
+00:03:22,130 --> 00:03:23,730
+take your test example x,
+你将要使用测试样本的x
+
+97
+00:03:24,960 --> 00:03:26,590
+map it through the same mapping
+通过PCA的映射关系
+
+98
+00:03:26,990 --> 00:03:27,860
+that was found by PCA
+进行映射
+
+99
+00:03:28,220 --> 00:03:29,610
+to get you your corresponding z.
+获得对应的z
+
+100
+00:03:30,390 --> 00:03:31,280
+And that z then gets
+获得z后
+
+101
+00:03:31,950 --> 00:03:33,740
+fed to this hypothesis, and this
+把z带入到假设函数中
+
+102
+00:03:33,910 --> 00:03:35,450
+hypothesis then makes a
+这个假设函数
+
+103
+00:03:35,750 --> 00:03:36,740
+prediction on your input x.
+可以预测你输入的x
+
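+A rough Octave sketch of this pipeline is given below. X is the m-by-n matrix of training inputs (already mean normalized), y holds the labels, k is the chosen number of components, and trainLogisticRegression and sigmoid are hypothetical helper functions standing in for whatever learning algorithm you actually use.
+
+m = size(X, 1);
+Sigma = (1/m) * X' * X;                  % PCA uses only the inputs; y is set aside
+[U, S, V] = svd(Sigma);
+Ureduce = U(:, 1:k);
+Z = X * Ureduce;                         % m-by-k compressed training inputs
+theta = trainLogisticRegression(Z, y);   % hypothetical training routine
+
+% At prediction time, map a new example through the same Ureduce:
+z_new = Ureduce' * x_new;                % x_new is an n-by-1 test input
+h = sigmoid(theta' * z_new);             % hypothesis 1 / (1 + e^(-theta' * z))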
+104
+00:03:38,110 --> 00:03:40,090
+One final note, what PCA does
+最后,PCA如何
+
+105
+00:03:40,510 --> 00:03:42,350
+is it defines a mapping from
+定义一个映射
+
+106
+00:03:42,710 --> 00:03:45,090
+x to z and
+从x到z
+
+107
+00:03:45,960 --> 00:03:46,970
+this mapping from x to
+这个映射从x到
+
+108
+00:03:47,050 --> 00:03:48,280
+z should be defined by running
+z应该被定义
+
+109
+00:03:48,580 --> 00:03:50,840
+PCA only on the training sets.
+只在训练集上执行PCA
+
+110
+00:03:51,650 --> 00:03:53,310
+And in particular, this mapping that
+尤其特别的是,这个映射是
+
+111
+00:03:53,530 --> 00:03:54,770
+PCA is learning, right, this
+PCA的学习过程,没错
+
+112
+00:03:54,950 --> 00:03:57,650
+mapping, what that does is it computes the set of parameters.
+这个映射,是计算一系列的参数
+
+113
+00:03:58,210 --> 00:04:00,500
+That's the feature scaling and mean normalization.
+也就是特征缩放和均值归一化的参数。
+
+114
+00:04:01,240 --> 00:04:04,040
+And there's also computing this matrix U reduced.
+还要计算这个矩阵Ureduce。
+
+115
+00:04:04,680 --> 00:04:05,510
+But all of these things that
+但所有这些事情
+
+116
+00:04:05,670 --> 00:04:06,980
+U reduce, that's like a
+计算矩阵Ureduced,就像
+
+117
+00:04:07,120 --> 00:04:08,420
+parameter that is learned
+通过学习PCA得到的参数
+
+118
+00:04:08,670 --> 00:04:09,950
+by PCA and we should
+我们应该
+
+119
+00:04:10,150 --> 00:04:12,270
+be fitting our parameters only to
+拟合这些参数,仅仅在
+
+120
+00:04:12,480 --> 00:04:13,990
+our training sets and not
+训练集上
+
+121
+00:04:14,040 --> 00:04:16,250
+to our cross validation or test sets and
+而不是交叉验证或者在测试集上
+
+122
+00:04:16,370 --> 00:04:17,560
+so these things the U reduced
+求解Ureduced这些事情
+
+123
+00:04:18,180 --> 00:04:19,460
+so on, that should be
+等等,这些都应该
+
+124
+00:04:19,820 --> 00:04:22,430
+obtained by running PCA only on your training set.
+只通过在训练集上运行PCA来获得。
+
+125
+00:04:23,300 --> 00:04:26,930
+And then having found U reduced, or having found the parameters for feature
+然后,找到了Ureduce,或者说找到了特征
+
+126
+00:04:27,350 --> 00:04:28,620
+scaling where the mean normalization
+缩放的参数,也就是均值归一化
+
+127
+00:04:29,320 --> 00:04:31,790
+and scaling the scale
+以及缩放时所用的比例,
+
+128
+00:04:32,180 --> 00:04:34,500
+that you divide the features by to get them on to comparable scales.
+也就是用来除特征、使它们具有可比性的那个数。
+
+129
+00:04:34,760 --> 00:04:36,010
+Having found all those parameters
+找到所有的参数
+
+130
+00:04:36,570 --> 00:04:38,010
+on the training set, you can
+在训练集上
+
+131
+00:04:38,220 --> 00:04:41,560
+then apply the same mapping to other examples that may be
+于是,你可以应用这相同的映射到其他样本中
+
+132
+00:04:41,820 --> 00:04:45,020
+In your cross-validation sets or
+如在你交叉验证集
+
+133
+00:04:45,180 --> 00:04:46,680
+in your test sets, OK?
+或者你的测试集,OK?
+
+134
+00:04:47,150 --> 00:04:48,340
+Just to summarize, when you're
+总结一下,当你
+
+135
+00:04:48,450 --> 00:04:49,790
+running PCA, run your
+在运行PCA时,运行
+
+136
+00:04:49,900 --> 00:04:51,070
+PCA only on the
+PCA仅仅在
+
+137
+00:04:51,220 --> 00:04:52,450
+training set portion of the
+训练集部分的数据
+
+138
+00:04:52,490 --> 00:04:55,880
+data not the cross-validation set or the test set portion of your data.
+不可以是在交叉验证集和测试集数据部分
+
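+A short Octave sketch of this point: the normalization parameters and Ureduce are estimated from the training set only, and the identical mapping is then applied to the cross-validation and test inputs. Xtrain, Xcv and Xtest are illustrative names.
+
+m = size(Xtrain, 1);
+mu = mean(Xtrain);                           % mean-normalization parameters
+sigma = std(Xtrain);                         % feature-scaling parameters
+Xnorm = (Xtrain - mu) ./ sigma;              % uses automatic broadcasting
+[U, S, V] = svd((1/m) * (Xnorm' * Xnorm));
+Ureduce = U(:, 1:k);
+Ztrain = Xnorm * Ureduce;
+Zcv    = ((Xcv   - mu) ./ sigma) * Ureduce;  % same mu, sigma, Ureduce
+Ztest  = ((Xtest - mu) ./ sigma) * Ureduce;  % never re-fit on these sets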
+139
+00:04:56,410 --> 00:04:57,620
+And that defines the mapping from
+定义了x到z的映射
+
+140
+00:04:57,870 --> 00:04:58,770
+x to z and you can
+你可以
+
+141
+00:04:58,950 --> 00:05:00,320
+then apply that mapping to
+应用这个映射到
+
+142
+00:05:00,560 --> 00:05:02,240
+your cross-validation set and your
+你的交叉验证集
+
+143
+00:05:02,290 --> 00:05:03,370
+test set and by the
+和你的测试集
+
+144
+00:05:03,450 --> 00:05:04,660
+way in this example I talked
+顺便说一下,在这个例子中,
+
+145
+00:05:05,000 --> 00:05:06,660
+about reducing the data from
+我讲了减少一万维的数据
+
+146
+00:05:06,950 --> 00:05:08,510
+ten thousand dimensional to one
+到
+
+147
+00:05:08,740 --> 00:05:10,350
+thousand dimensional, this is actually
+一千维的数据,实际上
+
+148
+00:05:10,660 --> 00:05:11,950
+not that unrealistic. For many
+这不是说不现实的
+
+149
+00:05:12,280 --> 00:05:14,720
+problems we actually reduce the dimension of the data. You
+在许多问题上,我们需要去减少数据的维度
+
+150
+00:05:17,600 --> 00:05:18,700
+know by 5x maybe by 10x
+你可以减少5倍或者10倍
+
+151
+00:05:18,780 --> 00:05:20,910
+and still retain most of the variance and we can do this
+仍然保留较大的方差,我们这样做可以
+
+152
+00:05:21,270 --> 00:05:22,680
+barely affecting the performance,
+几乎不影响性能
+
+153
+00:05:23,900 --> 00:05:25,840
+in terms of classification accuracy, let's say,
+分类精度,比方说
+
+154
+00:05:26,240 --> 00:05:27,970
+barely affecting the classification
+几乎不影响分类
+
+155
+00:05:28,770 --> 00:05:30,320
+accuracy of the learning algorithm.
+算法的准确度
+
+156
+00:05:31,090 --> 00:05:32,140
+And by working with lower dimensional
+而且通过较低的维度
+
+157
+00:05:32,590 --> 00:05:33,730
+data our learning algorithm
+数据,我们的学习算法
+
+158
+00:05:34,060 --> 00:05:36,500
+can often run much much faster.
+通常运行的非常快
+
+159
+00:05:36,910 --> 00:05:38,120
+To summarize, we've so far talked
+总而言之,我们到目前为止
+
+160
+00:05:38,410 --> 00:05:40,920
+about the following applications of PCA.
+讨论了PCA的以下几种应用。
+
+161
+00:05:41,970 --> 00:05:43,780
+First is the compression application where
+首先,压缩应用
+
+162
+00:05:44,020 --> 00:05:45,140
+we might do so to reduce
+我们可能需要这样做,
+
+163
+00:05:45,500 --> 00:05:46,440
+the memory or the disk space
+来减少存储器或硬盘空间
+
+164
+00:05:46,590 --> 00:05:47,960
+needed to store data and we
+用来存储数据
+
+165
+00:05:48,240 --> 00:05:49,390
+just talked about how to
+我们仅仅讨论如何
+
+166
+00:05:49,460 --> 00:05:51,630
+use this to speed up a learning algorithm.
+使用该方法去加速学习算法
+
+167
+00:05:52,100 --> 00:05:53,870
+In these applications, in order
+在该应用中,
+
+168
+00:05:54,130 --> 00:05:56,240
+to choose K, often we'll
+需要去选择一个K,我们经常
+
+169
+00:05:56,420 --> 00:05:58,770
+do so according to, figuring
+这样做的依据是,计算出
+
+170
+00:05:59,160 --> 00:06:00,590
+out what is the percentage of
+方差保留的百分比
+
+171
+00:06:00,810 --> 00:06:03,880
+variance retained, and so
+通常
+
+172
+00:06:04,780 --> 00:06:06,320
+for this learning algorithm, speed
+这种学习算法
+
+173
+00:06:06,570 --> 00:06:10,050
+up application often will retain 99% of the variance.
+加速应用会保留99%的方差。
+
+174
+00:06:10,530 --> 00:06:11,690
+That would be a very typical choice
+这是一种非常典型的
+
+175
+00:06:12,100 --> 00:06:14,270
+for how to choose k. So
+如何去选择k的方式
+
+176
+00:06:14,730 --> 00:06:16,640
+that's how you choose k for these compression applications.
+所以对于压缩应用,就是你如何选择k
+
+177
+00:06:17,850 --> 00:06:19,590
+Whereas for visualization applications
+而对于可视化应用
+
+178
+00:06:20,760 --> 00:06:22,100
+while usually we know
+虽然我们通常知道
+
+179
+00:06:22,230 --> 00:06:23,550
+how to plot only two dimensional
+如何绘制二维数据
+
+180
+00:06:24,020 --> 00:06:25,520
+data or three dimensional data,
+或三维数据
+
+181
+00:06:26,540 --> 00:06:28,650
+and so for visualization applications, we'll
+对于可视化应用程序
+
+182
+00:06:28,830 --> 00:06:29,660
+usually choose k equals 2
+我们通常选择k等于2
+
+183
+00:06:29,710 --> 00:06:31,930
+or k equals 3, because we can plot
+或者k等于3,因为我们可以
+
+184
+00:06:32,740 --> 00:06:33,500
+only 2D and 3D data sets.
+绘制2D和3D的数据集。
+
+185
+00:06:34,510 --> 00:06:35,720
+So that summarizes the main
+总结一下,
+
+186
+00:06:36,020 --> 00:06:37,230
+applications of PCA, as well
+PCA的主要应用,以及
+
+187
+00:06:37,870 --> 00:06:39,580
+as how to choose the
+如何去选择
+
+188
+00:06:39,670 --> 00:06:41,540
+value of k for these different applications.
+k值,对于不同的应用
+
+189
+00:06:42,890 --> 00:06:45,710
+I should mention that
+我必须提出
+
+190
+00:06:46,400 --> 00:06:48,100
+there is often one frequent misuse of PCA and
+PCA算法经常被滥用
+
+191
+00:06:48,800 --> 00:06:50,300
+you sometimes hear about others
+有时候你会听到
+
+192
+00:06:50,580 --> 00:06:51,820
+doing this hopefully not too often.
+别人这样做,希望不会太频繁。
+
+193
+00:06:52,230 --> 00:06:54,780
+I just want to mention this so that you know not to do it.
+我只是仅仅提一下,以至于你知道不要这样去做
+
+194
+00:06:55,480 --> 00:06:56,460
+And there is one bad use of
+这里有一个非常坏的PCA应用
+
+195
+00:06:56,540 --> 00:06:59,170
+PCA, which is to try to use it to prevent over-fitting.
+那就是试图用PCA去防止过拟合。
+
+196
+00:07:00,380 --> 00:07:00,660
+Here's the reasoning.
+其理由是这样的。
+
+197
+00:07:01,910 --> 00:07:03,080
+This is not a great
+不是非常棒的
+
+198
+00:07:03,730 --> 00:07:04,610
+way to use PCA,
+使用PCA方式
+
+199
+00:07:04,670 --> 00:07:05,630
+but here's the reasoning behind
+但该方法背后的原因是
+
+200
+00:07:05,690 --> 00:07:07,080
+this method, which is,you know
+你知道
+
+201
+00:07:07,350 --> 00:07:09,090
+if we have Xi, then
+如果我们有Xi
+
+202
+00:07:09,300 --> 00:07:10,660
+maybe we'll have n features, but
+可能我们将要有n个特征
+
+203
+00:07:10,830 --> 00:07:12,640
+if we compress the data, and
+但是如果我们压缩这些数据
+
+204
+00:07:12,750 --> 00:07:13,700
+use Zi instead
+使用Zi替代
+
+205
+00:07:14,270 --> 00:07:15,410
+and that reduces the number
+减少
+
+206
+00:07:15,560 --> 00:07:17,050
+of features to k, which
+特征数量k
+
+207
+00:07:17,290 --> 00:07:19,300
+could be much lower dimensional. And
+可以得到很低的维度
+
+208
+00:07:19,410 --> 00:07:21,130
+so if we have a much smaller
+因此,如果你有一个非常少的
+
+209
+00:07:21,490 --> 00:07:22,520
+number of features, if k
+数量的特征
+
+210
+00:07:22,770 --> 00:07:25,800
+is 1,000 and n is
+比如k是1,000
+
+211
+00:07:26,090 --> 00:07:27,010
+10,000, then if we have
+n是10,000
+
+212
+00:07:27,780 --> 00:07:29,390
+only 1,000 dimensional data, maybe
+但是我们只有1,000维的数据
+
+213
+00:07:29,670 --> 00:07:30,580
+we're less likely to over-fit
+可能我们不太可能过拟合
+
+214
+00:07:31,260 --> 00:07:32,230
+than if we were using 10,000-dimensional
+相反,如果我们使用10,000维的数据
+
+215
+00:07:33,280 --> 00:07:34,980
+data with like a thousand features.
+1,000个特征
+
+216
+00:07:35,950 --> 00:07:37,160
+So some people think
+一些人就会认为
+
+217
+00:07:37,360 --> 00:07:39,360
+of PCA as a way to prevent over-fitting.
+PCA是一种防止过拟合的方法
+
+218
+00:07:39,950 --> 00:07:41,940
+But just to emphasize this
+但我只想强调,这
+
+219
+00:07:42,110 --> 00:07:44,000
+is a bad application of PCA
+是PCA的非常糟糕的应用
+
+220
+00:07:44,260 --> 00:07:46,080
+and I do not recommend doing this.
+我不建议去这样做
+
+221
+00:07:46,520 --> 00:07:48,430
+And it's not that this method works badly.
+这并不是说该方法工作不好
+
+222
+00:07:49,000 --> 00:07:49,920
+If you want to use
+如果你想去
+
+223
+00:07:50,330 --> 00:07:51,560
+this method to reduce the dimension of the
+使用该方法去减少数据维度
+
+224
+00:07:51,890 --> 00:07:52,830
+data, to try to prevent over-fitting,
+防止过拟合
+
+225
+00:07:53,690 --> 00:07:54,830
+it might actually work OK.
+它可能效果也会很好
+
+226
+00:07:55,560 --> 00:07:56,720
+But this just is not
+但是,这仅仅
+
+227
+00:07:57,040 --> 00:07:58,340
+a good way to address
+不是一种非常好的方式
+
+228
+00:07:58,680 --> 00:08:00,390
+over-fitting and instead, if you're
+处理过拟合问题,相反,如果
+
+229
+00:08:00,510 --> 00:08:01,810
+worried about over-fitting, there is
+你担心过拟合
+
+230
+00:08:02,030 --> 00:08:03,420
+a much better way to address
+这有一种非常好的方式去防止过拟合发生
+
+231
+00:08:03,800 --> 00:08:05,680
+it, to use regularization instead of
+也就是使用正则化来代替用PCA
+
+232
+00:08:05,900 --> 00:08:07,910
+using PCA to reduce the dimension of the data.
+来减少数据维度
+
+233
+00:08:08,670 --> 00:08:10,000
+And the reason is, if
+理由是,
+
+234
+00:08:11,010 --> 00:08:12,150
+you think about how PCA works,
+你想一下,PCA是如何工作的
+
+235
+00:08:12,900 --> 00:08:13,950
+it does not use the labels y.
+它不需要使用标签y
+
+236
+00:08:14,530 --> 00:08:15,680
+You are just looking
+你仅仅是使用
+
+237
+00:08:16,050 --> 00:08:17,220
+at your inputs xi, and you're
+输入的xi
+
+238
+00:08:17,340 --> 00:08:19,070
+using that to find a
+你是使用它去寻找
+
+239
+00:08:19,130 --> 00:08:21,150
+lower-dimensional approximation to your data.
+低维对你的数据进行近似
+
+240
+00:08:21,390 --> 00:08:22,840
+So what PCA does,
+所以PCA
+
+241
+00:08:23,190 --> 00:08:25,410
+is it throws away some information.
+会舍掉一些信息
+
+242
+00:08:26,460 --> 00:08:28,040
+It throws away or reduces the
+它扔掉或减少
+
+243
+00:08:28,180 --> 00:08:29,680
+dimension of your data without
+数据的维度
+
+244
+00:08:30,110 --> 00:08:31,390
+knowing what the values of y
+不关心y值是什么
+
+245
+00:08:32,380 --> 00:08:33,700
+is, so this is probably
+所以这可能是较好的使用PCA
+
+246
+00:08:34,250 --> 00:08:35,770
+okay using PCA this way
+这种方式
+
+247
+00:08:35,920 --> 00:08:37,750
+is probably okay if, say
+可能是好的
+
+248
+00:08:37,990 --> 00:08:39,190
+99 percent of the
+99%的
+
+249
+00:08:39,410 --> 00:08:40,400
+variance is retained, if you're keeping most
+方差信息被保留,如果你保持
+
+250
+00:08:40,830 --> 00:08:41,970
+of the variance, but
+较多的方差
+
+251
+00:08:42,100 --> 00:08:44,230
+it might also throw away some valuable information.
+但是,它也可能会丢掉一些有价值的信息。
+
+252
+00:08:45,010 --> 00:08:45,980
+And it turns out that
+事实证明
+
+253
+00:08:46,310 --> 00:08:47,580
+if you're retaining 99% of
+如果你保留了99%的方差
+
+254
+00:08:47,820 --> 00:08:49,260
+the variance or 95%
+或者95%的方差
+
+255
+00:08:49,360 --> 00:08:50,940
+of the variance or whatever, it
+诸如此类
+
+256
+00:08:51,020 --> 00:08:52,310
+turns out that just using
+事实证明,只使用
+
+257
+00:08:52,720 --> 00:08:54,650
+regularization will often give
+正则化常常会给你带来
+
+258
+00:08:54,790 --> 00:08:56,010
+you at least as good
+至少一样好的效果
+
+259
+00:08:56,220 --> 00:08:57,880
+a method for preventing over-fitting
+来防止过拟合
+
+260
+00:08:58,900 --> 00:09:00,340
+and regularization will often just
+而且正则化
+
+261
+00:09:00,590 --> 00:09:02,220
+work better, because when you
+常常会工作的更好,因为
+
+262
+00:09:02,350 --> 00:09:03,890
+are applying linear regression or logistic
+当你是应用线性回归或者逻辑回归
+
+263
+00:09:04,250 --> 00:09:05,240
+regression or some other method
+或其它的一些方法
+
+264
+00:09:05,600 --> 00:09:07,390
+with regularization, well, this minimization
+进行正则化时,这个最小化问题
+
+265
+00:09:08,010 --> 00:09:09,420
+problem actually knows what the
+实际上是知道
+
+266
+00:09:09,480 --> 00:09:10,740
+values of y are, and
+y的值
+
+267
+00:09:10,960 --> 00:09:12,680
+so is less likely to throw
+所以不太可能
+
+268
+00:09:12,880 --> 00:09:14,330
+away some valuable information, whereas
+丢掉一些有价值的信息,
+
+269
+00:09:14,730 --> 00:09:15,790
+PCA doesn't make use
+然而,PCA不使用
+
+270
+00:09:16,060 --> 00:09:17,810
+of the labels and is more
+标签
+
+271
+00:09:17,850 --> 00:09:19,940
+likely to throw away valuable information.
+更有可能丢掉有价值的信息。
+
+272
+00:09:20,230 --> 00:09:21,370
+So, to summarize, it is
+因此,总结一下
+
+273
+00:09:21,620 --> 00:09:22,900
+a good use of PCA, if your
+使用PCA比较好的方式
+
+274
+00:09:23,010 --> 00:09:24,380
+main motivation is to speed up
+是当你的主要动机是提高
+
+275
+00:09:24,530 --> 00:09:26,490
+your learning algorithm, but using
+学习算法的速度,
+
+276
+00:09:26,790 --> 00:09:28,360
+PCA to prevent over-fitting, that
+但是使用PCA去防止过拟合
+
+277
+00:09:28,650 --> 00:09:29,630
+is not a good use of
+不是一种非常好的方式
+
+278
+00:09:30,030 --> 00:09:32,270
+PCA, and using regularization instead
+而使用正则化来代替
+
+279
+00:09:32,900 --> 00:09:36,190
+is really what many people
+才是许多人
+
+280
+00:09:36,440 --> 00:09:40,490
+would recommend doing instead. Finally,
+真正推荐的做法。最后,
+
+281
+00:09:41,310 --> 00:09:43,350
+one last misuse of PCA.
+说一下对PCA的最后一种误用。
+
+282
+00:09:43,750 --> 00:09:45,760
+And so I should say PCA is a very useful algorithm,
+所以我应该说主成分分析是一种非常有用的算法,
+
+283
+00:09:46,270 --> 00:09:49,170
+I often use it for compression and for visualization purposes.
+我经常用它来做压缩或者可视化。
+
+284
+00:09:50,230 --> 00:09:51,400
+But, what I sometimes
+但是,我有时候发现
+
+285
+00:09:51,570 --> 00:09:53,310
+see, is also people sometimes
+一些人在有时候
+
+286
+00:09:53,710 --> 00:09:56,080
+use PCA where it shouldn't be.
+在不需要使用PCA的时候使用PCA
+
+287
+00:09:56,220 --> 00:09:57,940
+So, here's a pretty common thing that
+因此,这有一个很常见的事情
+
+288
+00:09:58,030 --> 00:09:59,140
+I see, which is if someone
+我发现,
+
+289
+00:09:59,330 --> 00:10:00,330
+is designing a machine-learning system,
+如果一些人在设计机器学习系统时,
+
+290
+00:10:01,010 --> 00:10:02,130
+they may write down the
+他们可能写下
+
+291
+00:10:02,200 --> 00:10:04,150
+plan like this: let's design a learning system.
+这样一个计划:让我们设计一个学习系统
+
+292
+00:10:05,060 --> 00:10:06,080
+Get a training set and then,
+获得训练集
+
+293
+00:10:06,570 --> 00:10:07,350
+you know, what I'm going to
+你知道,我们将要
+
+294
+00:10:07,400 --> 00:10:08,700
+do is run PCA, then train
+执行PCA
+
+295
+00:10:08,860 --> 00:10:11,200
+logistic regression and then test on my test data.
+接下来,训练逻辑回归,然后测试我们的数据
+
+296
+00:10:11,680 --> 00:10:12,770
+So often at the very
+所以,经常
+
+297
+00:10:13,090 --> 00:10:14,360
+start of a project,
+在一个项目开始时
+
+298
+00:10:14,600 --> 00:10:15,600
+someone will just write out a
+一些人会写下
+
+299
+00:10:15,720 --> 00:10:16,980
+project plan that says let's
+一个工程计划
+
+300
+00:10:17,310 --> 00:10:18,610
+do these four steps with PCA inside.
+包括PCA在内的四步计划
+
+301
+00:10:20,210 --> 00:10:21,220
+Before writing down a project
+在写下项目计划之前
+
+302
+00:10:21,530 --> 00:10:23,350
+plan that incorporates PCA like
+结合PCA
+
+303
+00:10:23,560 --> 00:10:24,860
+this, one very good
+一个非常好的问题被提出
+
+304
+00:10:25,030 --> 00:10:27,110
+question to ask is, well, what if we
+如果我们
+
+305
+00:10:27,630 --> 00:10:28,560
+were to just do the whole thing
+直接去做整件事
+
+306
+00:10:29,540 --> 00:10:31,470
+without using PCA.
+而不使用PCA
+
+307
+00:10:32,170 --> 00:10:33,450
+And often people do not
+通常在人们
+
+308
+00:10:33,800 --> 00:10:34,940
+consider this step before
+不考虑这一步
+
+309
+00:10:35,440 --> 00:10:37,080
+coming up with a complicated project plan and
+在提出一个复杂的项目计划
+
+310
+00:10:37,920 --> 00:10:40,620
+implementing PCA and so on.
+和实现PCA之前
+
+311
+00:10:40,810 --> 00:10:42,360
+And sometime, and so specifically,
+有些情况下,具体地说
+
+312
+00:10:43,050 --> 00:10:44,300
+what I often advise people
+我常常建议人们
+
+313
+00:10:44,670 --> 00:10:45,980
+is, before you implement
+在实现PCA之前
+
+314
+00:10:46,450 --> 00:10:47,970
+PCA, I would first
+我首先建议
+
+315
+00:10:48,220 --> 00:10:49,410
+suggest that, you know, do
+你知道
+
+316
+00:10:49,600 --> 00:10:50,770
+whatever it is, take whatever it
+无论做什么
+
+317
+00:10:50,850 --> 00:10:52,030
+is you want to do and first
+执行一切你想做的
+
+318
+00:10:52,450 --> 00:10:53,650
+consider doing it with your
+首先考虑
+
+319
+00:10:53,980 --> 00:10:56,420
+original raw data xi, and
+你最原始的数据xi
+
+320
+00:10:56,600 --> 00:10:57,860
+only if that doesn't do
+只有不去做你想要的
+
+321
+00:10:57,960 --> 00:10:59,650
+what you want, then implement PCA before using Zi.
+然后,在使用Zi之前,应用PCA
+
+322
+00:11:01,010 --> 00:11:02,420
+So, before using PCA you know,
+因此,使用PCA之前
+
+323
+00:11:03,030 --> 00:11:03,930
+instead of reducing the dimension
+代替减少数据维度
+
+324
+00:11:04,360 --> 00:11:05,710
+of the data, I would consider
+而应该好好考虑
+
+325
+00:11:06,640 --> 00:11:08,020
+well, let's ditch this PCA step,
+让我们抛弃PCA这一步
+
+326
+00:11:08,420 --> 00:11:09,690
+and I would consider, let's
+我需要考虑
+
+327
+00:11:10,040 --> 00:11:11,460
+just train my learning algorithm
+仅仅在我的原始数据上
+
+328
+00:11:12,440 --> 00:11:13,560
+on my original data.
+训练学习算法
+
+329
+00:11:14,410 --> 00:11:15,990
+Let's just use my original raw
+让我们仅仅使用我原始的
+
+330
+00:11:16,300 --> 00:11:17,770
+inputs xi, and I would
+输入数据xi
+
+331
+00:11:18,180 --> 00:11:19,550
+recommend, instead of putting
+我建议,取代
+
+332
+00:11:19,720 --> 00:11:20,910
+PCA into the algorithm, just
+把PCA放入到算法中
+
+333
+00:11:21,030 --> 00:11:23,250
+try doing whatever it is you're doing with the xi first.
+首先去尝试仅仅使用xi
+
+334
+00:11:24,090 --> 00:11:25,000
+And only if you have
+除非
+
+335
+00:11:25,150 --> 00:11:26,180
+a reason to believe that doesn't
+你认为存在一个原因导致不能工作
+
+336
+00:11:26,480 --> 00:11:27,590
+work, so that only if your
+比如
+
+337
+00:11:27,790 --> 00:11:29,470
+learning algorithm ends up
+你的学习算法最终
+
+338
+00:11:29,510 --> 00:11:31,100
+running too slowly, or only if
+运行太慢
+
+339
+00:11:31,280 --> 00:11:32,680
+the memory requirement or the
+或者内存
+
+340
+00:11:32,910 --> 00:11:34,140
+disk space requirement is too large,
+或者硬盘空间需要太大
+
+341
+00:11:34,430 --> 00:11:35,850
+so you want to compress your
+因此你需要去压缩
+
+342
+00:11:36,190 --> 00:11:37,810
+representation, but if only
+你的表示方法
+
+343
+00:11:38,000 --> 00:11:39,020
+using the xi doesn't work,
+但是如果仅仅使用xi不能正常工作
+
+344
+00:11:39,360 --> 00:11:40,640
+only if you have evidence or strong
+只有如果你有证据
+
+345
+00:11:40,950 --> 00:11:41,890
+reason to believe that using
+或强有力的原因认为
+
+346
+00:11:42,380 --> 00:11:43,890
+the xi won't work, then implement
+使用xi不能工作
+
+347
+00:11:44,380 --> 00:11:46,730
+PCA and consider using the compressed representation.
+此时使用PCA,来考虑进行压缩
+
+348
+00:11:47,990 --> 00:11:48,830
+Because what I do see, is
+因为我所看到的
+
+349
+00:11:49,100 --> 00:11:50,380
+sometimes people start off with
+有时候,一些人
+
+350
+00:11:50,530 --> 00:11:51,520
+a project plan that incorporates PCA
+开始做项目计划
+
+351
+00:11:52,100 --> 00:11:54,580
+inside, and sometimes they,
+总是把PCA包含在里面
+
+352
+00:11:54,650 --> 00:11:55,620
+whatever they're
+有时候,他们
+
+353
+00:11:55,820 --> 00:11:57,380
+doing will work just
+做的非常好
+
+354
+00:11:57,660 --> 00:11:59,520
+fine, even without using PCA.
+即使没有使用PCA
+
+355
+00:11:59,840 --> 00:12:01,650
+So, just consider that
+因此,仅仅考虑
+
+356
+00:12:01,860 --> 00:12:03,130
+as an alternative as well, before you
+作为一种替代选择
+
+357
+00:12:03,320 --> 00:12:04,170
+go to spend a lot of
+在你花大量时间
+
+358
+00:12:04,300 --> 00:12:05,570
+time to get PCA in, figure
+去获得PCA,
+
+359
+00:12:05,770 --> 00:12:08,100
+out what k is and so on.
+计算出k等之前
+
+360
+00:12:08,250 --> 00:12:09,330
+So, that's it for PCA.
+这就是主成份分析
+
+361
+00:12:09,800 --> 00:12:11,000
+Despite this last set of
+尽管有上面这些提醒
+
+362
+00:12:11,080 --> 00:12:12,380
+comments, PCA is an
+PCA是
+
+363
+00:12:12,690 --> 00:12:14,060
+incredibly useful algorithm, when you
+非常有用的算法
+
+364
+00:12:14,150 --> 00:12:15,330
+use it for the appropriate applications
+当你在适当的应用程序使用它时
+
+365
+00:12:16,070 --> 00:12:17,480
+and I've actually used PCA pretty
+实际上,我经常使用PCA
+
+366
+00:12:17,770 --> 00:12:19,330
+often and for me,
+对我来说,非常频繁
+
+367
+00:12:19,580 --> 00:12:20,650
+I use it mostly to speed
+我主要使用PCA
+
+368
+00:12:20,850 --> 00:12:22,150
+up the running time of my learning algorithms.
+来提高我的学习算法的运行速度
+
+369
+00:12:22,880 --> 00:12:24,310
+But I think, just as
+但是,我认为
+
+370
+00:12:24,400 --> 00:12:25,690
+common an application of PCA,
+PCA同样常见的一个应用
+
+371
+00:12:26,020 --> 00:12:27,300
+is to use it to
+是用它来
+
+372
+00:12:27,410 --> 00:12:29,030
+compress data, to reduce
+压缩数据
+
+373
+00:12:29,620 --> 00:12:30,650
+the memory or disk space
+减少对内存和硬盘空间需求
+
+374
+00:12:30,990 --> 00:12:33,130
+requirements, or to use it to visualize data.
+或者用来可视化数据
+
+375
+00:12:34,270 --> 00:12:35,710
+And PCA is one of
+PCA是非常
+
+376
+00:12:35,750 --> 00:12:36,960
+the most commonly used and one
+常用的方式之一
+
+377
+00:12:36,990 --> 00:12:39,420
+of the most powerful unsupervised learning algorithms.
+最强大的无监督算法之一
+
+378
+00:12:40,060 --> 00:12:41,210
+And with what you've learned
+你之前学过的知识
+
+379
+00:12:41,420 --> 00:12:43,120
+in these videos, I think hopefully
+和在该视频学到的内容
+
+380
+00:12:43,500 --> 00:12:44,710
+you'll be able to implement
+我希望你可以去
+
+381
+00:12:45,150 --> 00:12:46,280
+PCA and use them
+通过所有的上述目的
+
+382
+00:12:46,500 --> 00:12:47,930
+through all of these purposes as well
+来更好的使用PCA
+
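
The advice above can be made concrete with a short sketch: first train with regularization on the raw inputs x(i), and only add a PCA step if running time or memory actually forces the compression. The snippet below is not part of the course materials (the course's own exercises are in Octave); it is a minimal Python sketch assuming scikit-learn, with a synthetic dataset and made-up hyperparameters purely for illustration.

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a training set with many features.
X, y = make_classification(n_samples=2000, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Plan A (try this first): regularized logistic regression on the raw x(i).
raw_model = LogisticRegression(C=1.0, max_iter=1000)   # C controls the regularization strength
raw_model.fit(X_train, y_train)

# Plan B (only if Plan A is too slow or too memory-hungry): compress x(i) to z(i)
# with PCA retaining roughly 99% of the variance, then train on z(i).
pca_model = make_pipeline(PCA(n_components=0.99),
                          LogisticRegression(C=1.0, max_iter=1000))
pca_model.fit(X_train, y_train)

print("raw x(i) + regularization:", raw_model.score(X_test, y_test))
print("PCA z(i) + regularization:", pca_model.score(X_test, y_test))

If Plan A is fast enough, the PCA step buys nothing; note also that overfitting is handled here by the regularization term, not by the dimensionality reduction.
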
diff --git a/srt/15 - 1 - Problem Motivation (8 min).srt b/srt/15 - 1 - Problem Motivation (8 min).srt
new file mode 100644
index 00000000..866083b7
--- /dev/null
+++ b/srt/15 - 1 - Problem Motivation (8 min).srt
@@ -0,0 +1,1166 @@
+1
+00:00:00,170 --> 00:00:01,190
+In this next set of videos,
+在接下来的一系列视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,720 --> 00:00:02,680
+I'd like to tell you about
+我将向大家介绍
+
+3
+00:00:03,050 --> 00:00:04,560
+a problem called Anomaly Detection.
+异常检测(Anomaly detection)问题
+
+4
+00:00:05,710 --> 00:00:07,220
+This is a reasonably commonly
+这是机器学习算法
+
+5
+00:00:07,870 --> 00:00:08,740
+used type of machine learning.
+的一个常见应用
+
+6
+00:00:09,580 --> 00:00:10,990
+And one of the interesting aspects
+这种算法的一个有趣之处在于
+
+7
+00:00:11,580 --> 00:00:13,250
+is that it's mainly for
+它虽然主要用于
+
+8
+00:00:14,020 --> 00:00:15,860
+an unsupervised problem, but there are some
+非监督学习问题
+
+9
+00:00:16,320 --> 00:00:17,240
+aspects of it that are
+但从某些角度看
+
+10
+00:00:17,510 --> 00:00:20,000
+also very similar to sort of the supervised learning problem.
+它又类似于一些监督学习问题
+
+11
+00:00:21,160 --> 00:00:22,440
+So, what is anomaly detection?
+那么 什么是异常检测呢?
+
+12
+00:00:23,380 --> 00:00:25,000
+To explain it. Let me use
+为了解释这个概念
+
+13
+00:00:25,240 --> 00:00:27,780
+the motivating example of: Imagine
+让我举一个例子吧
+
+14
+00:00:28,440 --> 00:00:30,040
+that you're a manufacturer of
+假想你是一个
+
+15
+00:00:30,330 --> 00:00:32,370
+aircraft engines, and let's
+飞机引擎制造商
+
+16
+00:00:32,600 --> 00:00:33,850
+say that as your aircraft
+当你生产的飞机引擎
+
+17
+00:00:34,280 --> 00:00:35,330
+engines roll off the assembly
+从生产线上流出时
+
+18
+00:00:35,620 --> 00:00:37,580
+line, you're doing, you know, QA or
+你需要进行
+
+19
+00:00:37,820 --> 00:00:39,850
+quality assurance testing, and as
+QA (质量控制测试)
+
+20
+00:00:40,030 --> 00:00:41,340
+part of that testing you
+而作为这个测试的一部分
+
+21
+00:00:41,410 --> 00:00:43,140
+measure features of your
+你测量了飞机引擎的一些特征变量
+
+22
+00:00:43,510 --> 00:00:44,900
+aircraft engine, like maybe, you measure
+比如 你可能测量了
+
+23
+00:00:45,180 --> 00:00:46,820
+the heat generated, things like
+引擎运转时产生的热量
+
+24
+00:00:46,860 --> 00:00:48,340
+the vibrations and so on.
+或者引擎的振动等等
+
+25
+00:00:48,630 --> 00:00:49,570
+I have some friends that worked
+我有一些朋友
+
+26
+00:00:49,860 --> 00:00:50,940
+on this problem a long time
+很早之前就开始进行这类工作
+
+27
+00:00:51,010 --> 00:00:52,610
+ago, and these were actually the
+在实际工作中
+
+28
+00:00:52,710 --> 00:00:53,960
+sorts of features that they were
+他们确实会从真实的飞机引擎
+
+29
+00:00:54,470 --> 00:00:55,910
+collecting off actual aircraft
+采集这些特征变量
+
+30
+00:00:56,280 --> 00:00:58,540
+engines so you
+这样一来
+
+31
+00:00:58,630 --> 00:00:59,570
+now have a data set of
+你就有了一个数据集
+
+32
+00:00:59,700 --> 00:01:01,000
+X1 through Xm, if you have
+从x(1)到x(m)
+
+33
+00:01:01,760 --> 00:01:04,490
+manufactured m aircraft engines,
+如果你生产了m个引擎的话
+
+34
+00:01:05,030 --> 00:01:06,740
+and if you plot your data, maybe it looks like this.
+也许你会将这些数据绘制成图表 看起来就是这个样子
+
+35
+00:01:07,130 --> 00:01:08,640
+So, each point here, each cross
+这里的每个点 每个叉
+
+36
+00:01:08,770 --> 00:01:10,580
+here as one of your unlabeled examples.
+都是你的无标签数据
+
+37
+00:01:11,990 --> 00:01:15,220
+So, the anomaly detection problem is the following.
+这样 异常检测问题可以定义如下
+
+38
+00:01:16,450 --> 00:01:17,770
+Let's say that on, you
+我们假设
+
+39
+00:01:17,880 --> 00:01:18,970
+know, the next day, you
+后来有一天
+
+40
+00:01:19,140 --> 00:01:20,390
+have a new aircraft engine
+你有一个新的飞机引擎
+
+41
+00:01:20,810 --> 00:01:21,860
+that rolls off the assembly line
+从生产线上流出
+
+42
+00:01:22,320 --> 00:01:23,890
+and your new aircraft engine has
+而你的新飞机引擎
+
+43
+00:01:24,160 --> 00:01:25,440
+some set of features x-test.
+有特征变量x-test
+
+44
+00:01:26,290 --> 00:01:27,680
+What the anomaly detection problem is,
+所谓的异常检测问题就是
+
+45
+00:01:27,930 --> 00:01:29,070
+we want to know if this
+我们希望知道
+
+46
+00:01:29,420 --> 00:01:31,310
+aircraft engine is anomalous in
+这个新的飞机引擎是否有某种异常
+
+47
+00:01:31,520 --> 00:01:32,480
+any way, in other words, we want
+或者说
+
+48
+00:01:32,740 --> 00:01:34,110
+to know if, maybe, this engine
+我们希望判断
+
+49
+00:01:34,570 --> 00:01:36,290
+should undergo further testing
+这个引擎是否需要进一步测试
+
+50
+00:01:37,330 --> 00:01:38,370
+because, or if it looks
+因为 如果它看起来
+
+51
+00:01:38,710 --> 00:01:40,560
+like an okay engine, and
+像一个正常的引擎
+
+52
+00:01:40,740 --> 00:01:41,700
+so it's okay to just ship
+那么我们可以直接将它运送到客户那里
+
+53
+00:01:41,880 --> 00:01:43,260
+it to a customer without further testing.
+而不需要进一步的测试
+
+54
+00:01:44,560 --> 00:01:45,670
+So, if your new
+比如说
+
+55
+00:01:45,840 --> 00:01:47,330
+aircraft engine looks like
+如果你的新引擎
+
+56
+00:01:47,540 --> 00:01:49,150
+a point over there, well, you
+对应的点落在这里
+
+57
+00:01:49,260 --> 00:01:50,200
+know, that looks a lot
+那么 你可以认为
+
+58
+00:01:50,360 --> 00:01:51,440
+like the aircraft engines we've seen
+它看起来像我们之前见过的引擎
+
+59
+00:01:51,650 --> 00:01:53,860
+before, and so maybe we'll say that it looks okay.
+因此我们可以直接认为它是正常的
+
+60
+00:01:54,750 --> 00:01:55,740
+Whereas, if your new aircraft
+然而 如果你的新飞机引擎
+
+61
+00:01:56,200 --> 00:01:59,390
+engine, if x-test, you know, were
+如果x-test
+
+62
+00:01:59,620 --> 00:02:00,430
+a point that were out here,
+对应的点在这外面
+
+63
+00:02:00,910 --> 00:02:02,270
+so that if X1 and
+这里x1和
+
+64
+00:02:02,410 --> 00:02:04,800
+X2 are the features of this new example.
+x2是这个新引擎对应的特征变量
+
+65
+00:02:05,360 --> 00:02:06,530
+If x-test were all the
+如果x-test在外面这么远的地方
+
+66
+00:02:06,590 --> 00:02:08,930
+way out there, then we would call that an anomaly.
+那么我们可以认为这是一个异常
+
+67
+00:02:10,420 --> 00:02:11,640
+and maybe send that aircraft engine
+也许我们需要在向客户发货之前
+
+68
+00:02:12,070 --> 00:02:13,720
+for further testing before we
+进一步检测
+
+69
+00:02:13,870 --> 00:02:15,130
+ship it to a customer, since
+这个引擎
+
+70
+00:02:16,010 --> 00:02:18,340
+it looks very different than
+因为它和我们之前见过的
+
+71
+00:02:18,600 --> 00:02:20,350
+the rest of the aircraft engines we've seen before.
+其他飞机引擎看起来不一样
+
+72
+00:02:21,000 --> 00:02:22,560
+More formally in the anomaly
+如果更正式的定义
+
+73
+00:02:22,960 --> 00:02:24,230
+detection problem, we're given
+异常检测问题
+
+74
+00:02:24,900 --> 00:02:26,160
+some data sets, x1 through
+那么我们有一些数据
+
+75
+00:02:26,280 --> 00:02:28,340
+Xm of examples, and we
+从x(1)到x(m)
+
+76
+00:02:28,460 --> 00:02:29,720
+usually assume that these m
+我们通常假定这m个样本
+
+77
+00:02:29,880 --> 00:02:32,250
+examples are normal or
+都是正常的
+
+78
+00:02:33,120 --> 00:02:34,910
+non-anomalous examples, and we
+或者说都不是异常的
+
+79
+00:02:34,980 --> 00:02:36,100
+want an algorithm to tell us
+然后我们需要一个算法来告诉我们
+
+80
+00:02:36,290 --> 00:02:38,300
+if some new example x-test is anomalous.
+一个新的样本数据x-test是否是异常
+
+81
+00:02:38,850 --> 00:02:40,080
+The approach that we're going
+我们要采取的方法是
+
+82
+00:02:40,130 --> 00:02:41,670
+to take is that given this training
+给定训练集
+
+83
+00:02:42,060 --> 00:02:43,300
+set, given the unlabeled training
+给定无标签的训练集
+
+84
+00:02:43,690 --> 00:02:45,280
+set, we're going to
+我们将
+
+85
+00:02:45,420 --> 00:02:46,920
+build a model for p of
+对数据建一个模型p(x)
+
+86
+00:02:47,020 --> 00:02:48,060
+x. In other words, we're
+也就是说
+
+87
+00:02:48,140 --> 00:02:49,320
+going to build a model for the
+我们将对
+
+88
+00:02:49,520 --> 00:02:51,230
+probability of x, where
+x的分布概率建模
+
+89
+00:02:51,390 --> 00:02:53,330
+x are these features of, say, aircraft engines.
+其中x是这些特征变量 例如飞机引擎
+
+90
+00:02:54,620 --> 00:02:56,290
+And so, having built a
+因此 当我们
+
+91
+00:02:56,530 --> 00:02:57,350
+model of the probability of x
+建立了x的概率模型之后
+
+92
+00:02:58,070 --> 00:02:59,230
+we're then going to say that
+我们就会说
+
+93
+00:02:59,820 --> 00:03:01,280
+for the new aircraft engine, if
+对于新的飞机引擎
+
+94
+00:03:01,520 --> 00:03:04,670
+p of x-test is less
+也就是x-test 如果概率p
+
+95
+00:03:04,920 --> 00:03:07,180
+than some epsilon then
+低于阈值ε
+
+96
+00:03:07,930 --> 00:03:09,170
+we flag this as an anomaly.
+那么就将其标记为异常
+
+97
+00:03:11,410 --> 00:03:12,260
+So we see a new engine
+因此当我们看到一个新的引擎
+
+98
+00:03:12,660 --> 00:03:13,960
+that, you know, has very low probability
+在我们根据训练数据
+
+99
+00:03:14,850 --> 00:03:15,900
+under a model p of
+得到的p(x)模型中
+
+100
+00:03:16,020 --> 00:03:17,130
+x that we estimate from the data,
+概率非常低时
+
+101
+00:03:17,790 --> 00:03:19,370
+then we flag this anomaly, whereas
+我们就将其标记为异常
+
+102
+00:03:19,730 --> 00:03:21,880
+if p of x-test is, say,
+反之 如果x-test的概率p
+
+103
+00:03:22,320 --> 00:03:24,110
+greater than or equal to some small threshold.
+大于给定的阈值ε
+
+104
+00:03:25,120 --> 00:03:26,620
+Then we say that, you know, okay, it looks okay.
+我们就认为它是正常的
+
+105
+00:03:27,780 --> 00:03:28,740
+And so, given the training set,
+因此 给定图中的
+
+106
+00:03:28,980 --> 00:03:30,890
+like that plotted here, if
+这个训练集
+
+107
+00:03:31,060 --> 00:03:31,940
+you build a model, hopefully
+如果你建立了一个模型
+
+108
+00:03:32,560 --> 00:03:34,020
+you will find that aircraft engines,
+你将很可能发现飞机引擎
+
+109
+00:03:34,470 --> 00:03:35,500
+or hopefully the model p of
+很可能发现模型p(x)
+
+110
+00:03:35,560 --> 00:03:37,070
+x will say that points that
+将会认为
+
+111
+00:03:37,260 --> 00:03:38,540
+lie, you know, somewhere in the
+在中心区域的这些点
+
+112
+00:03:38,580 --> 00:03:39,550
+middle, that's pretty high probability,
+有很大的概率值
+
+113
+00:03:40,720 --> 00:03:42,830
+whereas points a little bit further out have lower probability.
+而稍微远离中心区域的点概率会小一些
+
+114
+00:03:43,850 --> 00:03:45,050
+Points that are even further out
+更远的地方的点
+
+115
+00:03:45,530 --> 00:03:47,220
+have somewhat lower probability, and the
+它们的概率将更小
+
+116
+00:03:47,480 --> 00:03:48,420
+point that's way out here,
+这外面的点
+
+117
+00:03:49,080 --> 00:03:50,400
+the point that's way
+和这外面的点
+
+118
+00:03:50,520 --> 00:03:52,100
+out there, would be an anomaly.
+将成为异常点
+
+119
+00:03:54,150 --> 00:03:55,280
+Whereas the point that's way in
+而这边的点
+
+120
+00:03:55,470 --> 00:03:56,460
+there, right in the
+正好在中心区域的点
+
+121
+00:03:56,520 --> 00:03:57,720
+middle, this would be
+这些点将是正常的
+
+122
+00:03:57,830 --> 00:03:59,080
+okay because p of x
+因为在中心区域
+
+123
+00:03:59,370 --> 00:04:00,300
+right in the middle of that
+p(x)概率值
+
+124
+00:04:00,460 --> 00:04:01,320
+would be very high cause we've
+会非常大
+
+125
+00:04:01,520 --> 00:04:03,320
+seen a lot of points in that region.
+因为我们看到很多点都落在了这个区域
+
+126
+00:04:04,620 --> 00:04:07,580
+Here are some examples of applications of anomaly detection.
+异常检测算法有如下应用案例
+
+127
+00:04:08,450 --> 00:04:09,990
+Perhaps the most common application of
+也许异常检测
+
+128
+00:04:10,080 --> 00:04:11,420
+anomaly detection is actually
+最常见的应用是
+
+129
+00:04:11,560 --> 00:04:13,260
+fraud detection. So if you
+是欺诈检测
+
+130
+00:04:13,360 --> 00:04:14,820
+have many users, and if
+假设你有很多用户
+
+131
+00:04:15,070 --> 00:04:16,360
+each of your users take different
+你的每个用户
+
+132
+00:04:16,670 --> 00:04:17,740
+activities, you know maybe
+都在从事不同的活动
+
+133
+00:04:17,920 --> 00:04:18,560
+on your website or in the
+也许是在你的网站上
+
+134
+00:04:18,630 --> 00:04:20,180
+physical plant or something, you
+也许是在一个实体工厂之类的地方
+
+135
+00:04:20,300 --> 00:04:23,670
+can compute features of the different users activities.
+你可以对不同的用户活动计算特征变量
+
+136
+00:04:24,830 --> 00:04:25,730
+And what you can do is build
+然后 你可以
+
+137
+00:04:25,940 --> 00:04:27,240
+a model to say, you know,
+建立一个模型
+
+138
+00:04:27,310 --> 00:04:28,960
+what is the probability of different
+用来表示用户表现出
+
+139
+00:04:29,170 --> 00:04:30,730
+users behaving different ways.
+各种行为的可能性
+
+140
+00:04:30,890 --> 00:04:32,280
+What is the probability of a particular vector
+用来表示用户行为
+
+141
+00:04:32,460 --> 00:04:34,590
+of features of a
+对应的特征向量
+
+142
+00:04:34,840 --> 00:04:36,750
+users behavior so you
+出现的概率
+
+143
+00:04:36,900 --> 00:04:38,360
+know examples of features of
+因此 你看到
+
+144
+00:04:38,450 --> 00:04:40,480
+a users activity may be on
+某个用户在网站上行为
+
+145
+00:04:40,650 --> 00:04:41,650
+the website it'd be things like,
+的特征变量是这样的
+
+146
+00:04:42,710 --> 00:04:44,350
+maybe x1 is how often does
+也许x1是用户登陆的频率
+
+147
+00:04:44,840 --> 00:04:46,460
+this user log in, x2, you know, maybe
+x2也许是
+
+148
+00:04:46,850 --> 00:04:47,920
+the number of web
+用户访问
+
+149
+00:04:48,130 --> 00:04:49,330
+pages visited, or the
+某个页面的次数
+
+150
+00:04:49,730 --> 00:04:51,420
+number of transactions, maybe x3
+或者交易次数
+
+151
+00:04:51,440 --> 00:04:52,820
+is, you know, the number of
+也许x3是
+
+152
+00:04:53,120 --> 00:04:53,990
+posts of the users on the
+用户在论坛上发贴的次数
+
+153
+00:04:54,130 --> 00:04:55,850
+forum, feature x4 could
+x4是
+
+154
+00:04:56,000 --> 00:04:56,910
+be what is the typing
+用户的
+
+155
+00:04:57,440 --> 00:04:58,660
+speed of the user and some
+打字速度
+
+156
+00:04:58,920 --> 00:04:59,980
+websites can actually track that
+有些网站是可以记录
+
+157
+00:05:00,280 --> 00:05:01,410
+was the typing speed of this
+用户每秒
+
+158
+00:05:01,600 --> 00:05:03,010
+user in characters per second.
+打了多少个字母的
+
+159
+00:05:03,730 --> 00:05:06,610
+And so you can model p of x based on this sort of data.
+因此你可以根据这些数据建一个模型p(x)
+
+160
+00:05:08,150 --> 00:05:09,140
+And finally having your model
+最后你将得到
+
+161
+00:05:09,270 --> 00:05:10,530
+p of x, you can
+你的模型p(x)
+
+162
+00:05:10,790 --> 00:05:12,570
+try to identify users that
+然后你可以用它来发现
+
+163
+00:05:12,760 --> 00:05:14,210
+are behaving very strangely on your
+你网站上的行为奇怪的用户
+
+164
+00:05:14,350 --> 00:05:15,590
+website by checking which ones have
+你只需要
+
+165
+00:05:16,320 --> 00:05:18,100
+probability p of x less than epsilon and
+看哪些用户的p(x)概率小于ε
+
+166
+00:05:18,240 --> 00:05:21,140
+maybe send the profiles of those users for further review.
+接下来 你拿来这些用户的档案 做进一步筛选
+
+167
+00:05:22,330 --> 00:05:24,560
+Or demand additional identification from
+或者要求这些用户
+
+168
+00:05:24,740 --> 00:05:26,190
+those users, or some such
+验证他们的身份
+
+169
+00:05:26,650 --> 00:05:28,370
+to guard against you know,
+从而让你的网站防御
+
+170
+00:05:29,200 --> 00:05:31,650
+strange behavior or fraudulent behavior on your website.
+异常行为或者欺诈行为
+
+171
+00:05:33,030 --> 00:05:34,960
+This sort of technique will tend
+这样的技术将会找到
+
+172
+00:05:35,160 --> 00:05:36,470
+to flag the users that are
+行为不寻常的用户
+
+173
+00:05:36,720 --> 00:05:38,250
+behaving unusually, not just
+而不只是
+
+174
+00:05:39,480 --> 00:05:41,420
+users that maybe behaving fraudulently.
+有欺诈行为的用户
+
+175
+00:05:42,190 --> 00:05:44,030
+So not just accounts that have been
+也不只是那些
+
+176
+00:05:44,370 --> 00:05:45,670
+stolen or users that are
+被盗号的用户
+
+177
+00:05:45,780 --> 00:05:47,780
+trying to do funny things, but just finding unusual users.
+或者有滑稽行为的用户 而是行为不寻常的用户
+
+178
+00:05:48,560 --> 00:05:49,770
+But this is actually the technique
+然而这就是许多
+
+179
+00:05:50,040 --> 00:05:51,430
+that is used by many online
+许多在线购物网站
+
+180
+00:05:52,500 --> 00:05:53,570
+websites that sell things to
+常用来识别异常用户
+
+181
+00:05:53,750 --> 00:05:55,860
+try to identify users behaving
+的技术
+
+182
+00:05:56,240 --> 00:05:57,900
+strangely that might be
+这些用户行为奇怪
+
+183
+00:05:58,040 --> 00:05:59,160
+indicative of either fraudulent
+可能是表示他们有欺诈行为
+
+184
+00:05:59,760 --> 00:06:02,420
+behavior or of computer accounts that have been stolen.
+或者是被盗号
+
+185
+00:06:03,580 --> 00:06:06,410
+Another example of anomaly detection is manufacturing.
+异常检测的另一个例子是在工业生产领域
+
+186
+00:06:07,180 --> 00:06:08,470
+So, already talked about the
+事实上 我们之前已经谈到过
+
+187
+00:06:08,530 --> 00:06:09,770
+aircraft engine thing where you can
+飞机引擎的问题
+
+188
+00:06:10,030 --> 00:06:11,460
+find unusual, say, aircraft
+你可以找到异常的飞机引擎
+
+189
+00:06:11,900 --> 00:06:13,600
+engines and send those for further review.
+然后要求进一步细查这些引擎的质量
+
+190
+00:06:15,430 --> 00:06:16,740
+A third application would be
+第三个应用是
+
+191
+00:06:17,070 --> 00:06:19,210
+monitoring computers in a data center.
+数据中心的计算机监控
+
+192
+00:06:19,390 --> 00:06:20,410
+I actually have some friends who work on this too.
+实际上 我有些朋友正在从事这类工作
+
+193
+00:06:21,260 --> 00:06:22,280
+So if you have a lot
+如果你管理一个
+
+194
+00:06:22,580 --> 00:06:23,550
+of machines in a computer
+计算机集群
+
+195
+00:06:23,730 --> 00:06:24,690
+cluster or in a
+或者一个数据中心
+
+196
+00:06:24,780 --> 00:06:25,710
+data center, we can do
+其中有许多计算机
+
+197
+00:06:25,920 --> 00:06:28,560
+things like compute features at each machine.
+那么我们可以为每台计算机计算特征变量
+
+198
+00:06:29,020 --> 00:06:30,650
+So maybe some features capturing
+也许某些特征衡量
+
+199
+00:06:31,170 --> 00:06:32,730
+you know, how much memory used, number of
+计算机的内存消耗
+
+200
+00:06:32,870 --> 00:06:34,280
+disc accesses, CPU load.
+或者硬盘访问量 CPU负载
+
+201
+00:06:35,060 --> 00:06:36,050
+As well as more complex features
+或者一些更加复杂的特征
+
+202
+00:06:36,440 --> 00:06:37,450
+like what is the CPU
+例如一台计算机的
+
+203
+00:06:37,830 --> 00:06:39,650
+load on this machine divided by
+CPU负载
+
+204
+00:06:39,960 --> 00:06:41,340
+the amount of network traffic
+与网络流量
+
+205
+00:06:41,950 --> 00:06:43,050
+on this machine?
+的比值
+
+206
+00:06:43,340 --> 00:06:44,580
+Then given the dataset of how
+那么 给定正常情况下
+
+207
+00:06:44,820 --> 00:06:45,780
+your computers in your data
+数据中心中计算机
+
+208
+00:06:46,070 --> 00:06:47,230
+center usually behave, you can
+的特征变量
+
+209
+00:06:47,390 --> 00:06:48,460
+model the probability of x,
+你可以建立p(x)模型
+
+210
+00:06:48,590 --> 00:06:49,730
+so you can model the probability
+也就是说 你可以建模
+
+211
+00:06:50,350 --> 00:06:51,840
+of these machines having
+这些计算机
+
+212
+00:06:52,840 --> 00:06:53,790
+different amounts of memory use
+出现不同内存消耗的概率
+
+213
+00:06:54,060 --> 00:06:55,200
+or probability of these machines having
+或者出现不同硬盘访问量
+
+214
+00:06:55,920 --> 00:06:57,160
+different numbers of disc accesses
+的概率
+
+215
+00:06:57,780 --> 00:06:59,880
+or different CPU loads and so on.
+或者不同的CPU负载等等
+
+216
+00:07:00,030 --> 00:07:01,100
+And if you ever have a machine
+然后 如果你有一个计算机
+
+217
+00:07:02,030 --> 00:07:03,530
+whose probability of x,
+它的概率p(x)
+
+218
+00:07:03,800 --> 00:07:05,330
+p of x, is very small then you
+非常小
+
+219
+00:07:05,440 --> 00:07:06,880
+know that machine is behaving unusually
+那么你可以认为这个计算机运行不正常
+
+220
+00:07:07,970 --> 00:07:08,950
+and maybe that machine is
+或许它
+
+221
+00:07:09,050 --> 00:07:11,630
+about to go down, and you
+即将停机
+
+222
+00:07:11,700 --> 00:07:13,620
+can flag that for review by a system administrator.
+因此你可以要求系统管理员查看其工作状况
+
+223
+00:07:14,690 --> 00:07:15,890
+And this is actually being used
+目前 这种技术
+
+224
+00:07:16,060 --> 00:07:17,800
+today by various data
+实际正在被各大数据中心使用
+
+225
+00:07:18,020 --> 00:07:19,550
+centers to watch out for unusual
+用来监测
+
+226
+00:07:20,040 --> 00:07:21,430
+things happening on their machines.
+大量计算机可能发生的异常
+
+227
+00:07:22,920 --> 00:07:24,420
+So, that's anomaly detection.
+好啦 这就是异常检测算法
+
+228
+00:07:25,540 --> 00:07:26,880
+In the next video, I'll
+在下一个视频中
+
+229
+00:07:27,120 --> 00:07:29,400
+talk a bit about the Gaussian distribution and
+我们将介绍一下高斯分布
+
+230
+00:07:29,580 --> 00:07:31,030
+review properties of the Gaussian
+回顾一下高斯分布
+
+231
+00:07:31,580 --> 00:07:33,540
+probability distribution, and in
+的一些特征
+
+232
+00:07:33,690 --> 00:07:34,650
+videos after that, we will
+在再下一个视频中
+
+233
+00:07:34,790 --> 00:07:37,390
+apply it to develop an anomaly detection algorithm.
+我们将利用高斯分布来推导一个异常检测算法
+
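
The flagging rule described in this video, namely build p(x) from the unlabeled examples and mark x-test as anomalous whenever p(x-test) < epsilon, can be sketched in a few lines. The concrete form of p(x) is only developed over the next two videos; the Python sketch below jumps ahead and gives each feature its own Gaussian, and the engine measurements, feature meanings and epsilon are invented purely for illustration.

import numpy as np

def fit_gaussians(X):
    """Estimate a mean and variance for every feature (column) of X, using the 1/m convention."""
    return X.mean(axis=0), X.var(axis=0)

def p(x, mu, sigma2):
    """p(x) as a product of per-feature Gaussian densities."""
    densities = np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod()

# m "good" engines with 2 features each (say heat and vibration); values are synthetic.
X_train = np.random.RandomState(0).normal(loc=[70.0, 0.5], scale=[5.0, 0.1], size=(500, 2))
mu, sigma2 = fit_gaussians(X_train)

epsilon = 1e-4                    # how to choose this threshold is covered later in the course
x_test = np.array([95.0, 0.9])    # a new engine that runs much hotter and vibrates more
print("p(x_test) =", p(x_test, mu, sigma2))
print("flag for further testing" if p(x_test, mu, sigma2) < epsilon else "looks okay, ship it")
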
diff --git a/srt/15 - 2 - Gaussian Distribution (10 min).srt b/srt/15 - 2 - Gaussian Distribution (10 min).srt
new file mode 100644
index 00000000..08060a64
--- /dev/null
+++ b/srt/15 - 2 - Gaussian Distribution (10 min).srt
@@ -0,0 +1,1481 @@
+1
+00:00:00,240 --> 00:00:01,410
+In this video, I'd like to
+在这个视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,560 --> 00:00:03,590
+talk about the Gaussian distribution, which
+我将介绍高斯分布
+
+3
+00:00:03,830 --> 00:00:05,810
+is also called the normal distribution.
+也称为正态分布
+
+4
+00:00:07,430 --> 00:00:08,940
+In case you're already intimately
+如果你已经
+
+5
+00:00:09,620 --> 00:00:11,980
+familiar with the Gaussian distribution, it is
+对高斯分布非常熟悉了
+
+6
+00:00:12,160 --> 00:00:13,810
+probably okay to skip this video.
+那么也许你可以直接跳过这段视频
+
+7
+00:00:14,640 --> 00:00:15,890
+But if you're not sure or
+但是 如果你不确定
+
+8
+00:00:15,970 --> 00:00:16,890
+if it's been a while since you've
+或者你已经有段时间
+
+9
+00:00:17,040 --> 00:00:18,770
+worked with a Gaussian distribution or the normal
+没有接触高斯分布
+
+10
+00:00:19,020 --> 00:00:20,480
+distribution then please do
+或者正态分布了
+
+11
+00:00:20,610 --> 00:00:22,960
+watch this video all the way to the end.
+那么 请从头到尾看完这段视频
+
+12
+00:00:23,220 --> 00:00:24,260
+And in the video after this,
+在下一个视频中
+
+13
+00:00:24,480 --> 00:00:25,740
+we'll start applying the Gaussian
+我们将应用高斯分布
+
+14
+00:00:25,980 --> 00:00:28,890
+distribution to developing an anomaly detection algorithm.
+来推导一套异常检测算法
+
+15
+00:00:31,990 --> 00:00:33,310
+Let's say x is a
+假设x是一个
+
+16
+00:00:33,540 --> 00:00:36,470
+real-valued random variable, so x is a real number.
+实数随机变量 因此x是一个实数
+
+17
+00:00:37,380 --> 00:00:39,080
+If the probability distribution of
+如果x的概率分布
+
+18
+00:00:39,270 --> 00:00:41,160
+x is Gaussian, with
+服从高斯分布
+
+19
+00:00:41,400 --> 00:00:42,710
+mean Mu and variance
+其中均值为μ
+
+20
+00:00:43,110 --> 00:00:45,360
+sigma squared, then we'll
+方差为σ平方
+
+21
+00:00:45,540 --> 00:00:47,600
+write this as x the
+那么将它记作
+
+22
+00:00:47,690 --> 00:00:49,270
+random variable tilde.
+随机变量x 波浪号
+
+23
+00:00:51,930 --> 00:00:53,300
+That's this little tilde
+这个小小的波浪号
+
+24
+00:00:53,540 --> 00:00:59,520
+as distributed as and then
+读作 服从...分布
+
+25
+00:00:59,730 --> 00:01:01,550
+to denote the Gaussian Distribution, sometimes
+为了表示高斯分布
+
+26
+00:01:02,070 --> 00:01:03,930
+you're going to write script n, parentheses
+有时你将使用大写字母N
+
+27
+00:01:04,830 --> 00:01:07,140
+Mu, sigma squared.
+括号μ σ平方
+
+28
+00:01:07,470 --> 00:01:09,310
+So, this script N stands for
+因此这个大写字母N表示
+
+29
+00:01:09,530 --> 00:01:10,920
+normal, since Gaussian and normal
+Normal (正态)
+
+30
+00:01:11,300 --> 00:01:12,170
+distribution, they mean the same
+因为高斯分布就是正态分布
+
+31
+00:01:12,390 --> 00:01:14,660
+thing, they're synonymous, and a
+他们是同义词
+
+32
+00:01:14,780 --> 00:01:16,190
+Gaussian distribution is parameterized
+然后 高斯分布
+
+33
+00:01:17,070 --> 00:01:18,430
+by 2 parameters, by a
+有两个参数
+
+34
+00:01:19,010 --> 00:01:20,930
+mean parameter which we
+一个是均值
+
+35
+00:01:21,020 --> 00:01:22,770
+denote Mu, and a variance
+我们记作μ
+
+36
+00:01:23,090 --> 00:01:25,010
+parameter, which we denote by sigma squared.
+另一个是方差 我们记作σ平方
+
+37
+00:01:26,120 --> 00:01:27,270
+If we plot the Gaussian distribution
+如果我们将高斯分布
+
+38
+00:01:27,990 --> 00:01:30,100
+or Gaussian probability density, it
+的概率密度函数绘制出来
+
+39
+00:01:30,220 --> 00:01:31,760
+will look like the bell shaped
+它看起来将是这样一个钟形的曲线
+
+40
+00:01:32,100 --> 00:01:34,820
+curve, which you may have seen before.
+大家之前可能就见过
+
+41
+00:01:36,230 --> 00:01:37,860
+And so, this bell-shaped curve
+这个钟形曲线
+
+42
+00:01:38,110 --> 00:01:40,350
+is parameterized by those 2 parameters Mu and sigma.
+有两个参数 分别是μ和σ
+
+43
+00:01:41,330 --> 00:01:42,670
+And the location of the
+其中μ控制
+
+44
+00:01:42,930 --> 00:01:44,230
+center of this bell-shaped curve
+这个钟形曲线
+
+45
+00:01:44,580 --> 00:01:46,960
+is the mean Mu, and the
+的中心位置
+
+46
+00:01:47,050 --> 00:01:48,150
+width of this bell-shaped curve,
+σ控制这个钟形曲线的宽度
+
+47
+00:01:49,430 --> 00:01:51,020
+so it's roughly that, is
+因此 参数σ
+
+48
+00:01:51,290 --> 00:01:52,970
+the, this parameter
+有时也称作
+
+49
+00:01:53,500 --> 00:01:55,390
+sigma, is also called one standard deviation.
+一个标准差
+
+50
+00:01:56,540 --> 00:01:58,350
+And so, this specifies the
+这条钟形曲线决定了
+
+51
+00:01:58,530 --> 00:01:59,630
+probability of x taking
+x取不同数值
+
+52
+00:01:59,910 --> 00:02:00,990
+on different values, so x
+的概率密度分布
+
+53
+00:02:01,190 --> 00:02:02,730
+taking on values, you know
+因此 x取中心这些值
+
+54
+00:02:02,810 --> 00:02:03,770
+in the middle here is pretty high
+的概率相当大
+
+55
+00:02:04,020 --> 00:02:05,290
+since the Gaussian density here
+因为高斯分布的概率密度
+
+56
+00:02:05,400 --> 00:02:06,490
+is pretty high whereas
+在这里很大
+
+57
+00:02:06,610 --> 00:02:08,540
+x taking on values further and
+而x取远处和更远处数值
+
+58
+00:02:08,720 --> 00:02:10,310
+further away will be diminishing
+的概率将逐渐降低
+
+59
+00:02:10,860 --> 00:02:12,600
+in probability. Finally, just
+直至消失
+
+60
+00:02:12,920 --> 00:02:13,770
+for completeness, let me write
+最后 为了讲述的完整性
+
+61
+00:02:14,020 --> 00:02:15,260
+out the formula for the Gaussian
+让我写下高斯分布的数学公式
+
+62
+00:02:16,080 --> 00:02:17,310
+distribution so the property
+x的概率分布
+
+63
+00:02:17,710 --> 00:02:19,780
+of x and I'll
+我有时不写p(x)
+
+64
+00:02:19,940 --> 00:02:20,940
+sometimes write this instead of
+我会用这个代替
+
+65
+00:02:21,050 --> 00:02:22,070
+p of x, I'm going
+我会写成
+
+66
+00:02:22,190 --> 00:02:22,960
+to write this as p of
+p 括号 x
+
+67
+00:02:23,350 --> 00:02:24,930
+x semicolon Mu comma sigma squared.
+分号 μ 逗号 σ 平方
+
+68
+00:02:25,500 --> 00:02:26,750
+And so this denotes that the probability of
+这个表示
+
+69
+00:02:26,910 --> 00:02:28,670
+x is parametrized by
+x的概率分布
+
+70
+00:02:28,810 --> 00:02:30,660
+the two parameters Mu and sigma squared.
+由两个参数控制μ和σ平方
+
+71
+00:02:31,940 --> 00:02:33,330
+And the formula for the
+高斯分布的概率密度公式
+
+72
+00:02:33,370 --> 00:02:34,760
+Gaussian density is this,
+是这样的
+
+73
+00:02:35,170 --> 00:02:37,860
+1 over square root of 2 pi times sigma, e
+2π开方 乘以 σ 分之 1
+
+74
+00:02:38,070 --> 00:02:41,510
+to the negative x minus Mu squared over 2 sigma squared.
+乘以一个e的指数函数 其中指数项为 负的 x减μ的平方除以2倍的σ平方
+
+75
+00:02:41,870 --> 00:02:45,980
+So there's no need to memorize this
+其实我们并不需要
+
+76
+00:02:46,470 --> 00:02:47,530
+formula, you know, this
+记住这个公式
+
+77
+00:02:47,690 --> 00:02:49,410
+is just the formula for the
+它只是左边这条钟形曲线
+
+78
+00:02:49,540 --> 00:02:51,020
+bell-shaped curve over here on the left.
+对应的公式
+
+79
+00:02:51,700 --> 00:02:53,100
+There's no need to memorize it and
+我们没有必要记住它
+
+80
+00:02:53,270 --> 00:02:53,990
+if you ever need to use this,
+当我们真的需要用到它时
+
+81
+00:02:54,190 --> 00:02:56,460
+you can always look this up.
+我们总可以查资料找到它
+
+82
+00:02:56,540 --> 00:02:57,450
+And so that figure on the
+如果你选定
+
+83
+00:02:57,740 --> 00:02:58,420
+left, that is what you get
+μ值
+
+84
+00:02:58,910 --> 00:03:00,100
+if you take a fixed
+以及σ值
+
+85
+00:03:00,290 --> 00:03:01,200
+value of Mu and a
+然后绘制p(x)曲线
+
+86
+00:03:01,250 --> 00:03:04,070
+fixed value of sigma and
+那么你将得到
+
+87
+00:03:04,450 --> 00:03:06,140
+you plot p of x. So this
+左边这幅图
+
+88
+00:03:06,870 --> 00:03:07,830
+curve here, this is really
+因此这条曲线
+
+89
+00:03:08,390 --> 00:03:10,000
+p of x plotted as a
+其实就是
+
+90
+00:03:10,030 --> 00:03:11,540
+function of x, you know,
+给定μ值
+
+91
+00:03:11,640 --> 00:03:15,970
+for a fixed value of Mu
+以及σ平方 也就是方差值时
+
+92
+00:03:16,190 --> 00:03:18,770
+and of sigma squared. Sigma squared, that's called the variance.
+p(x)的函数图像
+
+93
+00:03:19,950 --> 00:03:22,270
+And sometimes it's easier to think in terms of sigma.
+也许有些时候我们使用σ会更方便
+
+94
+00:03:22,950 --> 00:03:24,730
+So sigma is called the
+而σ被称作
+
+95
+00:03:25,120 --> 00:03:27,850
+standard deviation and it,
+标准差
+
+96
+00:03:28,000 --> 00:03:29,640
+so it specifies the
+它确定了
+
+97
+00:03:29,800 --> 00:03:31,310
+width of this Gaussian probability
+高斯分布概率密度函数
+
+98
+00:03:31,730 --> 00:03:33,120
+density whereas the square
+的宽度
+
+99
+00:03:33,330 --> 00:03:34,490
+of sigma, so sigma squared, is
+而σ平方
+
+100
+00:03:34,620 --> 00:03:36,830
+called the variance. Let's look
+则称作方差
+
+101
+00:03:37,000 --> 00:03:39,980
+at some examples of what the Gaussian distribution looks like.
+让我们看几个高斯分布的图像
+
+102
+00:03:41,010 --> 00:03:43,280
+If Mu equals zero, sigma equals 1.
+如果μ取0 σ取1
+
+103
+00:03:43,650 --> 00:03:44,730
+Then we have a Gaussian distribution
+那么我们的高斯分布
+
+104
+00:03:45,480 --> 00:03:48,000
+that is centered around zero, because that's Mu.
+将以0为中心 因为μ等于0
+
+105
+00:03:48,810 --> 00:03:50,560
+And the width of this Gaussian, so
+而高斯分布的宽度
+
+106
+00:03:50,730 --> 00:03:53,610
+that's one standard deviation is sigma over there.
+将是一个标准差 也就是σ
+
+107
+00:03:55,140 --> 00:03:56,330
+Let's look at some examples of
+让我们看几个高斯分布的图像
+
+108
+00:03:56,700 --> 00:03:58,770
+Gaussians. If Mu
+如果μ取0
+
+109
+00:03:58,970 --> 00:04:00,750
+is equal to zero and sigma equals 1.
+σ取1
+
+110
+00:04:00,950 --> 00:04:02,150
+Then that corresponds to a
+那么这将对应
+
+111
+00:04:02,370 --> 00:04:04,030
+Gaussian distribution that is centered
+一个以0为中心
+
+112
+00:04:04,770 --> 00:04:06,380
+at zero since Mu is zero.
+的高斯分布
+
+113
+00:04:07,390 --> 00:04:08,310
+And the width of this Gaussian
+而高斯分布的宽度
+
+114
+00:04:10,810 --> 00:04:12,570
+is thus controlled
+高斯分布的宽度
+
+115
+00:04:13,010 --> 00:04:15,430
+by sigma, by that standard deviation parameter sigma.
+由标准差σ决定
+
+116
+00:04:16,850 --> 00:04:17,390
+Here's another example.
+来看另一个例子
+
+117
+00:04:20,520 --> 00:04:21,270
+Let's say Mu is equal to
+如果μ取0
+
+118
+00:04:21,550 --> 00:04:23,670
+zero and sigma is equal to one-half.
+σ取0.5
+
+119
+00:04:24,200 --> 00:04:26,290
+So the standard deviation is
+也就是说标准差
+
+120
+00:04:26,530 --> 00:04:27,650
+one-half and the variance
+是0.5
+
+121
+00:04:28,280 --> 00:04:29,550
+sigma squared would therefore be
+方差σ平方
+
+122
+00:04:29,710 --> 00:04:33,600
+the square of 0.5 would be 0.25.
+是0.5的平方 也就是0.25
+
+123
+00:04:33,680 --> 00:04:34,910
+And in that case the Gaussian distribution,
+这时候 高斯分布
+
+124
+00:04:35,600 --> 00:04:37,040
+the Gaussian probability density looks
+高斯分布的概率密度函数曲线
+
+125
+00:04:37,180 --> 00:04:39,490
+like this, is also centered at zero.
+会是这样的 以0为中心
+
+126
+00:04:40,110 --> 00:04:41,410
+But now the width of
+然而 现在它的宽度
+
+127
+00:04:41,600 --> 00:04:43,250
+this is much smaller because
+小了许多
+
+128
+00:04:43,620 --> 00:04:45,170
+the smaller variance, the
+因为方差变小了
+
+129
+00:04:45,520 --> 00:04:46,980
+width of this Gaussian density
+高斯密度函数的宽度
+
+130
+00:04:47,450 --> 00:04:49,350
+is roughly half as wide.
+大约是之前的一半
+
+131
+00:04:50,550 --> 00:04:51,710
+But because this is a
+但是 因为这是
+
+132
+00:04:51,970 --> 00:04:53,590
+probability distribution, the area under
+一个概率分布
+
+133
+00:04:53,800 --> 00:04:54,850
+the curve, that is the shaded
+因此曲线下的面积
+
+134
+00:04:55,310 --> 00:04:56,790
+area there, that area
+这些阴影区域的积分
+
+135
+00:04:57,180 --> 00:04:58,810
+must integrate to 1.
+一定是1
+
+136
+00:04:58,810 --> 00:05:00,500
+This is a property of probability distributions.
+这是概率分布的一个特性
+
+137
+00:05:01,650 --> 00:05:02,680
+And so, you know, this
+因此
+
+138
+00:05:02,830 --> 00:05:04,530
+is a much taller Gaussian density because
+这个高斯密度曲线更高
+
+139
+00:05:04,820 --> 00:05:06,050
+it's half as wide, with
+因为它只有一半宽
+
+140
+00:05:06,200 --> 00:05:08,150
+half the standard deviation, but it's twice as tall.
+只有一半的标准差 但是它有两倍高
+
+141
+00:05:09,130 --> 00:05:11,510
+Another example, if sigma is
+再看一个例子
+
+142
+00:05:11,640 --> 00:05:12,540
+equal to 2, then you
+如果σ等于2
+
+143
+00:05:12,650 --> 00:05:14,870
+get a much fatter, or much wider Gaussian density.
+那么你将得到一个更胖更宽的高斯密度曲线
+
+144
+00:05:15,310 --> 00:05:17,090
+And so here, the sigma
+在这里
+
+145
+00:05:17,370 --> 00:05:19,300
+parameter controls that this
+σ参数决定了
+
+146
+00:05:19,630 --> 00:05:21,000
+Gaussian density has a wider width.
+曲线会更宽
+
+147
+00:05:21,930 --> 00:05:23,180
+And once again, the area under
+同样的
+
+148
+00:05:23,220 --> 00:05:24,390
+the curve, that is this shaded
+曲线下方的面积 这块阴影区域
+
+149
+00:05:24,700 --> 00:05:26,720
+area, you know, it always integrates to 1.
+的积分一定是1
+
+150
+00:05:26,840 --> 00:05:28,170
+That's a property of probability
+这是概率分布的一个特性
+
+151
+00:05:28,800 --> 00:05:30,280
+distributions, and because it's
+因为它更宽
+
+152
+00:05:30,480 --> 00:05:31,930
+wider, it's also half as
+因此它只有一半高
+
+153
+00:05:32,650 --> 00:05:36,640
+tall, in order to just integrate to the same thing.
+这样积分才能保持不变
+
+154
+00:05:36,750 --> 00:05:37,520
+And finally, one last example would be,
+最后一个例子
+
+155
+00:05:37,880 --> 00:05:38,980
+if we now changed the Mu
+如果我们也改变参数μ
+
+156
+00:05:39,130 --> 00:05:40,660
+parameters as well, then instead
+那么曲线
+
+157
+00:05:41,000 --> 00:05:42,320
+of being centered at zero, we
+将不再以0为中心
+
+158
+00:05:42,410 --> 00:05:43,840
+now we have a Gaussian distribution
+现在我们的高斯分布
+
+159
+00:05:44,830 --> 00:05:46,810
+that is centered at three, because
+以3为中心
+
+160
+00:05:47,710 --> 00:05:49,740
+this shifts over the entire Gaussian distribution.
+因为整个高斯分布被平移了
+
+161
+00:05:51,170 --> 00:05:54,040
+Next, let's talk about the parameter estimation problem.
+接下来 让我们来看参数估计问题
+
+162
+00:05:55,100 --> 00:05:56,570
+So what is the parameter estimation problem?
+那么 什么是参数估计问题?
+
+163
+00:05:57,520 --> 00:05:58,350
+Let's say we have a data set
+假设我们有一个数据集
+
+164
+00:05:58,850 --> 00:06:00,180
+of m examples, so x1
+其中有m个样本
+
+165
+00:06:00,350 --> 00:06:01,470
+through x(m), and let's say
+从x(1)到x(m)
+
+166
+00:06:01,710 --> 00:06:03,250
+each of these examples is a real number.
+假设他们都是实数
+
+167
+00:06:04,200 --> 00:06:05,520
+Here in the figure, I've plotted an
+在这幅图里
+
+168
+00:06:05,620 --> 00:06:06,390
+example of a data set,
+我画出了整个数据集
+
+169
+00:06:06,580 --> 00:06:08,390
+so the horizontal axis is the
+图中的横轴
+
+170
+00:06:08,580 --> 00:06:09,430
+x axis and, you know, I
+是x轴
+
+171
+00:06:09,560 --> 00:06:12,290
+have a range of examples of x and I've just plotted them
+我的样本x取值分布广泛
+
+172
+00:06:12,560 --> 00:06:15,060
+on this figure here.
+我就将它们画在这里
+
+173
+00:06:15,260 --> 00:06:17,280
+And the parameter estimation problem is, let's
+而参数估计问题就是
+
+174
+00:06:17,500 --> 00:06:18,750
+say I suspect that these examples
+假设我猜测这些样本
+
+175
+00:06:19,450 --> 00:06:21,160
+came from a Gaussian distribution so
+来自一个高斯分布的总体
+
+176
+00:06:21,300 --> 00:06:24,560
+let's say I suspect that each of my example x(i) was distributed.
+假设我猜测每一个样本xi服从某个分布
+
+177
+00:06:25,300 --> 00:06:26,930
+That's what this tilde thing means.
+这里的波浪号表示 服从...分布
+
+178
+00:06:27,590 --> 00:06:28,520
+Thus, I suspect that each of
+因此 我猜测
+
+179
+00:06:28,580 --> 00:06:30,220
+these examples was distributed according
+这里的每个样本
+
+180
+00:06:30,760 --> 00:06:32,190
+to a normal distribution or
+服从正态分布
+
+181
+00:06:32,250 --> 00:06:34,060
+Gaussian distribution with some
+或者高斯分布
+
+182
+00:06:34,300 --> 00:06:36,210
+parameter Mu and some parameter sigma squared.
+它有两个参数 μ和σ平方
+
+183
+00:06:37,570 --> 00:06:39,560
+But I don't know what the values of these parameters are.
+然而 我不知道这些参数的值是多少
+
+184
+00:06:40,820 --> 00:06:42,360
+The problem with parameter estimation is,
+参数估计问题就是
+
+185
+00:06:43,160 --> 00:06:44,480
+given my data set I want
+给定数据集
+
+186
+00:06:44,800 --> 00:06:45,720
+to figure out, I want to
+我希望能找到
+
+187
+00:06:45,880 --> 00:06:46,840
+estimate, what are the
+能够估算出
+
+188
+00:06:46,990 --> 00:06:48,470
+values of Mu and sigma squared.
+μ和σ平方的值
+
+189
+00:06:49,620 --> 00:06:50,570
+So, if you're given a
+因此 如果你有
+
+190
+00:06:50,640 --> 00:06:51,660
+data set like this, you know,
+这样一个数据
+
+191
+00:06:51,790 --> 00:06:54,050
+it looks like maybe, if I
+它看起来好像
+
+192
+00:06:54,190 --> 00:06:56,210
+estimate what Gaussian distribution the
+如果我试图找到
+
+193
+00:06:56,350 --> 00:06:59,010
+data came from, maybe that
+它来自哪个高斯分布
+
+194
+00:07:00,660 --> 00:07:01,770
+might be roughly the Gaussian distribution
+也许这个就是
+
+195
+00:07:02,280 --> 00:07:04,410
+it came from, with Mu
+它对应的高斯分布
+
+196
+00:07:05,500 --> 00:07:07,350
+being the center of the distribution and
+其中μ对应分布函数的中心
+
+197
+00:07:07,990 --> 00:07:11,680
+sigma the standard deviation controlling the width of this Gaussian distribution.
+而标准差σ控制高斯分布的宽度
+
+198
+00:07:12,140 --> 00:07:12,820
+It seems like a reasonable
+这条曲线似乎
+
+199
+00:07:13,260 --> 00:07:15,280
+fit to the data, because, you know, it
+很好的拟合了数据
+
+200
+00:07:15,440 --> 00:07:16,880
+looks like the data has
+因为看起来
+
+201
+00:07:17,110 --> 00:07:18,910
+a very high probability of being
+这个数据集
+
+202
+00:07:19,240 --> 00:07:20,590
+in the central region, low probability of
+在中心区域的概率比较大
+
+203
+00:07:21,640 --> 00:07:24,720
+being further out, low probability of being further out, and so on.
+而在外围 在边缘的概率越来越小
+
+204
+00:07:24,780 --> 00:07:25,770
+And so, maybe this is
+因此 也许这是
+
+205
+00:07:25,890 --> 00:07:27,360
+a reasonable estimate of
+对μ和σ平方
+
+206
+00:07:28,020 --> 00:07:29,920
+Mu and of sigma squared,
+的一个不错的估计
+
+207
+00:07:30,410 --> 00:07:31,810
+that is, if it corresponds to
+也就是说
+
+208
+00:07:31,960 --> 00:07:33,970
+a Gaussian distribution, that then looks like this.
+我们的数据对应这样一个高斯分布
+
+209
+00:07:35,650 --> 00:07:36,340
+So, what I'm going to
+那么 接下来
+
+210
+00:07:36,430 --> 00:07:37,550
+do is write out the
+我将写下
+
+211
+00:07:37,660 --> 00:07:39,090
+formulas, the standard formulas
+对μ和σ平方
+
+212
+00:07:39,750 --> 00:07:40,920
+for estimating the parameters
+进行参数估计的
+
+213
+00:07:41,130 --> 00:07:43,480
+Mu and sigma squared.
+标准公式
+
+214
+00:07:44,110 --> 00:07:44,860
+The way we are going to
+我们估计μ的方法是
+
+215
+00:07:45,390 --> 00:07:47,140
+estimate Mu is going to
+对我的
+
+216
+00:07:47,380 --> 00:07:48,850
+be just the average
+所有样本
+
+217
+00:07:49,670 --> 00:07:50,630
+of my examples.
+求平均值
+
+218
+00:07:51,210 --> 00:07:52,190
+So Mu is the mean parameter,
+μ就是平均值参数
+
+219
+00:07:52,750 --> 00:07:53,340
+so I'm going to take my
+因此 我将
+
+220
+00:07:53,380 --> 00:07:54,350
+training set, take my m
+使用我的训练集
+
+221
+00:07:54,450 --> 00:07:56,200
+examples and average them.
+使用我的m个样本 对它们取平均
+
+222
+00:07:56,470 --> 00:07:58,120
+And that just gives me the center of this distribution.
+这样我就得到了高斯分布的中心位置
+
+223
+00:08:01,150 --> 00:08:01,670
+How about sigma squared?
+那么如何估计σ平方呢?
+
+224
+00:08:01,890 --> 00:08:03,110
+Well the variance, I'll just
+σ平方表示方差
+
+225
+00:08:03,340 --> 00:08:04,890
+write out the standard formula again,
+我们也来写下方差的标准计算公式
+
+226
+00:08:05,150 --> 00:08:06,780
+I'm going to estimate as 1 over m times the sum
+我先将所有的样本
+
+227
+00:08:07,280 --> 00:08:08,900
+from i equals 1 through m of x(i),
+从x(1)到x(m)
+
+228
+00:08:09,150 --> 00:08:11,730
+minus Mu squared,
+减去平均值μ
+
+229
+00:08:12,130 --> 00:08:13,130
+and so this Mu here is
+平方再求和
+
+230
+00:08:13,240 --> 00:08:14,030
+actually the Mu that I compute
+实际上μ是用
+
+231
+00:08:14,450 --> 00:08:15,580
+over here using this formula,
+之前的这个公式计算出来的
+
+232
+00:08:16,790 --> 00:08:17,920
+and what the variance is, or
+而方差的含义
+
+233
+00:08:18,040 --> 00:08:18,890
+one interpretation of the variance,
+至少一种方差的定义
+
+234
+00:08:19,440 --> 00:08:20,230
+is that, if you look at the
+是将这一项
+
+235
+00:08:20,250 --> 00:08:21,580
+this term, that's the square
+所有样本
+
+236
+00:08:22,090 --> 00:08:23,580
+difference between the value
+的差值平方和
+
+237
+00:08:24,020 --> 00:08:25,190
+I've got in my example minus
+我已经将样本减去了平均值
+
+238
+00:08:25,740 --> 00:08:28,300
+the mean, minus the center, minus the mean of distribution.
+减去了中心位置 分布的平均值
+
+239
+00:08:28,810 --> 00:08:29,690
+And so, you know, the
+因此 你看
+
+240
+00:08:29,730 --> 00:08:30,630
+variance, I'm gonna estimate
+我会将方差估计为
+
+241
+00:08:31,250 --> 00:08:32,530
+as just the average of
+样本减去平均值
+
+242
+00:08:32,570 --> 00:08:35,520
+the square differences, between my examples, minus the mean.
+差值取平方 再求平均
+
+243
+00:08:37,270 --> 00:08:38,370
+And as a side comment,
+这里顺便提一下
+
+244
+00:08:38,850 --> 00:08:40,150
+only for those of you that are experts
+你们有些人
+
+245
+00:08:40,490 --> 00:08:41,820
+in statistics, if you're
+可能精通统计学
+
+246
+00:08:42,010 --> 00:08:43,690
+an expert in statistics and if
+如果你是统计方面的专家
+
+247
+00:08:43,830 --> 00:08:45,570
+you've heard of maximum likelihood estimation,
+听说过极大似然估计
+
+248
+00:08:46,680 --> 00:08:47,950
+then these estimates are actually the
+那么这里的估计
+
+249
+00:08:48,770 --> 00:08:50,530
+maximum likelihood estimates of the parameters
+实际就是对μ和σ平方
+
+250
+00:08:50,680 --> 00:08:52,590
+of Mu and sigma squared.
+的极大似然估计
+
+251
+00:08:53,220 --> 00:08:55,260
+But if you haven't heard of that before, don't worry about it.
+不过如果你没听说过这些 没关系
+
+252
+00:08:55,440 --> 00:08:56,500
+All you need to know is that
+你只需要知道
+
+253
+00:08:56,750 --> 00:08:57,880
+these are the two standard formulas
+这里有两个标准公式
+
+254
+00:08:58,600 --> 00:09:01,090
+for how you try to
+给定数据集 你可以利用它们
+
+255
+00:09:01,520 --> 00:09:03,820
+figure out what our Mu and sigma squared given the dataset.
+估算出μ和σ平方的值
+
+256
+00:09:05,050 --> 00:09:06,140
+Finally one last side comment.
+最后 再提一句
+
+257
+00:09:06,650 --> 00:09:07,810
+Again only for those of
+同样是针对那些
+
+258
+00:09:07,950 --> 00:09:10,520
+you that has maybe taken a statistics class before.
+学习过统计课程的人
+
+259
+00:09:10,880 --> 00:09:12,040
+But if you have taken a statistics
+如果你以前上过统计课程
+
+260
+00:09:12,200 --> 00:09:13,530
+class before, some of you
+你们可能会见过
+
+261
+00:09:13,610 --> 00:09:14,620
+may have seen the formula here
+这么一个公式
+
+262
+00:09:14,820 --> 00:09:15,810
+where, you know, this is m minus
+在这里不是m
+
+263
+00:09:16,030 --> 00:09:17,300
+1, instead of m. So
+而是m减1
+
+264
+00:09:17,700 --> 00:09:19,310
+this first term becomes 1
+第一项写成
+
+265
+00:09:19,520 --> 00:09:20,410
+over m minus 1, instead
+m减1分之1
+
+266
+00:09:20,450 --> 00:09:22,640
+of 1 over m. In machine
+而不是m分之1
+
+267
+00:09:22,960 --> 00:09:25,170
+learning, people tend to use this 1 over m formula.
+在机器学习领域 大家喜欢用m分之1
+
+268
+00:09:26,000 --> 00:09:27,230
+But in practice, whether it
+然而在实际使用中
+
+269
+00:09:27,470 --> 00:09:28,480
+is 1 over m or 1
+到底是选择使用
+
+270
+00:09:28,550 --> 00:09:29,710
+over m minus one, makes essentially
+m分之1还是m减1分之1
+
+271
+00:09:30,170 --> 00:09:32,290
+no difference, assuming, you know, m is
+其实区别很小
+
+272
+00:09:32,540 --> 00:09:34,670
+reasonably large, it's a large training set size.
+只要你有一个还算大的训练集
+
+273
+00:09:35,310 --> 00:09:36,480
+So, just in case you've seen
+因此 如果你见过m减1分之1
+
+274
+00:09:36,740 --> 00:09:38,570
+this other version before, in either
+这个版本的公式
+
+275
+00:09:38,810 --> 00:09:39,970
+version it works just equally
+它同样可以很好的估计出参数
+
+276
+00:09:40,300 --> 00:09:41,630
+well, but in machine
+只是在机器学习领域
+
+277
+00:09:41,910 --> 00:09:42,850
+learning most people tend to
+大部分人更习惯使用
+
+278
+00:09:42,970 --> 00:09:44,410
+use 1 over m in this formula.
+m分之1这个版本的公式
+
+279
+00:09:45,690 --> 00:09:46,740
+And the two versions have slightly
+这两个版本的公式
+
+280
+00:09:47,070 --> 00:09:48,770
+different theoretical properties, slightly different
+在理论特性和数学特性上稍有不同
+
+281
+00:09:49,030 --> 00:09:50,530
+mathematical properties, but in
+但是在实际使用中
+
+282
+00:09:50,590 --> 00:09:54,080
+practice it really makes very little difference, if any.
+他们的区别甚小 几乎可以忽略不计
+
+283
+00:09:56,490 --> 00:09:57,670
+So, hopefully, you now have
+现在 我想
+
+284
+00:09:57,890 --> 00:09:58,900
+a good sense of what the
+你大概已经对高斯分布的样子
+
+285
+00:09:59,020 --> 00:10:00,410
+Gaussian distribution looks like,
+有些感觉了
+
+286
+00:10:00,740 --> 00:10:02,210
+as well as how to estimate
+你也知道
+
+287
+00:10:02,270 --> 00:10:03,730
+the parameters, mu and sigma
+如何估计高斯分布中的
+
+288
+00:10:04,010 --> 00:10:05,770
+squared, of the Gaussian distribution, and
+参数μ和σ平方
+
+289
+00:10:05,910 --> 00:10:07,510
+if you're given a training set,
+只要你有一个训练集
+
+290
+00:10:07,850 --> 00:10:08,940
+that is if you're given a
+如果你猜测它
+
+291
+00:10:09,240 --> 00:10:10,220
+set of data that you suspect
+来自一个高斯分布
+
+292
+00:10:11,130 --> 00:10:12,350
+comes from a Gaussian
+你就可以估计出它的参数值
+
+293
+00:10:12,410 --> 00:10:15,190
+distribution with unknown parameters Mu and sigma squared.
+μ和σ平方
+
+294
+00:10:16,190 --> 00:10:17,510
+In the next video, we'll start
+在下一个视频中
+
+295
+00:10:17,810 --> 00:10:18,820
+to take this and apply it
+我们将利用这些知识
+
+296
+00:10:18,920 --> 00:10:20,810
+to develop the anomaly detection algorithm.
+推导出异常检测算法
+
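
As a quick companion to this video, here is a small NumPy sketch, separate from the lecture itself, of the two estimation formulas (the 1/m versions used in this course) and of the Gaussian density p(x; mu, sigma^2) = 1/(sqrt(2*pi)*sigma) * exp(-(x - mu)^2 / (2*sigma^2)). The sample data is synthetic.

import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """The Gaussian (normal) density with mean mu and variance sigma2."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# An unlabeled set x(1)..x(m); here drawn from N(3, 2^2) just so the estimates can be checked.
x = np.random.RandomState(1).normal(loc=3.0, scale=2.0, size=1000)
m = x.size

mu_hat = x.sum() / m                                # estimate of the mean mu
sigma2_hat = ((x - mu_hat) ** 2).sum() / m          # 1/m estimate of the variance, as in the lecture
sigma2_alt = ((x - mu_hat) ** 2).sum() / (m - 1)    # the 1/(m-1) version; nearly identical for large m

print(mu_hat, sigma2_hat, sigma2_alt)               # both variance estimates should be close to 4
print(gaussian_pdf(3.0, mu_hat, sigma2_hat))        # density is largest near the center
print(gaussian_pdf(12.0, mu_hat, sigma2_hat))       # and much smaller far out in the tail
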
diff --git a/srt/15 - 3 - Algorithm (12 min).srt b/srt/15 - 3 - Algorithm (12 min).srt
new file mode 100644
index 00000000..49ef34f1
--- /dev/null
+++ b/srt/15 - 3 - Algorithm (12 min).srt
@@ -0,0 +1,1741 @@
+1
+00:00:00,090 --> 00:00:01,240
+In the last video, we talked
+在上一节视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,560 --> 00:00:03,660
+about the Gaussian distribution. In
+我们谈到了高斯分布
+
+3
+00:00:03,810 --> 00:00:05,350
+this video lets apply that
+在本节视频中
+
+4
+00:00:05,440 --> 00:00:07,300
+to develop an anomaly detection algorithm.
+我将应用高斯分布开发异常检测算法
+
+5
+00:00:10,360 --> 00:00:11,690
+Let's say that we have an
+假如说我们有一个无标签的训练集
+
+6
+00:00:11,840 --> 00:00:13,390
+unlabeled training set of M
+共有 m 个训练样本
+
+7
+00:00:13,650 --> 00:00:15,410
+examples, and each of
+并且
+
+8
+00:00:15,470 --> 00:00:16,730
+these examples is going to
+这里的训练集里的每一个样本
+
+9
+00:00:16,760 --> 00:00:18,350
+be a feature vector in Rn, so
+都是 n 维的特征
+
+10
+00:00:18,440 --> 00:00:19,420
+your training set could be,
+因此你的训练集应该是
+
+11
+00:00:20,540 --> 00:00:21,860
+feature vectors from the last
+来自最近制造的
+
+12
+00:00:22,730 --> 00:00:24,150
+M aircraft engines being manufactured.
+m 个飞机引擎的特征向量
+
+13
+00:00:24,960 --> 00:00:26,730
+Or it could be features from m
+或者是
+
+14
+00:00:27,070 --> 00:00:28,290
+users or something else.
+来自 m 个用户或者其它的什么东西
+
+15
+00:00:29,320 --> 00:00:30,460
+The way we are going to address
+现在
+
+16
+00:00:30,840 --> 00:00:32,310
+anomaly detection, is we are
+我们解决异常检测的方法是
+
+17
+00:00:32,350 --> 00:00:33,480
+going to model p of x
+我们要从数据中
+
+18
+00:00:33,860 --> 00:00:35,640
+from the data sets.
+建立一个 p(x) 概率模型
+
+19
+00:00:36,240 --> 00:00:38,530
+We're going to try to figure out what are high probability features, what
+我们要尝试找出哪些特征出现的概率比较高
+
+20
+00:00:38,860 --> 00:00:40,620
+are lower probability types of features.
+哪些特征的概率较低
+
+21
+00:00:41,350 --> 00:00:42,810
+So, x is a
+因此
+
+22
+00:00:43,090 --> 00:00:44,900
+vector and what we
+x 是一个向量
+
+23
+00:00:45,320 --> 00:00:46,580
+are going to do is model p of
+我们要做的事情是
+
+24
+00:00:47,020 --> 00:00:48,870
+x, as probability of x1,
+建立一个 p(x) 的模型 表示 x1 的概率
+
+25
+00:00:49,440 --> 00:00:50,390
+that is of the first component
+这是 x 的第一个组成部分
+
+26
+00:00:50,950 --> 00:00:53,180
+of x, times the probability
+并用它乘以 x2 的概率
+
+27
+00:00:53,990 --> 00:00:54,960
+of x2, that is the probability
+这是第二个特征的概率
+
+28
+00:00:55,510 --> 00:00:57,350
+of the second feature, times the
+然后再乘以
+
+29
+00:00:57,450 --> 00:00:58,860
+probability of the third
+第三个特征的概率
+
+30
+00:00:59,090 --> 00:01:01,230
+feature, and so on up
+一直这样下去
+
+31
+00:01:01,410 --> 00:01:03,290
+to the probability of the final feature
+直到最后一个特征
+
+32
+00:01:03,760 --> 00:01:03,930
+of Xn.
+特征 xn
+
+33
+00:01:04,200 --> 00:01:06,320
+Now I'm leaving space here cause I'll fill in something in a minute.
+这里我先空着 最后再来填满
+
+34
+00:01:08,780 --> 00:01:09,720
+So, how do we
+那么
+
+35
+00:01:09,830 --> 00:01:10,960
+model each of these terms,
+我们如何为这些项进行建模呢?
+
+36
+00:01:11,460 --> 00:01:13,020
+p of X1, p of X2, and so on.
+p(x1) p(x2) 等等
+
+37
+00:01:14,080 --> 00:01:15,380
+What we're going to do,
+我们下面要做的
+
+38
+00:01:15,680 --> 00:01:16,860
+is assume that the feature,
+是假定特征 x1
+
+39
+00:01:17,480 --> 00:01:19,830
+X1, is distributed according
+其分布
+
+40
+00:01:20,340 --> 00:01:22,950
+to a Gaussian distribution, with
+服从高斯正态分布
+
+41
+00:01:23,160 --> 00:01:25,140
+some mean, which you
+你也可以
+
+42
+00:01:25,340 --> 00:01:25,850
+want to write as mu1 and
+写出期望 μ1
+
+43
+00:01:25,920 --> 00:01:26,900
+some variance, which I'm going
+以及方差
+
+44
+00:01:26,990 --> 00:01:28,560
+to write as sigma squared 1,
+我用 σ1 的平方表示
+
+45
+00:01:29,890 --> 00:01:30,690
+and so p of X1 is
+这样
+
+46
+00:01:30,820 --> 00:01:32,020
+going to be a Gaussian
+p(x1) 就可以写成
+
+47
+00:01:32,350 --> 00:01:34,410
+probability distribution, with mean
+这样一个高斯分布
+
+48
+00:01:34,610 --> 00:01:37,580
+mu1 and variance sigma squared 1.
+其期望为 μ1 方差为 (σ1)^2
+
+49
+00:01:38,230 --> 00:01:39,660
+And similarly I'm
+同样地
+
+50
+00:01:39,720 --> 00:01:40,570
+going to assume that X2
+我假设 x2
+
+51
+00:01:40,760 --> 00:01:42,220
+is distributed, Gaussian,
+也服从高斯分布
+
+52
+00:01:42,870 --> 00:01:44,620
+that's what this little tilda stands for,
+这里的小波浪线读作"服从"
+
+53
+00:01:44,800 --> 00:01:47,220
+that means distributed Gaussian
+表示它服从高斯分布
+
+54
+00:01:47,740 --> 00:01:49,490
+with mean mu2 and Sigma
+其期望为 μ2
+
+55
+00:01:49,830 --> 00:01:51,780
+squared 2, so it's distributed according
+方差为 (σ2)^2
+
+56
+00:01:52,170 --> 00:01:54,230
+to a different Gaussian, which has
+所以它服从的高斯分布与刚刚那个不同
+
+57
+00:01:54,460 --> 00:01:58,010
+a different set of parameters, mu2 sigma square 2.
+它的期望和方差都不一样
+
+58
+00:01:58,120 --> 00:02:00,160
+And similarly, you know,
+与此类似
+
+59
+00:02:00,360 --> 00:02:04,020
+X3 is yet another
+x3 服从另外一个高斯分布
+
+60
+00:02:04,480 --> 00:02:06,590
+Gaussian, so this
+因此
+
+61
+00:02:06,780 --> 00:02:09,100
+can have a different mean and
+x3 也会有一个不同的期望
+
+62
+00:02:09,300 --> 00:02:11,630
+a different standard deviation than the
+以及一个与其它特征不同的标准差
+
+63
+00:02:11,830 --> 00:02:15,370
+other features, and so on, up to XN.
+直到 xn 都是如此
+
+64
+00:02:17,000 --> 00:02:17,740
+And so that's my model.
+这就是我要说的模型
+
+65
+00:02:19,010 --> 00:02:20,230
+Just as a side comment for
+顺便说一下
+
+66
+00:02:20,370 --> 00:02:21,490
+those of you that are experts in
+对那些擅长统计的同学来说
+
+67
+00:02:21,890 --> 00:02:22,770
+statistics, it turns out that
+实际上
+
+68
+00:02:22,990 --> 00:02:23,850
+this equation that I just
+我刚刚写的式子
+
+69
+00:02:24,250 --> 00:02:25,590
+wrote out actually corresponds to an
+写出来实际上就对应于
+
+70
+00:02:25,750 --> 00:02:27,490
+independence assumption on the
+一个从 x1 到 xn 上的独立的假设
+
+71
+00:02:28,060 --> 00:02:29,550
+values of the features x1 through xn.
+一个从 x1 到 xn 上的独立的假设
+
+72
+00:02:30,290 --> 00:02:31,520
+But in practice it turns out
+但实际中 结果是
+
+73
+00:02:32,040 --> 00:02:34,010
+that this algorithm, it works just fine,
+这个算法的效果还是很好的
+
+74
+00:02:34,410 --> 00:02:36,330
+whether or not these features are
+无论这些特征
+
+75
+00:02:36,610 --> 00:02:37,780
+anywhere close to independent and
+是否独立
+
+76
+00:02:38,280 --> 00:02:39,810
+even if independence assumption doesn't
+即使这个独立的假设不成立
+
+77
+00:02:40,240 --> 00:02:41,830
+hold true this algorithm works just fine.
+这个算法的效果也还不错
+
+78
+00:02:42,650 --> 00:02:45,870
+But in case you don't know
+如果你不知道
+
+79
+00:02:45,970 --> 00:02:47,380
+those terms I just used independence assumptions and so on,
+我刚刚提到的独立假说和其它相关内容
+
+80
+00:02:47,830 --> 00:02:48,460
+don't worry about it.
+也不要担心
+
+81
+00:02:49,170 --> 00:02:50,840
+You'll be able to understand
+你将会慢慢有能力去理解
+
+82
+00:02:51,360 --> 00:02:52,690
+it and implement this algorithm just fine
+并且能很好地实现这些算法
+
+83
+00:02:53,250 --> 00:02:55,310
+and that comment was really meant only for the experts in statistics.
+刚才插入的那些内容也不重要
+
+84
+00:02:57,790 --> 00:02:58,880
+Finally, in order to
+最后
+
+85
+00:02:59,210 --> 00:03:00,320
+wrap this up, let me
+做一个总结
+
+86
+00:03:00,590 --> 00:03:04,680
+take this expression and write it a little bit more compactly.
+让我把这些式子写得紧凑点
+
+87
+00:03:05,120 --> 00:03:06,200
+So, we're going to
+我可以把这个式子
+
+88
+00:03:06,310 --> 00:03:07,500
+write this is a product
+写成一个乘积式
+
+89
+00:03:07,740 --> 00:03:09,520
+from J equals one
+从 j=1 到 j=n
+
+90
+00:03:10,230 --> 00:03:11,840
+through N, of P
+从 j=1 到 j=n
+
+91
+00:03:12,140 --> 00:03:15,350
+of XJ parameterized by
+进行连乘
+
+92
+00:03:16,020 --> 00:03:17,930
+mu j comma sigma squared
+乘积项是 p(xj; μj, (σj)^2)
+
+93
+00:03:19,500 --> 00:03:21,500
+j. So this funny
+在这里
+
+94
+00:03:21,790 --> 00:03:23,330
+symbol here, there is
+出现了一个有趣的符号
+
+95
+00:03:23,780 --> 00:03:25,220
+capital Greek alphabet pi,
+希腊字母 π
+
+96
+00:03:25,490 --> 00:03:27,600
+that funny symbol there corresponds to
+这个有趣的字母表示的是
+
+97
+00:03:28,190 --> 00:03:29,980
+taking the product of a set of values.
+一系列数值的乘积
+
+98
+00:03:30,590 --> 00:03:32,290
+And so, you're familiar with
+同时
+
+99
+00:03:32,400 --> 00:03:33,930
+the summation notation, so the
+你应该对求和符号比较熟悉
+
+100
+00:03:34,520 --> 00:03:36,460
+sum from i equals one through
+从 i=1 到 n 求和
+
+101
+00:03:36,930 --> 00:03:39,070
+n, of i. This
+也就是
+
+102
+00:03:39,960 --> 00:03:41,820
+means 1 + 2 + 3 plus
+1+2+3+...
+
+103
+00:03:42,230 --> 00:03:43,730
+dot dot dot, up to
+直到加到 n
+
+104
+00:03:43,910 --> 00:03:45,350
+n. Where as this
+而这里
+
+105
+00:03:45,660 --> 00:03:46,910
+funny symbol here, this product
+这个奇怪的符号 连乘符号 Π
+
+106
+00:03:47,390 --> 00:03:48,630
+symbol, right product from i
+只是表示从 i 等于1
+
+107
+00:03:48,840 --> 00:03:50,310
+equals 1 through n
+到 n 进行连乘
+
+108
+00:03:50,620 --> 00:03:52,210
+of i. Then this
+那么
+
+109
+00:03:52,520 --> 00:03:54,530
+means that, it's just like summation except that we're now multiplying.
+这里的 π 也是一样 只不过表示连乘而不是连加
+
+110
+00:03:55,200 --> 00:03:56,680
+This becomes 1 times
+这就变成了
+
+111
+00:03:56,880 --> 00:03:58,700
+2 times 3 times up
+1×2×3 一直乘到 n
+
+112
+00:03:59,910 --> 00:04:01,330
+to N. And so using
+所以这里用的就是
+
+113
+00:04:01,860 --> 00:04:03,430
+this product notation, this
+这里连乘的符号
+
+114
+00:04:03,570 --> 00:04:05,880
+product from j equals 1 through n of this expression.
+这个表达式表示从等于 j=1 开始一直乘到 j=n
+
+115
+00:04:06,620 --> 00:04:08,440
+It's just more compact, it's
+这样写看起来更紧凑
+
+116
+00:04:08,820 --> 00:04:09,960
+just shorter way for writing
+这样写只是避免了
+
+117
+00:04:10,330 --> 00:04:12,810
+out this product of
+把这个乘积表达式
+
+118
+00:04:13,120 --> 00:04:14,400
+of all of these terms up there.
+的所有项都写出来而已
+
+119
+00:04:15,200 --> 00:04:16,200
+Since we are taking these p
+因为我们取的就是这些
+
+120
+00:04:16,430 --> 00:04:17,510
+of x j given mu j
+p(xj; μj, (σj)^2)
+
+121
+00:04:17,730 --> 00:04:18,740
+comma sigma squared j terms
+这样的项
+
+122
+00:04:19,130 --> 00:04:20,290
+and multiplying them together.
+然后把它们乘起来
+
+123
+00:04:21,540 --> 00:04:22,830
+And, by the way the problem
+同时 顺便要说的是
+
+124
+00:04:23,250 --> 00:04:25,370
+of estimating this distribution
+估计 p(x) 这个分布的问题
+
+125
+00:04:25,990 --> 00:04:27,130
+p of x, they're sometimes called
+这种问题通常被称为
+
+126
+00:04:28,280 --> 00:04:29,540
+the problem of density estimation.
+密度估计问题
+
+127
+00:04:30,420 --> 00:04:31,270
+Hence the title of the slide.
+正如幻灯片的标题上写的
+
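+As a minimal sketch (assuming Python with NumPy; the function names are illustrative, not from the lecture), the per-feature product above can be computed like this:
+一个最简的示意(假设使用 Python 和 NumPy 函数名仅为示意 并非来自课程本身):
+
+import numpy as np
+
+def gaussian(x, mu, sigma2):
+    # Univariate Gaussian density N(x; mu, sigma2), applied element-wise
+    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
+
+def p_of_x(x, mu, sigma2):
+    # p(x) = prod_j p(x_j; mu_j, sigma2_j), i.e. the independence assumption above
+    return np.prod(gaussian(x, mu, sigma2))
+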
+128
+00:04:33,800 --> 00:04:35,310
+So putting everything together, here
+把所有的整合到一起
+
+129
+00:04:35,500 --> 00:04:36,920
+is our anomaly detection algorithm.
+下面便是我们的异常检测算法
+
+130
+00:04:38,120 --> 00:04:40,290
+The first step is to choose
+第一步便是选择特征
+
+131
+00:04:40,650 --> 00:04:41,600
+features, or come up with
+或者是提出一些
+
+132
+00:04:41,700 --> 00:04:42,740
+features xi that we think
+我们认为可能能够指示出
+
+133
+00:04:43,040 --> 00:04:45,340
+might be indicative of anomalous examples.
+异常样本的特征 xi
+
+134
+00:04:46,050 --> 00:04:47,020
+So what I mean by that,
+因此 我的意思是
+
+135
+00:04:47,240 --> 00:04:48,490
+is, try to come
+可以尝试着
+
+136
+00:04:48,680 --> 00:04:49,990
+up with features, so that when there's
+提出一些特征
+
+137
+00:04:50,280 --> 00:04:51,630
+an unusual user in your
+这样 当你的系统中某个不寻常的用户
+
+138
+00:04:52,190 --> 00:04:53,000
+system that may be doing
+可能正在做
+
+139
+00:04:53,190 --> 00:04:54,790
+fraudulent things, or when the
+欺诈的事情时 或者在
+
+140
+00:04:55,020 --> 00:04:56,670
+aircraft engine examples, you know
+飞机引擎的那个例子里
+
+141
+00:04:56,760 --> 00:04:59,500
+there's something funny, something strange about one of the aircraft engines.
+某一台引擎显得有点奇怪 有点异常时
+
+142
+00:05:00,280 --> 00:05:01,230
+Choose features X I, that
+选择特征Xi
+
+143
+00:05:02,000 --> 00:05:03,330
+you think might take on unusually
+这个特征可能会
+
+144
+00:05:04,410 --> 00:05:05,860
+large values, or unusually
+呈现一个不同寻常特别大的数值
+
+145
+00:05:06,020 --> 00:05:08,750
+small values, for what an
+或者特别小的数值
+
+146
+00:05:08,880 --> 00:05:10,160
+anomalous example might look like.
+因为这个看起来本身就有些异常
+
+147
+00:05:10,910 --> 00:05:12,440
+But more generally, just try
+但更为普遍的是
+
+148
+00:05:12,690 --> 00:05:14,340
+to choose features that describe general
+尽量选择那些能够描述
+
+149
+00:05:16,160 --> 00:05:19,380
+properties of the things that you're collecting data on.
+你所收集数据的对象的一般性质的特征
+
+150
+00:05:20,030 --> 00:05:21,360
+Next, given a training set,
+下一步是给出训练集
+
+151
+00:05:22,020 --> 00:05:23,980
+of m unlabeled examples,
+即 m 个无标签的样本
+
+152
+00:05:25,000 --> 00:05:26,980
+X1 through X M, we
+x(1) 到 x(m) 接下来我们
+
+153
+00:05:27,170 --> 00:05:28,580
+then fit the parameters,
+拟合出参数
+
+154
+00:05:29,090 --> 00:05:30,170
+mu 1 through mu n, and
+μ1 到 μn 以及
+
+155
+00:05:30,340 --> 00:05:31,480
+sigma squared 1 through sigma
+从σ1的平方到σn的平方
+
+156
+00:05:31,690 --> 00:05:33,460
+squared n, and so these
+同时
+
+157
+00:05:33,840 --> 00:05:34,810
+were the formulas similar to
+这些公式
+
+158
+00:05:34,840 --> 00:05:36,420
+the formulas we have
+和之前在视频里的
+
+159
+00:05:36,680 --> 00:05:37,610
+in the previous video, that we're
+公式比较相似
+
+160
+00:05:37,740 --> 00:05:39,180
+going to use the estimate
+因此 我们便可以
+
+161
+00:05:39,310 --> 00:05:41,120
+each of these parameters, and just to give
+估计这些参数的值
+
+162
+00:05:42,030 --> 00:05:43,670
+some interpretation, mu J,
+同时 给出对应的解释
+
+163
+00:05:44,060 --> 00:05:47,830
+that's my average value of the j feature.
+muJ是特征j的平均值
+
+164
+00:05:48,720 --> 00:05:51,580
+Mu j goes in this term p of xj.
+μj 出现在 p(xj) 这一项中
+
+165
+00:05:52,440 --> 00:05:53,870
+which is parametrized by mu J
+被muJ
+
+166
+00:05:54,220 --> 00:05:55,590
+and sigma squared J. And
+σJ的平方参数化
+
+167
+00:05:55,920 --> 00:05:57,890
+so this says for the
+因此
+
+168
+00:05:58,360 --> 00:05:59,620
+mu J just take the
+这对muJ就好比
+
+169
+00:05:59,700 --> 00:06:00,720
+mean over my training
+通过特征j的
+
+170
+00:06:01,070 --> 00:06:02,930
+set of the values of the j feature.
+训练集数据取平均值
+
+171
+00:06:03,860 --> 00:06:05,100
+And, just to mention, that you
+同时
+
+172
+00:06:05,220 --> 00:06:07,410
+do this, you compute these
+计算这些公式从
+
+173
+00:06:07,620 --> 00:06:08,830
+formulas for j equals
+j的值从1到n
+
+174
+00:06:09,420 --> 00:06:10,360
+one through n. So use
+然后
+
+175
+00:06:10,700 --> 00:06:11,960
+these formulas to estimate mu
+用这些公式来估计mu1
+
+176
+00:06:12,230 --> 00:06:14,020
+1, to estimate mu
+再估计mu2
+
+177
+00:06:14,070 --> 00:06:15,620
+2, and so on up to
+直到mun
+
+178
+00:06:16,170 --> 00:06:17,460
+mu n, and similarly for sigma
+同样地 对于σ的平方
+
+179
+00:06:17,770 --> 00:06:19,060
+squared, and it's also
+同时
+
+180
+00:06:19,390 --> 00:06:21,530
+possible to come up with vectorized versions of these.
+也可以写出这些公式的向量化版本
+
+181
+00:06:21,830 --> 00:06:22,900
+So if you think of
+如果
+
+182
+00:06:23,000 --> 00:06:25,220
+mu as a vector, so mu
+你把mu假想成一个向量
+
+183
+00:06:25,920 --> 00:06:27,430
+if is a vector there's mu 1,
+那么向量mu就有
+
+184
+00:06:27,760 --> 00:06:29,230
+mu 2, down to mu
+mu1到mu2直到mun
+
+185
+00:06:29,570 --> 00:06:31,180
+n, then a vectorized
+那么
+
+186
+00:06:31,660 --> 00:06:33,510
+version of that set
+向量化的参数集合
+
+187
+00:06:33,910 --> 00:06:35,530
+of parameters can be written
+就能被写出来
+
+188
+00:06:36,440 --> 00:06:37,830
+like so: 1 over m, sum from i
+写成这样 1/m 乘以求和
+
+189
+00:06:37,880 --> 00:06:39,610
+equals 1 through m of x(i).
+i 从 1 到 m 对 x(i) 求和
+
+190
+00:06:40,290 --> 00:06:41,290
+So, this formula that I
+所以
+
+191
+00:06:41,410 --> 00:06:43,530
+just wrote out estimates this
+我刚刚写出的这个公式 把这里的
+
+192
+00:06:43,990 --> 00:06:45,160
+xi as the feature vectors
+x(i) 当作特征向量
+
+193
+00:06:45,660 --> 00:06:48,140
+that estimates mu for all the values of n simultaneously.
+可以同时估计出 μ 的全部 n 个分量
+
+194
+00:06:49,140 --> 00:06:50,070
+And it's also possible to come
+由此 提出一种
+
+195
+00:06:50,430 --> 00:06:52,130
+up with a vectorized formula for
+针对估计σj的平方的值的
+
+196
+00:06:52,290 --> 00:06:55,110
+estimating sigma squared j. Finally,
+矢量化的公式成为可能
+
+197
+00:06:56,500 --> 00:06:57,890
+when you're given a new example, so
+最后 我们给出一个新案例
+
+198
+00:06:58,100 --> 00:06:59,270
+when you have a new aircraft engine
+当有一个全新的飞机引擎时
+
+199
+00:06:59,740 --> 00:07:01,420
+and you want to know is this aircraft engine anomalous.
+你想要知道这个飞机引擎是否出现异常
+
+200
+00:07:02,470 --> 00:07:03,430
+What we need to do is then
+我们要做的就是
+
+201
+00:07:03,570 --> 00:07:05,610
+compute p of x, what's the probability of this new example?
+计算出x的p值来 那么这个案例中的概率是多少呢
+
+202
+00:07:06,790 --> 00:07:07,670
+So, p of x is equal
+我们知道x的p值
+
+203
+00:07:07,990 --> 00:07:09,990
+to this product, and
+就等于这个连乘式
+
+204
+00:07:10,100 --> 00:07:11,140
+what you implement, what you compute,
+你现在
+
+205
+00:07:11,750 --> 00:07:14,040
+is this formula and
+正在计算的这个公式
+
+206
+00:07:15,000 --> 00:07:16,610
+where over here, this thing
+就是这个
+
+207
+00:07:16,840 --> 00:07:17,900
+here this is just the
+在这里的这个公式
+
+208
+00:07:18,260 --> 00:07:19,250
+formula for the Gaussian
+是针对计算高斯概率的
+
+209
+00:07:19,800 --> 00:07:21,000
+probability, so you compute
+计算出这一项
+
+210
+00:07:21,240 --> 00:07:22,880
+this thing, and finally if
+最后
+
+211
+00:07:22,940 --> 00:07:24,420
+this probability is very small,
+如果这个概率值很小
+
+212
+00:07:24,860 --> 00:07:26,370
+then you flag this thing as an anomaly.
+那么你就将这一项标注为异常
+
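+Putting the three steps together, a rough sketch (assuming Python with NumPy, an m-by-n matrix X of non-anomalous training examples, and a threshold epsilon chosen separately):
+把这三个步骤串起来 一个粗略的示意(假设 X 是 m×n 的正常样本训练矩阵 ε 另行选取):
+
+import numpy as np
+
+def estimate_gaussian(X):
+    # Step 2: fit mu_j and sigma2_j for every feature (column) of X
+    return X.mean(axis=0), X.var(axis=0)
+
+def p_of_x(x, mu, sigma2):
+    # Step 3a: p(x) = prod_j N(x_j; mu_j, sigma2_j)
+    return np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2))
+                   / np.sqrt(2 * np.pi * sigma2))
+
+def is_anomaly(x, mu, sigma2, epsilon):
+    # Step 3b: flag the example as an anomaly when p(x) < epsilon
+    return p_of_x(x, mu, sigma2) < epsilon
+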
+213
+00:07:27,570 --> 00:07:29,380
+Here's an example of an application of this method.
+这就是我们应用这种方法的一个案例
+
+214
+00:07:30,870 --> 00:07:31,860
+Let's say we have this data
+下面就让我们谈谈
+
+215
+00:07:32,210 --> 00:07:35,430
+set plotted on the upper left of this slide.
+在幻灯片左侧上面绘制的数据集
+
+216
+00:07:36,670 --> 00:07:38,860
+if you look at this, well, lets look the feature of x1.
+如果你看到这个 让我们看这个x1的特征
+
+217
+00:07:39,610 --> 00:07:40,640
+If you look at this data set, it
+如果你观察这个数据集
+
+218
+00:07:40,750 --> 00:07:42,600
+looks like on average, the features
+看起来 平均来讲
+
+219
+00:07:42,950 --> 00:07:44,330
+x1 has a mean of about 5
+特征 x1 的均值大约是 5
+
+220
+00:07:45,540 --> 00:07:47,420
+and the standard deviation, if
+并且也有一个标准差
+
+221
+00:07:47,590 --> 00:07:48,660
+you only look at just the x1
+如果你看到了
+
+222
+00:07:49,010 --> 00:07:50,030
+values of this data set
+x1数据集的值
+
+223
+00:07:50,310 --> 00:07:51,720
+has the standard deviation of maybe 2.
+其标准差可能为2
+
+224
+00:07:52,370 --> 00:07:55,110
+So that sigma 1 and
+所以对于σ1
+
+225
+00:07:55,460 --> 00:07:57,380
+looks like x2 the
+以及看起来像
+
+226
+00:07:57,670 --> 00:07:59,070
+values of the features as
+从纵轴测量的
+
+227
+00:07:59,250 --> 00:08:00,370
+measured on the vertical axis,
+特征值x2
+
+228
+00:08:00,840 --> 00:08:01,730
+looks like it has an average
+看起来其平均值
+
+229
+00:08:02,010 --> 00:08:03,110
+value of about 3, and
+可能是3
+
+230
+00:08:03,380 --> 00:08:05,750
+a standard deviation of about 1. So if
+且其标准差值为1
+
+231
+00:08:05,880 --> 00:08:06,940
+you take this data set and if
+如果你使用这个数据集
+
+232
+00:08:07,010 --> 00:08:08,690
+you estimate mu1, mu2, sigma1,
+且如果你估计mu1、mu2、σ1、
+
+233
+00:08:09,030 --> 00:08:11,410
+sigma2, this is what you get.
+σ2 这就是你要求解的
+
+234
+00:08:11,610 --> 00:08:12,930
+And again, I'm writing sigma here,
+再次说明一下 我在这里写的σ
+
+235
+00:08:13,140 --> 00:08:14,620
+I'm thinking about standard deviations, but
+指的是标准差 但是
+
+236
+00:08:15,100 --> 00:08:16,240
+the formulas on the previous slide
+前一张幻灯片上的公式
+
+237
+00:08:16,280 --> 00:08:17,640
+actually gave the estimates of the squares
+实际上给出的是这些量的平方的估计值
+
+238
+00:08:18,120 --> 00:08:20,670
+of these things, so sigma squared 1 and sigma squared 2.
+这样可得到σ1的平方以及σ2的平方
+
+239
+00:08:20,940 --> 00:08:21,920
+So, just be careful whether
+但是你应该仔细
+
+240
+00:08:22,090 --> 00:08:23,260
+you are using sigma 1, sigma
+你现在是否在使用σ1、
+
+241
+00:08:23,380 --> 00:08:25,490
+2, or sigma squared 1 or sigma squared 2.
+σ2 还是 σ1 的平方和 σ2 的平方
+
+242
+00:08:25,960 --> 00:08:26,700
+So, sigma squared 1 of course
+所以
+
+243
+00:08:26,820 --> 00:08:28,500
+would be equal to 4, for
+σ1的平方的值当然等于4
+
+244
+00:08:31,130 --> 00:08:32,260
+example, as the square of 2.
+举个例子 2的平方
+
+245
+00:08:32,310 --> 00:08:34,010
+And in pictures what p of
+在图上
+
+246
+00:08:34,180 --> 00:08:35,550
+x1 parametrized by mu1 and
+p(x1) 其参数为 μ1 和
+
+247
+00:08:35,660 --> 00:08:36,830
+sigma squared 1 and p
+σ1 的平方 以及
+
+248
+00:08:37,120 --> 00:08:38,130
+of x2, parametrized by mu
+p(x2) 其参数为
+
+249
+00:08:38,230 --> 00:08:39,050
+2 and sigma squared 2, that
+μ2 和 σ2 的平方
+
+250
+00:08:39,190 --> 00:08:41,360
+would look like these two distributions over here.
+看起来像在这里的两个分布
+
+251
+00:08:42,650 --> 00:08:44,280
+And, turns out that
+其实是
+
+252
+00:08:44,480 --> 00:08:45,960
+if were to plot of p
+如果绘制x的p的图像
+
+253
+00:08:46,210 --> 00:08:47,540
+of x, right, which
+那么就会看到
+
+254
+00:08:47,710 --> 00:08:49,000
+is the product of these two
+是这两个的乘积
+
+255
+00:08:49,210 --> 00:08:50,450
+things, you can actually get
+事实上
+
+256
+00:08:50,800 --> 00:08:52,770
+a surface plot that looks like this.
+你会得到一个像这样的曲面图
+
+257
+00:08:53,360 --> 00:08:54,370
+This is a plot of p
+这是
+
+258
+00:08:54,640 --> 00:08:55,920
+of x, where the height
+p(x) 的图像 其中
+
+259
+00:08:56,390 --> 00:08:57,730
+above of this, where the
+这个曲面
+
+260
+00:08:57,830 --> 00:08:58,950
+height of this surface at
+在某个特定点上的高度
+
+261
+00:08:58,990 --> 00:09:01,360
+a particular point, so given a
+也就是
+
+262
+00:09:01,470 --> 00:09:03,670
+particular x1 x2
+给出一个特别的X1和X2的值
+
+263
+00:09:03,930 --> 00:09:05,640
+values of x2 if
+如果
+
+264
+00:09:05,800 --> 00:09:07,830
+x1 equals 2, x2 equals 2, that's this point.
+x1 等于 2 x2 等于 2 那么就是这个点
+
+265
+00:09:08,510 --> 00:09:09,450
+And the height of this 3-D
+而这个三维曲面
+
+266
+00:09:09,710 --> 00:09:11,280
+surface here, that's p
+在这里的高度 就是
+
+267
+00:09:13,020 --> 00:09:14,420
+of x. So p of x, that is the height
+p(x) 所以 p(x) 也就是这个图的高度
+
+268
+00:09:14,710 --> 00:09:16,220
+of this plot, is
+其实就是
+
+269
+00:09:16,340 --> 00:09:17,520
+literally just p of x1
+p(x1)
+
+270
+00:09:18,640 --> 00:09:20,010
+parametrized by mu 1 sigma
+其参数为 μ1 和
+
+271
+00:09:20,290 --> 00:09:22,540
+squared 1, times p
+σ1 的平方 再乘以
+
+272
+00:09:23,200 --> 00:09:25,050
+of x2 parametrized by
+p(x2) 其参数为
+
+273
+00:09:25,120 --> 00:09:27,530
+mu 2 sigma squared 2.
+μ2 和 σ2 的平方
+
+274
+00:09:27,720 --> 00:09:29,180
+Now, so this is
+现在
+
+275
+00:09:29,320 --> 00:09:31,400
+how we fit the parameters to this data.
+我们便得到了这组数据的合适的参数值
+
+276
+00:09:31,930 --> 00:09:32,950
+Let's see if we have a couple of new examples.
+假设我们有几个新的样本
+
+277
+00:09:33,530 --> 00:09:35,090
+Maybe I have a new example there.
+或许在这里我们有了一个新案例
+
+278
+00:09:36,700 --> 00:09:38,340
+Is this an anomaly or not?
+这个案例是否异常
+
+279
+00:09:38,550 --> 00:09:39,220
+Or, maybe I have a different
+或许我现在有了一个与以往不同的案例
+
+280
+00:09:39,570 --> 00:09:41,860
+example, maybe I have a different second example over there.
+或许在这里也有了一个与先前不同的第二案例
+
+281
+00:09:42,140 --> 00:09:43,400
+So, is that an anomaly or not?
+那是否有异常呢?
+
+282
+00:09:44,360 --> 00:09:47,050
+They way we do that is, we
+我们可以这么做
+
+283
+00:09:47,190 --> 00:09:48,470
+would set some value for
+我们会为 ε 选定某个值
+
+284
+00:09:48,620 --> 00:09:49,490
+Epsilon, let's say I've chosen
+比如说我选择
+
+285
+00:09:50,020 --> 00:09:51,220
+Epsilon equals 0.02.
+ε 等于 0.02
+
+286
+00:09:51,980 --> 00:09:54,110
+I'll say later how we choose Epsilon.
+后面我会再讲如何选取 ε
+
+287
+00:09:55,180 --> 00:09:56,110
+But let's take this first
+不过 我们先讲这个第一个案例
+
+288
+00:09:56,540 --> 00:09:57,360
+example, let me call this
+让我把这个称为
+
+289
+00:09:57,500 --> 00:09:59,500
+example X1 test.
+案例X1测试
+
+290
+00:10:00,200 --> 00:10:01,010
+And let me call the second example
+同时
+
+291
+00:10:02,800 --> 00:10:03,900
+X2 test.
+称第二个案例为X2测试
+
+292
+00:10:04,780 --> 00:10:05,670
+What we do is, we
+我们现在要做的
+
+293
+00:10:05,820 --> 00:10:07,380
+then compute p of
+就是计算出 p(x(1)test) 的值
+
+294
+00:10:07,540 --> 00:10:08,740
+X1 test, so we use
+之后
+
+295
+00:10:08,990 --> 00:10:10,400
+this formula to compute it and
+我们用这个公式来计算它
+
+296
+00:10:11,140 --> 00:10:12,760
+this looks like a pretty large value.
+不难发现这是一个相当大的数
+
+297
+00:10:13,250 --> 00:10:15,560
+In particular, this is greater
+特别地 这个值可能大于
+
+298
+00:10:15,920 --> 00:10:18,480
+than, or greater than or equal to epsilon.
+或者说大于等于 ε
+
+299
+00:10:18,670 --> 00:10:19,670
+And so this is a pretty
+所以这是一个相当大的概率
+
+300
+00:10:19,810 --> 00:10:21,290
+high probability at least bigger
+至少比 ε 要大
+
+301
+00:10:21,490 --> 00:10:22,510
+than epsilon, so we'll say that
+所以我们也说
+
+302
+00:10:22,970 --> 00:10:24,490
+X1 test is not an anomaly.
+X1检测的结果不是异常
+
+303
+00:10:25,650 --> 00:10:27,370
+Whereas, if you compute p of
+然而 如果你计算x2检测的p值
+
+304
+00:10:27,440 --> 00:10:29,810
+X2 test, well that is just a much smaller value.
+会是一个更加小的数
+
+305
+00:10:30,170 --> 00:10:31,340
+So this is less than
+也就是说它小于 ε
+
+306
+00:10:31,490 --> 00:10:32,490
+epsilon and so we'll say
+同时我们也会说
+
+307
+00:10:32,720 --> 00:10:34,400
+that that is indeed an anomaly,
+那确实是一个异常数值
+
+308
+00:10:34,860 --> 00:10:37,350
+because it is much smaller than that epsilon that we then chose.
+因为它远小于我们先前选定的 ε
+
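+To make the two test cases concrete, a small usage example (the two test points are made up for illustration; the lecture only marks them on the plot):
+为了让这两个测试样本更具体 下面是一个小例子(两个测试点是为说明而虚构的):
+
+import numpy as np
+
+mu = np.array([5.0, 3.0])                # means of x1, x2 read off the slide
+sigma2 = np.array([2.0 ** 2, 1.0 ** 2])  # sigma1 is about 2, sigma2 about 1
+epsilon = 0.02
+
+def p_of_x(x):
+    return np.prod(np.exp(-(x - mu) ** 2 / (2 * sigma2))
+                   / np.sqrt(2 * np.pi * sigma2))
+
+x1_test = np.array([4.0, 3.5])   # hypothetical point near the cluster
+x2_test = np.array([9.0, 0.5])   # hypothetical point far from the cluster
+
+print(p_of_x(x1_test) >= epsilon)   # True  -> not flagged as an anomaly
+print(p_of_x(x2_test) < epsilon)    # True  -> flagged as an anomaly
+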
+309
+00:10:38,450 --> 00:10:39,950
+And in fact, I'd improve it here.
+但事实上 我们也可以对这个方法进行改良
+
+310
+00:10:40,460 --> 00:10:43,340
+What this is really saying is that, you look through the 3d surface plot.
+看这个三维表面图
+
+311
+00:10:44,660 --> 00:10:46,270
+It's saying that all the
+从图上不难看出
+
+312
+00:10:46,350 --> 00:10:47,940
+values of x1 and x2
+x1和x2的所有数值
+
+313
+00:10:48,210 --> 00:10:50,570
+that have a high height
+只要对应的曲面高度比较高
+
+314
+00:10:50,810 --> 00:10:52,770
+above the surface, corresponds to
+就都对应于
+
+315
+00:10:52,910 --> 00:10:55,160
+a non-anomalous example, an OK or normal example.
+一个非异常的样本 也就是正常的样本
+
+316
+00:10:55,970 --> 00:10:57,450
+Whereas all the points far out
+然而所有的点
+
+317
+00:10:57,640 --> 00:10:58,940
+here, all the points out
+距离这里都比较远
+
+318
+00:10:59,150 --> 00:11:00,460
+here, all of those
+所有的这些点
+
+319
+00:11:00,580 --> 00:11:01,740
+points have very low
+其概率值都比较小
+
+320
+00:11:01,910 --> 00:11:02,940
+probability, so we are
+所以
+
+321
+00:11:03,020 --> 00:11:04,310
+going to flag those points
+我们会把这些点标注成异常
+
+322
+00:11:04,620 --> 00:11:06,350
+as anomalous, and so it's gonna define
+并依此定义一些区域
+
+323
+00:11:06,760 --> 00:11:07,790
+some region, that maybe looks
+那么这么做之后 看起来可能是这样
+
+324
+00:11:08,000 --> 00:11:09,480
+like this, so that everything
+那些
+
+325
+00:11:09,810 --> 00:11:12,160
+outside this, it flags
+在这之外的
+
+326
+00:11:12,380 --> 00:11:12,580
+as anomalous,
+被标注成为异常
+
+327
+00:11:14,940 --> 00:11:16,260
+whereas the things inside this
+然后在里面的
+
+328
+00:11:16,770 --> 00:11:18,340
+ellipse I just drew, if it
+也就是我刚才画的这个椭圆之内的点
+
+329
+00:11:18,570 --> 00:11:21,320
+considers okay, or non-anomalous, not anomalous examples.
+则被认为是正常的 也就是非异常的样本
+
+330
+00:11:22,110 --> 00:11:24,040
+And so this example x2
+案例X2
+
+331
+00:11:24,250 --> 00:11:26,260
+test lies outside
+位于
+
+332
+00:11:26,650 --> 00:11:27,510
+that region, and so it
+那个区域之外
+
+333
+00:11:27,620 --> 00:11:30,280
+has very small probability, and so we consider it an anomalous example.
+那就有一个很小的概率 并且我们认为它是一个异常的案例
+
+334
+00:11:31,400 --> 00:11:32,990
+In this video we talked about how to
+在本节视频中
+
+335
+00:11:33,460 --> 00:11:35,440
+estimate p of x, the probability of
+我们谈到了如何估计x的p值
+
+336
+00:11:35,590 --> 00:11:36,840
+x, for the purpose of
+x的概率值
+
+337
+00:11:36,930 --> 00:11:38,740
+developing an anomaly detection algorithm.
+以开发出一种异常检测的算法
+
+338
+00:11:39,880 --> 00:11:40,890
+And in this video, we also
+在本节视频中
+
+339
+00:11:41,260 --> 00:11:42,970
+stepped through an entire process
+我们使用给出的数据集
+
+340
+00:11:43,830 --> 00:11:45,090
+of giving data set, we
+将整个操作流程都一步步走了下来
+
+341
+00:11:45,340 --> 00:11:47,740
+have, fitting the parameters, doing parameter estimations.
+拟合了参数 也就是进行了参数估计
+
+342
+00:11:48,370 --> 00:11:50,570
+We get mu and sigma parameters, and
+我们得到了mu和σ的参数
+
+343
+00:11:50,700 --> 00:11:52,180
+then taking new examples and deciding
+然后拿到新的样本
+
+344
+00:11:52,740 --> 00:11:54,110
+if the new examples are anomalous or not.
+判断这些新的案例是不是有异常
+
+345
+00:11:55,490 --> 00:11:56,800
+In the next few videos we
+在接下来的几段视频中
+
+346
+00:11:56,880 --> 00:11:58,580
+will delve deeper into this algorithm,
+我们将更加深入地挖掘这个算法的内含
+
+347
+00:11:58,980 --> 00:11:59,930
+and talk a bit more
+此外
+
+348
+00:12:00,230 --> 00:12:02,310
+about how to actually get this to work well.
+我们也会说到如何让这个算法真正良好地运作起来
+
diff --git a/srt/15 - 4 - Developing and Evaluating an Anomaly Detection System (13 min).srt b/srt/15 - 4 - Developing and Evaluating an Anomaly Detection System (13 min).srt
new file mode 100644
index 00000000..638d3165
--- /dev/null
+++ b/srt/15 - 4 - Developing and Evaluating an Anomaly Detection System (13 min).srt
@@ -0,0 +1,1951 @@
+1
+00:00:00,120 --> 00:00:01,220
+In the last video, we developed
+在上一段视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,850 --> 00:00:03,200
+an anomaly detection algorithm.
+我们推导了异常检测算法
+
+3
+00:00:04,150 --> 00:00:05,240
+In this video, I like to
+在这段视频中
+
+4
+00:00:05,300 --> 00:00:06,870
+talk about the process of how
+我想介绍一下
+
+5
+00:00:07,090 --> 00:00:08,750
+to go about developing a specific
+如何开发一个
+
+6
+00:00:09,060 --> 00:00:10,790
+application of anomaly detection
+关于异常检测的应用
+
+7
+00:00:11,410 --> 00:00:12,810
+to a problem and in particular
+来解决一个实际问题
+
+8
+00:00:13,470 --> 00:00:14,500
+this will focus on the problem
+具体来说
+
+9
+00:00:15,090 --> 00:00:18,700
+of how to evaluate an anomaly detection algorithm. In
+我们将重点关注如何评价一个异常检测算法
+
+10
+00:00:18,880 --> 00:00:20,490
+previous videos, we've already talked
+在前面的视频中 我们已经提到了
+
+11
+00:00:20,800 --> 00:00:22,380
+about the importance of real
+使用实数评价法的重要性
+
+12
+00:00:22,570 --> 00:00:24,770
+number evaluation and this captures the idea that
+这样做的想法是
+
+13
+00:00:25,170 --> 00:00:26,810
+when you're trying to develop
+当你在用某个学习算法
+
+14
+00:00:27,270 --> 00:00:28,460
+a learning algorithm for a
+来开发一个具体的
+
+15
+00:00:28,690 --> 00:00:30,300
+specific application, you need to
+机器学习应用时
+
+16
+00:00:30,560 --> 00:00:31,540
+often make a lot of choices
+你常常需要做出很多决定
+
+17
+00:00:31,710 --> 00:00:34,410
+like, you know, choosing what features to use and then so on.
+比如说 选择用什么样的特征 等等
+
+18
+00:00:35,010 --> 00:00:36,800
+And making decisions about all
+而如果你找到某种
+
+19
+00:00:36,880 --> 00:00:38,540
+of these choices is often much
+评价算法的方式
+
+20
+00:00:38,780 --> 00:00:39,890
+easier, and if you have
+直接返回一个数字
+
+21
+00:00:40,040 --> 00:00:41,330
+a way to evaluate your learning
+来告诉你算法的好坏
+
+22
+00:00:41,410 --> 00:00:43,190
+algorithm that just gives you back a number.
+那么你做这些决定就显得更容易了
+
+23
+00:00:44,200 --> 00:00:44,950
+So if you're trying to decide,
+所以比如你要决定
+
+24
+00:00:45,980 --> 00:00:47,130
+you know, I have an idea for
+现在有一个额外的特征
+
+25
+00:00:47,220 --> 00:00:49,730
+one extra feature, do I include this feature or not.
+我要不要把这个特征考虑进来?
+
+26
+00:00:50,560 --> 00:00:51,560
+If you can run the algorithm
+如果你带上这个特征
+
+27
+00:00:51,760 --> 00:00:52,830
+with the feature, and run the
+运行你的算法
+
+28
+00:00:52,960 --> 00:00:54,420
+algorithm without the feature, and
+再去掉这个特征运行你的算法
+
+29
+00:00:54,570 --> 00:00:55,960
+just get back a number that
+然后得到某个返回的数字
+
+30
+00:00:56,100 --> 00:00:57,350
+tells you, you know, did
+这个数字就直接告诉你
+
+31
+00:00:57,460 --> 00:01:00,070
+it improve or worsen performance to add this feature?
+这个特征到底是让算法表现变好了还是变差了
+
+32
+00:01:00,670 --> 00:01:01,480
+Then it gives you a much better
+这样 你就有了一种更好
+
+33
+00:01:01,670 --> 00:01:04,370
+way, a much simpler way, with which
+更简单的方法
+
+34
+00:01:04,590 --> 00:01:06,110
+to decide whether or not to include that feature.
+来确定是不是应该加上这个特征
+
+35
+00:01:07,570 --> 00:01:09,010
+So in order to be
+为了更快地
+
+36
+00:01:09,200 --> 00:01:10,850
+able to develop an anomaly
+开发出一个
+
+37
+00:01:11,410 --> 00:01:13,880
+detection system quickly, it would
+异常检测系统
+
+38
+00:01:14,080 --> 00:01:14,960
+be a really helpful to have
+那么最好能找到某种
+
+39
+00:01:15,150 --> 00:01:17,820
+a way of evaluating an anomaly detection system.
+评价异常检测系统的方法
+
+40
+00:01:19,260 --> 00:01:20,420
+In order to do this,
+为了做到这一点
+
+41
+00:01:20,790 --> 00:01:22,380
+in order to evaluate an anomaly
+为了能评价一个
+
+42
+00:01:22,730 --> 00:01:24,080
+detection system, we're
+异常检测系统
+
+43
+00:01:24,310 --> 00:01:26,380
+actually going to assume have some labeled data.
+我们先假定已有了一些带标签的数据
+
+44
+00:01:27,270 --> 00:01:28,270
+So, so far, we'll be treating
+所以 我们要考虑的
+
+45
+00:01:28,420 --> 00:01:29,870
+anomaly detection as an
+异常检测问题
+
+46
+00:01:30,310 --> 00:01:31,770
+unsupervised learning problem, using
+是一个非监督问题
+
+47
+00:01:32,210 --> 00:01:33,560
+unlabeled data.
+使用的是无标签数据
+
+48
+00:01:34,010 --> 00:01:35,190
+But if you have some labeled
+但如果你有一些
+
+49
+00:01:35,560 --> 00:01:37,390
+data that specifies what
+带标签的数据
+
+50
+00:01:37,700 --> 00:01:39,570
+are some anomalous examples, and
+能够指明哪些是异常样本
+
+51
+00:01:39,670 --> 00:01:42,030
+what are some non-anomalous examples, then
+哪些是非异常样本
+
+52
+00:01:42,470 --> 00:01:43,350
+this is how we actually
+那么这就是我们要找的
+
+53
+00:01:43,630 --> 00:01:45,670
+think of as the standard way of evaluating an anomaly detection algorithm.
+能够评价异常检测算法的标准方法
+
+54
+00:01:45,820 --> 00:01:49,020
+So taking the
+还是以
+
+55
+00:01:49,300 --> 00:01:50,580
+aircraft engine example again.
+飞机发动机的为例
+
+56
+00:01:51,010 --> 00:01:52,680
+Let's say that, you know, we have some
+现在假如你有了一些
+
+57
+00:01:53,070 --> 00:01:55,840
+label data of just a few anomalous
+带标签数据
+
+58
+00:01:56,330 --> 00:01:57,890
+examples of some aircraft engines
+也就是有异常的飞机引擎的样本
+
+59
+00:01:58,400 --> 00:02:00,780
+that were manufactured in the past that turns out to be anomalous.
+这批制造的飞机发动机是有问题的
+
+60
+00:02:01,520 --> 00:02:02,360
+Turned out to be flawed or strange in some way.
+可能有瑕疵 或者别的什么问题
+
+61
+00:02:02,400 --> 00:02:04,130
+Let's say we
+同时我们还有
+
+62
+00:02:04,360 --> 00:02:05,750
+use we also have some non-anomalous
+一些无异常的样本
+
+63
+00:02:06,100 --> 00:02:07,810
+examples, so some
+也就是一些
+
+64
+00:02:08,050 --> 00:02:10,200
+perfectly okay examples.
+完全没问题的样本
+
+65
+00:02:10,940 --> 00:02:12,050
+I'm going to use y equals 0
+我用y=0来表示那些
+
+66
+00:02:12,110 --> 00:02:13,600
+to denote the normal or the
+完全正常
+
+67
+00:02:13,790 --> 00:02:15,470
+non-anomalous example and
+没有问题的样本
+
+68
+00:02:15,530 --> 00:02:21,450
+y equals 1 to denote the anomalous examples.
+用y=1来代表那些异常样本
+
+69
+00:02:22,450 --> 00:02:24,670
+The process of developing and evaluating an anomaly
+那么异常检测算法的推导和评价方法
+
+70
+00:02:25,130 --> 00:02:26,450
+detection algorithm is as follows.
+如下所示
+
+71
+00:02:27,500 --> 00:02:28,300
+We're going to think of it as
+我们先考虑
+
+72
+00:02:28,560 --> 00:02:29,830
+a training set and talk
+训练样本
+
+73
+00:02:30,000 --> 00:02:31,310
+about the cross validation in test
+交叉验证和测试集等下考虑
+
+74
+00:02:31,440 --> 00:02:32,580
+sets later, but the training set we usually
+对于训练集
+
+75
+00:02:33,280 --> 00:02:34,000
+think of this as still the unlabeled
+我们还是看成无标签的
+
+76
+00:02:35,040 --> 00:02:36,180
+training set.
+训练集
+
+77
+00:02:36,510 --> 00:02:37,250
+And so this is our large
+所以这些就是
+
+78
+00:02:37,560 --> 00:02:39,580
+collection of normal, non-anomalous
+所有正常的
+
+79
+00:02:40,190 --> 00:02:41,130
+or not anomalous examples.
+或者说无异常样本的集合
+
+80
+00:02:42,400 --> 00:02:43,530
+And usually we think
+通常来讲
+
+81
+00:02:43,690 --> 00:02:44,750
+of this as being as non-anomalous,
+我们把这些都看成无异常的
+
+82
+00:02:45,010 --> 00:02:46,490
+but it's actually okay even
+但可能有一些异常的
+
+83
+00:02:46,740 --> 00:02:48,660
+if a few anomalies slip into
+也被分到你的训练集里
+
+84
+00:02:48,660 --> 00:02:51,240
+your unlabeled training set.
+这也没关系
+
+85
+00:02:51,420 --> 00:02:52,100
+And next we are going to
+接下来我们要
+
+86
+00:02:52,310 --> 00:02:53,830
+define a cross validation set
+定义交叉验证集
+
+87
+00:02:54,100 --> 00:02:55,510
+and a test set, with which
+和测试集
+
+88
+00:02:55,750 --> 00:02:58,360
+to evaluate a particular anomaly detection algorithm.
+通过这两个集合我们将得到异常检测算法
+
+89
+00:02:59,230 --> 00:03:00,850
+So, specifically, for both the
+具体来说
+
+90
+00:03:01,000 --> 00:03:01,960
+cross validation test sets we're
+对交叉验证集和测试集
+
+91
+00:03:02,080 --> 00:03:03,590
+going to assume that, you know, we
+我们将假设
+
+92
+00:03:03,800 --> 00:03:05,030
+can include a few examples
+我们的交叉验证集
+
+93
+00:03:05,670 --> 00:03:06,720
+in the cross validation set and
+和测试集中
+
+94
+00:03:06,900 --> 00:03:08,150
+the test set that contain examples
+有一些样本
+
+95
+00:03:08,910 --> 00:03:09,660
+that are known to be anomalous.
+这些样本都是异常的
+
+96
+00:03:10,200 --> 00:03:11,410
+So the test sets say
+所以比如测试集
+
+97
+00:03:11,950 --> 00:03:13,270
+we have a few examples with
+里面的样本就是
+
+98
+00:03:13,340 --> 00:03:14,770
+y equals 1 that
+带标签y=1的
+
+99
+00:03:15,040 --> 00:03:17,470
+correspond to anomalous aircraft engines.
+这表示有异常的飞机引擎
+
+100
+00:03:18,640 --> 00:03:19,800
+So here's a specific example.
+这是一个具体的例子
+
+101
+00:03:20,930 --> 00:03:23,120
+Let's say that, altogether, this
+假如说
+
+102
+00:03:23,280 --> 00:03:24,990
+is the data that we have.
+这是我们总的数据
+
+103
+00:03:25,260 --> 00:03:27,410
+We have manufactured 10,000 examples
+我们制造了 10000 台引擎
+
+104
+00:03:28,130 --> 00:03:29,140
+of engines that, as far
+作为样本
+
+105
+00:03:29,450 --> 00:03:30,740
+as we know we're perfectly normal,
+就我们所知 这些样本
+
+106
+00:03:31,220 --> 00:03:33,110
+perfectly good aircraft engines.
+都是正常没有问题的飞机引擎
+
+107
+00:03:34,060 --> 00:03:35,240
+And again, it turns out to be okay even
+同样地 如果有一小部分
+
+108
+00:03:35,560 --> 00:03:37,310
+if a few flawed engine
+有问题的引擎
+
+109
+00:03:37,740 --> 00:03:39,400
+slips into the set of
+也被混入了这10000个样本
+
+110
+00:03:39,550 --> 00:03:40,860
+10,000 is actually okay, but
+别担心 没有关系
+
+111
+00:03:40,970 --> 00:03:41,970
+we kind of assumed that the vast
+我们假设
+
+112
+00:03:42,410 --> 00:03:44,300
+majority of these
+这10000个样本中
+
+113
+00:03:44,500 --> 00:03:47,660
+10,000 examples are, you know, good and normal non-anomalous engines.
+大多数都是好的 没有问题的引擎
+
+114
+00:03:48,480 --> 00:03:50,940
+And let's say that, you know, historically, however
+而且实际上 从过去的经验来看
+
+115
+00:03:51,200 --> 00:03:52,120
+long we've been running on manufacturing
+无论是制造了多少年
+
+116
+00:03:52,650 --> 00:03:54,130
+plant, let's say that
+引擎的工厂
+
+117
+00:03:54,480 --> 00:03:55,930
+we end up getting features,
+我们都会得到这些数据
+
+118
+00:03:56,440 --> 00:03:57,970
+getting 24 to 28
+都会得到大概24到28个
+
+119
+00:03:58,240 --> 00:04:00,180
+anomalous engines as well.
+有问题的引擎
+
+120
+00:04:01,120 --> 00:04:03,030
+And for a pretty typical application of
+对于异常检测的典型应用来说
+
+121
+00:04:03,310 --> 00:04:05,490
+anomaly detection, you know, the number non-anomalous
+异常样本的个数
+
+122
+00:04:06,740 --> 00:04:08,090
+examples, that is with y equals
+也就是y=1的样本
+
+123
+00:04:08,760 --> 00:04:10,650
+1, we may have anywhere from, you know, 20 to 50.
+基本上很多都是20到50个
+
+124
+00:04:10,820 --> 00:04:12,920
+It would be a pretty typical
+通常这个范围
+
+125
+00:04:13,360 --> 00:04:14,570
+range of examples, number of
+对y=1的样本数量
+
+126
+00:04:14,830 --> 00:04:16,710
+examples that we have with y equals 1.
+还是很常见的
+
+127
+00:04:16,910 --> 00:04:17,730
+And usually we will have a
+并且通常我们的
+
+128
+00:04:17,860 --> 00:04:20,000
+much larger number of good examples.
+正常样本的数量要大得多
+
+129
+00:04:21,810 --> 00:04:23,150
+So, given this data set,
+有了这组数据
+
+130
+00:04:24,180 --> 00:04:25,410
+a fairly typical way to split
+把数据分为训练集
+
+131
+00:04:25,850 --> 00:04:27,150
+it into the training set,
+交叉验证集和测试集
+
+132
+00:04:27,430 --> 00:04:29,210
+cross validation set and test set would be as follows.
+一种典型的分法如下
+
+133
+00:04:30,390 --> 00:04:31,880
+Let's take 10,000 good aircraft
+我们把这10000个正常的引擎
+
+134
+00:04:32,360 --> 00:04:34,060
+engines and put 6,000
+放6000个到
+
+135
+00:04:34,260 --> 00:04:37,100
+of that into the unlabeled training set.
+无标签的训练集中
+
+136
+00:04:37,620 --> 00:04:38,800
+So, I'm calling this an unlabeled training
+我叫它“无标签训练集”
+
+137
+00:04:39,130 --> 00:04:40,050
+set but all of these examples
+但其实所有这些样本
+
+138
+00:04:40,640 --> 00:04:42,510
+are really ones that correspond to
+实际上都对应
+
+139
+00:04:42,810 --> 00:04:44,380
+y equals 0, as far as we know.
+y=0的情况 至少据我们所知是这样
+
+140
+00:04:45,300 --> 00:04:46,350
+And so, we will use this to
+所以 我们要用它们
+
+141
+00:04:46,520 --> 00:04:48,840
+fit p of x, right.
+来拟合p(x)
+
+142
+00:04:49,150 --> 00:04:49,850
+So, we will use these 6000 engines
+也就是是我们用这6000个引擎
+
+143
+00:04:50,350 --> 00:04:51,180
+to fit p of x, which
+来拟合p(x)
+
+144
+00:04:51,360 --> 00:04:52,190
+is that p of x
+也就是p 括号
+
+145
+00:04:52,420 --> 00:04:53,930
+one parametrized by Mu
+x1 参数是μ1
+
+146
+00:04:54,330 --> 00:04:56,380
+1, sigma squared 1, up
+σ1的平方
+
+147
+00:04:56,540 --> 00:04:57,700
+to p of Xn parametrized
+一直到p(xn; μn, σn^2)
+
+148
+00:04:58,370 --> 00:04:59,570
+by Mu N sigma squared
+参数是μn σn的平方
+
+149
+00:05:00,790 --> 00:05:02,300
+n. And so it would be these
+因此我们就是要用这
+
+150
+00:05:02,500 --> 00:05:03,930
+6,000 examples that we would
+6000个样本
+
+151
+00:05:04,110 --> 00:05:05,370
+use to estimate the parameters
+来估计参数
+
+152
+00:05:05,590 --> 00:05:06,760
+Mu 1, sigma squared 1,
+μ1 σ1 的平方
+
+153
+00:05:07,140 --> 00:05:08,960
+up to Mu N, sigma
+一直到
+
+154
+00:05:09,200 --> 00:05:10,280
+squared N. And so that's our training
+μn σn 的平方
+
+155
+00:05:10,500 --> 00:05:11,960
+set of all, you know,
+这就是训练集中的好的样本
+
+156
+00:05:12,150 --> 00:05:13,980
+good, or the vast majority of good examples.
+或者说大多数好的样本
+
+157
+00:05:15,430 --> 00:05:16,950
+Next we will take our good
+然后 我们取一些
+
+158
+00:05:17,140 --> 00:05:18,380
+aircraft engines and put some
+好的飞机引擎样本
+
+159
+00:05:18,660 --> 00:05:19,470
+number of them in a cross
+放一些到交叉验证集
+
+160
+00:05:19,580 --> 00:05:21,320
+validation set plus some number
+再放一些到
+
+161
+00:05:21,570 --> 00:05:22,970
+of them in the test sets.
+测试集中
+
+162
+00:05:23,280 --> 00:05:24,300
+So 6,000 plus 2,000 plus 2,000,
+正好6000加2000加2000
+
+163
+00:05:24,480 --> 00:05:25,470
+that's how we split up our
+这10000个样本
+
+164
+00:05:25,740 --> 00:05:28,820
+10,000 good aircraft engines.
+就这样进行分割了
+
+165
+00:05:29,260 --> 00:05:31,460
+And then we also have 20
+同时 我们还有20个
+
+166
+00:05:31,930 --> 00:05:33,380
+flawed aircraft engines, and we'll
+异常的发动机样本
+
+167
+00:05:33,490 --> 00:05:34,890
+take that and maybe split it
+同样也把它们进行一个分割
+
+168
+00:05:35,160 --> 00:05:36,100
+up, you know, put ten of them in
+放10个到验证集中
+
+169
+00:05:36,200 --> 00:05:37,230
+the cross validation set and put
+剩下10个
+
+170
+00:05:37,370 --> 00:05:39,560
+ten of them in the test sets.
+放入测试集中
+
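+The 6000/2000/2000 split just described might look roughly like this (a sketch only; the arrays good and flawed are placeholders for the 10,000 normal and 20 flawed engines):
+上面说的 6000/2000/2000 划分大致可以这样写(仅为示意 good 和 flawed 是 10000 台正常引擎和 20 台有问题引擎的占位数组):
+
+import numpy as np
+
+rng = np.random.default_rng(0)
+good = rng.normal(size=(10000, 2))       # placeholder for the 10,000 normal engines
+flawed = rng.normal(3.0, 1.0, (20, 2))   # placeholder for the 20 flawed engines
+
+X_train = good[:6000]                    # unlabeled training set: fit p(x) on this
+
+X_cv = np.vstack([good[6000:8000], flawed[:10]])       # 2000 good + 10 flawed
+y_cv = np.concatenate([np.zeros(2000), np.ones(10)])
+
+X_test = np.vstack([good[8000:], flawed[10:]])         # 2000 good + 10 flawed
+y_test = np.concatenate([np.zeros(2000), np.ones(10)])
+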
+171
+00:05:39,850 --> 00:05:41,320
+And in the next slide
+在下一张幻灯片中
+
+172
+00:05:41,660 --> 00:05:42,460
+we will talk about how to
+我们将看到如何用
+
+173
+00:05:42,750 --> 00:05:43,800
+actually use this to evaluate
+这些分好的数据
+
+174
+00:05:44,520 --> 00:05:46,330
+the anomaly detection algorithm.
+来推导异常检测的算法
+
+175
+00:05:48,130 --> 00:05:49,140
+So what I have
+好的
+
+176
+00:05:49,220 --> 00:05:50,610
+just described here is a you
+刚才我介绍的这些内容
+
+177
+00:05:50,790 --> 00:05:52,300
+know probably the recommend a
+可能是一种
+
+178
+00:05:52,440 --> 00:05:55,290
+good way of splitting the labeled and unlabeled example.
+比较推荐的方法来划分带标签和无标签的数据
+
+179
+00:05:55,820 --> 00:05:57,970
+The good and the flawed aircraft engines.
+来划分好的和坏的飞机引擎样本
+
+180
+00:05:58,480 --> 00:06:00,380
+Where we use like
+我们使用了
+
+181
+00:06:00,730 --> 00:06:01,650
+a 60, 20, 20% split for
+6:2:2的比例
+
+182
+00:06:01,800 --> 00:06:03,350
+the good engines and we take
+来分配好的引擎样本
+
+183
+00:06:03,570 --> 00:06:04,780
+the flawed engines, and we
+而坏的引擎样本
+
+184
+00:06:04,910 --> 00:06:05,750
+put them just in the cross
+我们只把它们放到
+
+185
+00:06:05,870 --> 00:06:06,940
+validation set, and just in
+交叉验证集和测试集中
+
+186
+00:06:07,030 --> 00:06:09,200
+the test set, then we'll see in the next slide why that's the case.
+在下一页中我们将讲解这样分的理由
+
+187
+00:06:10,370 --> 00:06:12,080
+Just as an aside, if you
+顺便说一下
+
+188
+00:06:12,360 --> 00:06:13,360
+look at how people apply anomaly
+如果你看到别人应用
+
+189
+00:06:13,750 --> 00:06:15,400
+detection algorithms, sometimes you see
+异常检测的算法时
+
+190
+00:06:15,510 --> 00:06:16,980
+other peoples' split the data differently as well.
+有时候也可能会有不同的分配方法
+
+191
+00:06:17,460 --> 00:06:19,400
+So, another alternative, this is really
+另一种分配数据的方法是这样的
+
+192
+00:06:19,660 --> 00:06:21,290
+not a recommended alternative, but
+其实我真的不推荐这么分
+
+193
+00:06:21,470 --> 00:06:23,650
+some people want to
+但就有人喜欢这么分
+
+194
+00:06:23,790 --> 00:06:24,770
+take off your 10,000 good engines, maybe put 6000
+也就是把10000个好的引擎分出6000个
+
+195
+00:06:24,820 --> 00:06:26,020
+of them in your training set
+放到训练集中
+
+196
+00:06:26,320 --> 00:06:27,130
+and then put the same
+然后把剩下的4000个样本
+
+197
+00:06:27,650 --> 00:06:28,800
+4000 in the cross validation
+既用作交叉验证集
+
+198
+00:06:30,380 --> 00:06:31,020
+set and the test set.
+也用作测试集
+
+199
+00:06:31,170 --> 00:06:32,030
+And so, you know, we like to think of the cross
+通常来说我们要交叉验证集
+
+200
+00:06:32,360 --> 00:06:33,340
+validation set and the
+和测试集当作是
+
+201
+00:06:33,400 --> 00:06:34,620
+test set as being completely
+完全互不相同的
+
+202
+00:06:35,280 --> 00:06:36,370
+different data sets to each other.
+两个数据组
+
+203
+00:06:37,690 --> 00:06:39,030
+But you know, in anomaly detection, you know,
+但就像我说的 在异常检测中
+
+204
+00:06:39,230 --> 00:06:40,360
+for sometimes you see
+有时候你会发现有些人
+
+205
+00:06:40,600 --> 00:06:41,760
+people, sort of, use the
+会使用相同的
+
+206
+00:06:42,070 --> 00:06:42,970
+same set of good engines
+一部分好的引擎样本
+
+207
+00:06:43,370 --> 00:06:44,640
+in the cross validation sets, and
+用作交叉验证集
+
+208
+00:06:44,710 --> 00:06:46,150
+the test sets, and sometimes you
+也用作测试集
+
+209
+00:06:46,250 --> 00:06:48,070
+see people use exactly the
+并且有时候你还会发现
+
+210
+00:06:48,180 --> 00:06:49,800
+same sets of anomalous
+他们会把同样的一些异常样本
+
+211
+00:06:50,880 --> 00:06:54,190
+engines in the cross validation set and the test set.
+放入交叉验证集合测试集
+
+212
+00:06:54,590 --> 00:06:55,970
+And so, all of these are considered, you know,
+总之 所有这样考虑的
+
+213
+00:06:56,850 --> 00:06:59,030
+less good practices and definitely less recommended.
+都不是一个好的尝试 非常不推荐
+
+214
+00:07:00,250 --> 00:07:01,360
+Certainly using the same
+把交叉验证集
+
+215
+00:07:01,650 --> 00:07:02,530
+data in the cross validation
+和测试集数据
+
+216
+00:07:03,200 --> 00:07:04,220
+set and the test set, that
+混在一起共用
+
+217
+00:07:04,510 --> 00:07:06,400
+is not considered a good machine learning practice.
+确实不是一个比较好的机器学习惯例
+
+218
+00:07:07,180 --> 00:07:08,780
+But, sometimes you see people do this too.
+但这种情况还不少见
+
+219
+00:07:09,790 --> 00:07:11,330
+So, given the training cross
+话说回来 给出之前分好的
+
+220
+00:07:11,550 --> 00:07:12,780
+validation and test sets,
+训练集、交叉验证集和测试集
+
+221
+00:07:13,260 --> 00:07:15,220
+here's how you evaluate or
+异常检测算法的
+
+222
+00:07:15,370 --> 00:07:16,920
+here is how you develop and evaluate an algorithm.
+推导和评估方法如下
+
+223
+00:07:18,490 --> 00:07:19,510
+First, we take the training sets
+首先 我们使用训练样本
+
+224
+00:07:19,910 --> 00:07:20,750
+and we fit the model p of
+来拟合模型p(x)
+
+225
+00:07:20,840 --> 00:07:21,860
+x. So, we fit, you know, all these
+也就是说
+
+226
+00:07:22,210 --> 00:07:24,600
+Gaussians to my m
+我们用所有这些高斯函数
+
+227
+00:07:25,060 --> 00:07:26,690
+unlabeled examples of aircraft engines,
+来拟合m个无标签的飞机引擎样本
+
+228
+00:07:27,090 --> 00:07:28,140
+and these, I am calling them
+虽然这里我称它们为
+
+229
+00:07:28,270 --> 00:07:29,560
+unlabeled examples, but these are
+无标签的样本
+
+230
+00:07:29,660 --> 00:07:31,370
+really examples that we're assuming
+但实际上是我们假设的
+
+231
+00:07:31,790 --> 00:07:33,390
+our goods are the normal aircraft engines.
+都是正常的飞机引擎
+
+232
+00:07:34,580 --> 00:07:36,510
+Then imagine that your
+然后假定你的
+
+233
+00:07:36,650 --> 00:07:38,590
+anomaly detection algorithm is actually making prediction.
+异常检测算法作出了预测
+
+234
+00:07:39,030 --> 00:07:40,070
+So, on the cross validation
+所以 给出交叉验证集
+
+235
+00:07:40,630 --> 00:07:42,280
+of the test set, given that,
+或者测试集
+
+236
+00:07:42,610 --> 00:07:44,660
+say, test example X, think
+给出某个测试样本x
+
+237
+00:07:44,840 --> 00:07:46,490
+of the algorithm as predicting that
+假设这个算法对 p(x)<ε
+
+238
+00:07:46,730 --> 00:07:48,090
+y is equal to 1, p
+的情况作出的
+
+239
+00:07:48,230 --> 00:07:49,320
+of x is less than epsilon,
+预测为 y=1
+
+240
+00:07:50,040 --> 00:07:51,740
+we must be taking zero, if
+而p(x)≥ε时
+
+241
+00:07:52,280 --> 00:07:54,760
+p of x is
+算法作出的预测为
+
+242
+00:07:54,950 --> 00:07:57,340
+greater than or equal to epsilon.
+y=0
+
+243
+00:07:58,390 --> 00:07:59,280
+So, given x, it's trying to predict, what is
+也就是说 给出x的值
+
+244
+00:07:59,340 --> 00:08:00,270
+the label, given y equals 1 corresponding
+预测出y的值 y=1对应
+
+245
+00:08:00,500 --> 00:08:01,470
+to an anomaly or is
+有异常的样本
+
+246
+00:08:01,630 --> 00:08:06,380
+it y equals 0 corresponding to a normal example?
+y=0对应正常样本
+
+247
+00:08:07,290 --> 00:08:09,480
+So given the training, cross validation, and test sets.
+(下面内容重复了,译者注) 给定训练集、交叉验证集和测试集
+
+248
+00:08:09,940 --> 00:08:11,080
+How do you develop an algorithm?
+你应该如何推导算法呢?
+
+249
+00:08:11,480 --> 00:08:12,920
+And more specifically, how do
+或者更具体来说
+
+250
+00:08:13,010 --> 00:08:15,450
+you evaluate an anomaly detection algorithm?
+怎样推导异常检测算法来进行评估呢?
+
+251
+00:08:15,790 --> 00:08:17,470
+Well, to this whole,
+为了推导算法
+
+252
+00:08:17,820 --> 00:08:19,410
+the first step is to take
+我们的第一步是
+
+253
+00:08:19,670 --> 00:08:21,130
+the unlabeled training set, and
+取出所有的无标签的训练样本
+
+254
+00:08:21,290 --> 00:08:23,520
+to fit the model p of x lead training data.
+拟合出模型p(x)
+
+255
+00:08:23,990 --> 00:08:25,060
+So you take this, you know
+虽然我说的是
+
+256
+00:08:25,130 --> 00:08:26,620
+on I'm coming, unlabeled training set,
+无标签的训练集
+
+257
+00:08:26,910 --> 00:08:28,390
+but really, these are examples
+但实际上这些样本
+
+258
+00:08:28,870 --> 00:08:30,290
+that we are assuming, vast majority
+我们已经假设它们
+
+259
+00:08:30,750 --> 00:08:32,400
+of which are normal aircraft engines,
+大多数都是正常的飞机引擎
+
+260
+00:08:32,900 --> 00:08:34,020
+not because they're not anomalies
+是没有异常的
+
+261
+00:08:34,150 --> 00:08:35,380
+and it will
+然后要用这些训练集
+
+262
+00:08:35,490 --> 00:08:36,470
+fit the model p of x. It
+拟合出模型p(x)
+
+263
+00:08:36,640 --> 00:08:38,110
+will fit all those parameters for all
+也就是用所有这些
+
+264
+00:08:38,240 --> 00:08:40,330
+the Gaussians on this data.
+高斯模型拟合出参数
+
+265
+00:08:41,560 --> 00:08:43,190
+Next on the cross validation
+接下来 对交叉验证集
+
+266
+00:08:43,300 --> 00:08:44,480
+of the test set, we're
+和测试集
+
+267
+00:08:44,600 --> 00:08:45,460
+going to think of the anomaly
+我们要让异常检测算法
+
+268
+00:08:46,100 --> 00:08:47,530
+detention algorithm as trying to
+来对y的值
+
+269
+00:08:47,640 --> 00:08:48,580
+predict the value of
+作出一个预测
+
+270
+00:08:49,540 --> 00:08:51,670
+y. So in each of like
+所以 假如对我的每一个
+
+271
+00:08:52,430 --> 00:08:53,470
+say test examples.
+测试样本
+
+272
+00:08:54,140 --> 00:08:56,110
+We have these X-I tests,
+我们有
+
+273
+00:08:57,200 --> 00:08:58,720
+Y-I test, where y is
+(x(i)test, y(i)test)
+
+274
+00:08:58,870 --> 00:09:00,100
+going to be equal to
+其中y=1或0
+
+275
+00:09:00,320 --> 00:09:03,230
+1 or 0 depending on whether this was an anomalous example.
+对应于这个样本是否是异常的
+
+276
+00:09:04,370 --> 00:09:05,510
+So given input x in
+因此 给定测试集中的输入x
+
+277
+00:09:05,600 --> 00:09:07,340
+my test set, my anomaly detection
+我的异常检测算法
+
+278
+00:09:07,730 --> 00:09:08,850
+algorithm think of it as
+将作出预测
+
+279
+00:09:09,100 --> 00:09:11,880
+predicting the y as 1 if p of x is less than epsilon.
+当p(x)小于ε时 预测y=1
+
+280
+00:09:12,240 --> 00:09:15,120
+So predicting that it is an anomaly, it is probably is very low.
+所以 在概率值很小的时候 预测样本是异常的
+
+281
+00:09:15,960 --> 00:09:17,810
+And we think of the algorithm is predicting that y is equal to 0.
+如果p(x)的值大于或等于ε时
+
+282
+00:09:17,970 --> 00:09:20,830
+If p of x is greater then or equals epsilon.
+算法将预测y=0
+
+283
+00:09:21,290 --> 00:09:23,600
+So predicting those normal example
+也就是说如果概率p(x)比较大的时候
+
+284
+00:09:24,200 --> 00:09:26,340
+if the p of x is reasonably large.
+预测该样本为正常样本
+
+285
+00:09:27,350 --> 00:09:28,510
+And so we can now
+所以现在
+
+286
+00:09:28,720 --> 00:09:29,930
+think of the anomaly detection algorithm
+我们可以把
+
+287
+00:09:30,580 --> 00:09:32,040
+as making predictions for what
+异常检测算法想成是
+
+288
+00:09:32,240 --> 00:09:33,490
+are the values of these y
+对交叉验证集
+
+289
+00:09:33,830 --> 00:09:35,080
+labels in the test sets
+和测试集中的y
+
+290
+00:09:36,050 --> 00:09:37,000
+or on the cross validation set.
+进行一个预测
+
+291
+00:09:37,720 --> 00:09:39,140
+And this puts us somewhat more similar
+这样多多少少让人感到
+
+292
+00:09:39,670 --> 00:09:41,250
+to the supervised learning setting, right?
+和监督学习有点类似 不是吗?
+
+293
+00:09:41,620 --> 00:09:42,870
+Where we have label test
+我们有带标签的测试集
+
+294
+00:09:43,170 --> 00:09:44,550
+set and our algorithm is
+而我们的算法就是
+
+295
+00:09:44,800 --> 00:09:46,060
+making predictions on these labels
+对这些标签作出预测
+
+296
+00:09:46,190 --> 00:09:48,050
+and so we can evaluate it you
+所以我们可以通过
+
+297
+00:09:48,480 --> 00:09:50,930
+know by seeing how often it gets these labels right.
+对标签预测正确的次数来进行评价
+
+298
+00:09:52,180 --> 00:09:53,820
+Of course these labels are
+当然 这些标签会比较偏向
+
+299
+00:09:54,540 --> 00:09:56,420
+will be very skewed because y
+因为y=0
+
+300
+00:09:56,710 --> 00:09:57,930
+equals zero, that is normal
+也就是正常的样本
+
+301
+00:09:58,300 --> 00:10:00,490
+examples, usually be much
+肯定是比出现
+
+302
+00:10:00,780 --> 00:10:01,930
+more common than y equals
+y=1 也就是异常样本
+
+303
+00:10:02,310 --> 00:10:03,520
+1 than anomalous examples.
+的情况更多
+
+304
+00:10:04,660 --> 00:10:05,610
+But, you know, this is much closer
+这跟我们在监督学习中
+
+305
+00:10:06,040 --> 00:10:06,990
+to the source of evaluation
+用到的评价度量
+
+306
+00:10:07,690 --> 00:10:09,770
+metrics we can use in supervised learning.
+方法非常接近
+
+307
+00:10:12,390 --> 00:10:14,500
+So what's a good evaluation metric to use.
+那么用什么评价度量好呢?
+
+308
+00:10:14,790 --> 00:10:18,530
+Well, because the data is
+因为数据是非常偏斜的
+
+309
+00:10:18,840 --> 00:10:20,450
+very skewed, because y equals 0 is
+因为y=0是
+
+310
+00:10:20,880 --> 00:10:22,980
+much more common, classification accuracy
+更加常见的
+
+311
+00:10:23,520 --> 00:10:24,950
+would not be a good the evaluation metrics.
+因此分类准确度不是一个好的度量法
+
+312
+00:10:25,360 --> 00:10:26,760
+So, we talked about this in the earlier video.
+我们之前的视频中也讲过
+
+313
+00:10:28,360 --> 00:10:29,130
+So, if you have a very
+如果你有一个
+
+314
+00:10:29,410 --> 00:10:31,360
+skewed data set, then predicting
+比较偏斜的数据集
+
+315
+00:10:31,740 --> 00:10:32,750
+y equals 0 all the time,
+那么总是预测y=0
+
+316
+00:10:33,430 --> 00:10:34,300
+will have very high classification accuracy.
+它的分类准确度自然会很高
+
+317
+00:10:35,690 --> 00:10:36,820
+Instead, we should use evaluation
+取而代之的
+
+318
+00:10:37,330 --> 00:10:38,920
+metrics, like computing the fraction
+我们应该算出
+
+319
+00:10:39,530 --> 00:10:41,030
+of true positives, false positives,
+真阳性、假阳性、
+
+320
+00:10:41,670 --> 00:10:42,940
+false negatives, true negatives or
+假阴性和真阴性的比率
+
+321
+00:10:43,580 --> 00:10:44,830
+compute the position of the
+来作为评价度量值
+
+322
+00:10:44,890 --> 00:10:46,370
+v curve of this algorithm or
+我们也可以算出查准率和召回率
+
+323
+00:10:46,790 --> 00:10:48,370
+do things like compute the
+或者算出
+
+324
+00:10:48,520 --> 00:10:50,510
+f1 score, right, which is
+F1 分数
+
+325
+00:10:50,630 --> 00:10:51,680
+a single real number way of summarizing
+通过一个很简单的数字
+
+326
+00:10:52,600 --> 00:10:53,450
+the precision and the recall numbers.
+来总结出查准和召回的大小
+
+327
+00:10:54,340 --> 00:10:55,090
+And so these would be ways
+通过这些方法
+
+328
+00:10:55,750 --> 00:10:56,940
+to evaluate an anomaly detection
+你就可以评价你的异常检测算法
+
+329
+00:10:57,320 --> 00:10:59,090
+algorithm on your cross validation set or on your test set.
+在交叉验证和测试集样本中的表现
+
+330
+00:11:01,550 --> 00:11:02,960
+Finally, earlier in the
+最后一点 之前在
+
+331
+00:11:03,100 --> 00:11:05,050
+anomaly detection algorithm, we
+异常检测算法中
+
+332
+00:11:05,200 --> 00:11:06,720
+also had this parameter epsilon, right?
+我们有一个参数ε对吧?
+
+333
+00:11:07,400 --> 00:11:09,100
+So, epsilon is this threshold
+这个ε是我们用来决定
+
+334
+00:11:09,920 --> 00:11:11,320
+that we would use to decide
+什么时候把一个样本当作是
+
+335
+00:11:11,790 --> 00:11:13,630
+when to flag something as an anomaly.
+异常样本的一个阈值
+
+336
+00:11:14,840 --> 00:11:16,740
+And so, if you have
+所以 如果你有
+
+337
+00:11:16,840 --> 00:11:18,380
+a cross validation set, another way
+一组交叉验证集样本
+
+338
+00:11:19,000 --> 00:11:20,320
+to and to choose this parameter
+一种选择参数ε的方法
+
+339
+00:11:20,710 --> 00:11:22,020
+epsilon, would be to
+就是你可以试一试
+
+340
+00:11:22,240 --> 00:11:24,090
+try a different, try many
+多个不同的
+
+341
+00:11:24,340 --> 00:11:26,220
+different values of epsilon, and
+ε的取值
+
+342
+00:11:26,380 --> 00:11:27,520
+then pick the value of epsilon
+然后选出一个
+
+343
+00:11:27,990 --> 00:11:30,670
+that, let's say, maximizes f1
+使得 F1 分数最大的那个 ε
+
+344
+00:11:31,620 --> 00:11:33,940
+score, or that otherwise does well on your cross validation set.
+也就是在交叉验证集中表现最好的
+
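+One way to carry out that search over epsilon (a sketch; p_cv is assumed to hold p(x) for each cross-validation example and y_cv its 0/1 labels):
+这个对 ε 的搜索大致可以这样实现(仅为示意 假设 p_cv 存放每个交叉验证样本的 p(x) y_cv 是对应的 0/1 标签):
+
+import numpy as np
+
+def select_threshold(p_cv, y_cv):
+    # Try many candidate epsilons; keep the one with the best F1 on the CV set
+    best_epsilon, best_f1 = 0.0, 0.0
+    for epsilon in np.linspace(p_cv.min(), p_cv.max(), 1000):
+        pred = (p_cv < epsilon).astype(int)     # predict y = 1 when p(x) < epsilon
+        tp = np.sum((pred == 1) & (y_cv == 1))
+        fp = np.sum((pred == 1) & (y_cv == 0))
+        fn = np.sum((pred == 0) & (y_cv == 1))
+        if tp == 0:
+            continue
+        precision = tp / (tp + fp)
+        recall = tp / (tp + fn)
+        f1 = 2 * precision * recall / (precision + recall)
+        if f1 > best_f1:
+            best_f1, best_epsilon = f1, epsilon
+    return best_epsilon, best_f1
+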
+345
+00:11:35,340 --> 00:11:36,770
+And more generally, the way
+更一般来说
+
+346
+00:11:37,000 --> 00:11:38,230
+to reduce the training, testing,
+我们使用训练集、测试集
+
+347
+00:11:38,630 --> 00:11:40,230
+and cross validation sets, is that
+和交叉验证集的方法是
+
+348
+00:11:41,760 --> 00:11:43,020
+when we are trying to make decisions,
+当我们需要作出决定时
+
+349
+00:11:43,640 --> 00:11:45,430
+like what features to include, or
+比如要包括哪些特征
+
+350
+00:11:45,570 --> 00:11:46,580
+trying to, you know, tune the parameter
+或者说要确定参数ε取多大合适
+
+351
+00:11:47,100 --> 00:11:48,160
+epsilon, we would then
+我们就可以
+
+352
+00:11:48,410 --> 00:11:51,010
+continually evaluate the algorithm
+不断地用交叉验证集
+
+353
+00:11:51,500 --> 00:11:52,870
+on the cross validation sets and
+来评价这个算法
+
+354
+00:11:53,000 --> 00:11:54,120
+make all those decisions like what
+然后决定我们应该
+
+355
+00:11:54,320 --> 00:11:55,700
+features did you use, you know,
+用哪些特征
+
+356
+00:11:55,790 --> 00:11:57,650
+how to set epsilon, use that, evaluate
+怎样选择ε
+
+357
+00:11:58,240 --> 00:11:59,410
+the algorithm on the cross validation
+所以 就是在交叉验证集中
+
+358
+00:11:59,880 --> 00:12:00,870
+set, and then when we've
+评价算法
+
+359
+00:12:01,320 --> 00:12:02,770
+picked the set of features, when
+然后我们选出一组特征
+
+360
+00:12:02,910 --> 00:12:03,860
+we've found the value of
+或者当我们找到了
+
+361
+00:12:04,070 --> 00:12:05,150
+epsilon that we're happy with, we
+能符合我们要求的ε的值后
+
+362
+00:12:05,250 --> 00:12:07,030
+can then take the final model and
+我们就能用最终的模型
+
+363
+00:12:07,270 --> 00:12:08,680
+evaluate it, you know, do the
+来评价这个算法
+
+364
+00:12:08,770 --> 00:12:11,360
+final evaluation of the algorithm on the test sets.
+或者说 用测试集 来最终评价算法的表现
+
+365
+00:12:12,680 --> 00:12:13,900
+So, in this video, we talked
+在这段视频中
+
+366
+00:12:14,180 --> 00:12:15,240
+about the process of how
+我们介绍了
+
+367
+00:12:15,520 --> 00:12:17,060
+to evaluate an anomaly
+如何评价一个异常检测算法
+
+368
+00:12:17,520 --> 00:12:18,970
+detection algorithm, and again,
+同样地
+
+369
+00:12:19,960 --> 00:12:21,220
+having being able to evaluate
+在能够评价算法之后
+
+370
+00:12:21,410 --> 00:12:22,540
+an algorithm, you know, with
+通过一个简单的的
+
+371
+00:12:22,640 --> 00:12:23,830
+a single real number evaluation,
+数值的评价方法
+
+372
+00:12:24,730 --> 00:12:25,630
+with a number like an F1 score
+比如用一个 F1 分数这样的数字
+
+373
+00:12:26,530 --> 00:12:27,660
+that often allows you to
+这样你就能更有效率地
+
+374
+00:12:28,080 --> 00:12:29,710
+much more efficient use
+在开发异常检测系统时
+
+375
+00:12:29,960 --> 00:12:31,160
+of your time when you are
+更有效率地利用好你的时间
+
+376
+00:12:31,280 --> 00:12:33,250
+trying to develop an anomaly detection system.
+把时间用在刀刃上
+
+377
+00:12:33,800 --> 00:12:34,970
+And we try to make these sorts of decisions.
+我们能够作出决定
+
+378
+00:12:35,650 --> 00:12:38,020
+I have to chose epsilon, what features to include, and so on.
+确定应该如何选取ε 应该包括哪些特征等等
+
+379
+00:12:38,970 --> 00:12:39,920
+In this video, we started
+在这段视频中
+
+380
+00:12:40,330 --> 00:12:40,830
+to use a bit of labeled
+刚开始我们用了一组
+
+381
+00:12:41,590 --> 00:12:42,750
+data in order to
+带标签的数据
+
+382
+00:12:43,020 --> 00:12:43,550
+evaluate the anomaly detection algorithm and
+目的是为了评价异常检测算法
+
+383
+00:12:43,570 --> 00:12:45,710
+this takes us a
+这让我们感觉到
+
+384
+00:12:45,890 --> 00:12:48,340
+little bit closer to a supervised learning setting.
+跟监督学习很相像
+
+385
+00:12:49,610 --> 00:12:50,810
+In the next video, I'm going
+在下一节视频中
+
+386
+00:12:51,000 --> 00:12:52,780
+to say a bit more about that.
+我们将更深入地谈到这个问题
+
+387
+00:12:52,940 --> 00:12:54,240
+And in particular we'll talk about when should
+具体来说 我们将谈到
+
+388
+00:12:54,440 --> 00:12:55,860
+you be using an anomaly detection
+应该如何使用异常检测算法
+
+389
+00:12:56,310 --> 00:12:57,130
+algorithm and when should we
+以及什么时候我们应该用监督学习
+
+390
+00:12:57,560 --> 00:13:00,770
+be thinking about using supervised learning instead, and what are the differences between these two formalisms.
+这两种算法到底有什么区别
+
diff --git a/srt/15 - 5 - Anomaly Detection vs. Supervised Learning (8 min).srt b/srt/15 - 5 - Anomaly Detection vs. Supervised Learning (8 min).srt
new file mode 100644
index 00000000..32ecd865
--- /dev/null
+++ b/srt/15 - 5 - Anomaly Detection vs. Supervised Learning (8 min).srt
@@ -0,0 +1,1166 @@
+1
+00:00:00,180 --> 00:00:01,210
+In the last video, we talked
+在上一讲中,我们讨论了(字幕翻译:中国海洋大学 刘竞)
+
+2
+00:00:01,580 --> 00:00:02,950
+about the process of evaluating
+评估一个
+
+3
+00:00:03,790 --> 00:00:05,780
+an anomaly detection algorithm and
+异常检测算法的过程
+
+4
+00:00:05,910 --> 00:00:06,980
+there we started to use some
+当时我们使用了一些带有标签的数据,
+
+5
+00:00:07,210 --> 00:00:08,810
+labelled data, with examples
+对于使用的例子,我们知道
+
+6
+00:00:08,880 --> 00:00:10,150
+that we knew were either anomalous
+它们要么是正常的
+
+7
+00:00:11,010 --> 00:00:13,170
+or not anomalous, with y equals 1 or y equals 0.
+要么是异常的,y要么等于1,要么等于0.
+
+8
+00:00:14,690 --> 00:00:15,380
+So the question then arises, if
+所以问题就出来了，
+
+9
+00:00:15,690 --> 00:00:17,700
+we have this labeled data,
+假如我们有这些带有标签的数据,
+
+9
+00:00:18,130 --> 00:00:19,620
+we have some examples that are
+我们有一些例子,我们知道
+
+10
+00:00:19,750 --> 00:00:20,840
+known to be anomalies and some
+它们是异常,还有一些例子
+
+11
+00:00:21,020 --> 00:00:21,850
+that are known not to be not
+我们知道它们是正常的,
+
+12
+00:00:22,090 --> 00:00:23,540
+anomalies, why don't we
+那么为什么我们不用
+
+13
+00:00:23,640 --> 00:00:25,580
+just use a supervised learning algorithm,
+一个监督学习算法,
+
+14
+00:00:25,720 --> 00:00:26,790
+so why don't we just use
+那么为什么我们不用一个
+
+15
+00:00:27,110 --> 00:00:28,360
+logistic regression or a neural
+逻辑回归或者一个神经
+
+16
+00:00:28,680 --> 00:00:29,770
+network to try to
+网络算法去试图
+
+17
+00:00:30,020 --> 00:00:31,260
+learn directly from our labeled
+从我们的带标签的数据中直接学习,
+
+18
+00:00:31,550 --> 00:00:34,120
+data, to predict whether y equals one or y equals zero.
+并去预测是否y等于1或者y等于0呢?
+
+19
+00:00:34,900 --> 00:00:35,900
+In this video, I'll try to
+在这个视频中,我将尝试去
+
+20
+00:00:36,160 --> 00:00:37,170
+share with you some of
+和大家一起分享
+
+21
+00:00:37,350 --> 00:00:38,820
+the thinking and some guidelines for
+一些思想以及指导方针,这将用于
+
+22
+00:00:39,130 --> 00:00:40,610
+when you should probably use an
+当你可能会使用一个
+
+23
+00:00:40,720 --> 00:00:42,160
+anomaly detection algorithm and when
+异常检测算法的情况,但是当你
+
+24
+00:00:42,440 --> 00:00:43,500
+it might be more fruitful to consider
+使用一个监督学习算法时
+
+25
+00:00:43,920 --> 00:00:45,380
+using a supervised learning algorithm.
+将会更加有效的情况。
+
+26
+00:00:47,160 --> 00:00:48,950
+This slide shows, what are
+从这个幻灯片可以看到,
+
+27
+00:00:49,010 --> 00:00:50,130
+the settings under which you should
+什么情况下你
+
+28
+00:00:50,900 --> 00:00:52,370
+maybe use anomaly detection versus
+很可能使用异常检测算法,
+
+29
+00:00:52,930 --> 00:00:54,590
+when supervised learning might be more fruitful.
+以及什么情况下使用监督学习算法更有效。
+
+30
+00:00:56,030 --> 00:00:57,440
+If you have a problem with a
+如果你有一个问题,问题中有
+
+31
+00:00:57,560 --> 00:00:58,820
+very small number of positive
+很少数量的正例,
+
+32
+00:00:59,720 --> 00:01:01,780
+examples, and remember examples of
+记住,当这些例子
+
+33
+00:01:01,890 --> 00:01:03,000
+y equals one are the
+的y的取值等于1时,
+
+34
+00:01:03,650 --> 00:01:05,520
+anomalous examples, then
+它们是异常的例子,
+
+35
+00:01:06,170 --> 00:01:08,160
+you might consider using an anomaly detection algorithm instead.
+那么你可以考虑改用异常检测算法。
+
+36
+00:01:09,260 --> 00:01:10,430
+So having 0 to 20,
+所以如果有0到20个
+
+37
+00:01:10,600 --> 00:01:12,740
+maybe up to 50 positive examples,
+可能最多到50个正例,
+
+38
+00:01:13,450 --> 00:01:15,190
+might be pretty typical, and usually,
+这可能是很典型的,并且通常情况下,
+
+39
+00:01:15,680 --> 00:01:16,810
+we have such a small set
+我们有很少数量的
+
+40
+00:01:17,130 --> 00:01:18,340
+of positive examples,
+正例,
+
+41
+00:01:19,270 --> 00:01:20,170
+we are going to save the positive
+我们会把这些正例保留下来,
+
+42
+00:01:20,510 --> 00:01:21,530
+examples just for the cross
+仅是把它们作为交叉
+
+44
+00:01:21,840 --> 00:01:24,440
+validation sets and test sets.
+验证集和测试集。
+
+43
+00:01:24,850 --> 00:01:26,130
+In contrast, in a typical
+与此相反,在一个典型的
+
+44
+00:01:26,510 --> 00:01:28,560
+normal anomaly detection setting,
+正规的异常检测下,
+
+45
+00:01:29,340 --> 00:01:30,630
+we will often have a relatively
+我们通常将会有一个相对更大
+
+46
+00:01:31,010 --> 00:01:32,340
+large number of negative examples,
+数量的反例,
+
+47
+00:01:33,110 --> 00:01:34,300
+of these normal examples of
+比如说这些正常的
+
+48
+00:01:34,910 --> 00:01:36,710
+normal aircraft engines.
+航空发动机。
+
+49
+00:01:37,720 --> 00:01:38,900
+And we can then use this very
+我们可以用这些更大
+
+50
+00:01:39,200 --> 00:01:40,240
+large number of negative examples,
+数量的反例,
+
+51
+00:01:41,470 --> 00:01:42,510
+with which to fit the model
+去拟合
+
+52
+00:01:43,000 --> 00:01:44,090
+p of x. And so, there is
+关于x的模型p. 因此,
+
+55
+00:01:44,190 --> 00:01:45,930
+this idea in many anomaly detection
+在许多异常检测
+
+53
+00:01:46,320 --> 00:01:48,510
+applications, you have
+应用中有这样一个思想,
+
+54
+00:01:48,760 --> 00:01:50,220
+very few positive examples, and
+你有很少的正例,
+
+55
+00:01:50,320 --> 00:01:52,540
+lots of negative examples, and when
+很多的反例,
+
+56
+00:01:52,810 --> 00:01:54,960
+we are doing the process of
+当我们在估计关于x的模型p的过程中,
+
+57
+00:01:55,220 --> 00:01:57,520
+estimating p of x, of fitting all those Gaussian parameters,
+当我们拟合这些高斯参数时,
+
+58
+00:01:58,650 --> 00:02:00,690
+we need only negative examples to do that.
+我们只需要反例就够了。
+
+59
+00:02:00,850 --> 00:02:01,680
+So if you have a lot of negative data,
+所以,如果你有大量的反例数据,
+
+60
+00:02:02,140 --> 00:02:04,310
+we can still fit to p of x pretty well.
+你仍然可以很好的拟合关于x的模型p.
+
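+A minimal sketch of this point, in Python with NumPy: the per-feature Gaussian parameters are fit from the normal (y = 0) examples alone, so a large set of negative examples is enough to estimate p(x). X_train is an assumed (m, n) array of normal examples; the names are placeholders, not course code.
+```python
+import numpy as np
+
+def fit_gaussian_params(X_train):
+    """Per-feature mean and variance estimated from normal examples only."""
+    mu = X_train.mean(axis=0)
+    sigma2 = X_train.var(axis=0)     # maximum-likelihood variance, one per feature
+    return mu, sigma2
+
+def p_of_x(X, mu, sigma2):
+    """Product of independent univariate Gaussian densities, one per feature."""
+    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
+    dens = coef * np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
+    return dens.prod(axis=1)
+```
+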
+61
+00:02:05,340 --> 00:02:07,090
+In contrast, for supervised learning,
+与此相反,对于监督学习而言,
+
+62
+00:02:07,760 --> 00:02:08,790
+more typically we would have
+更为典型情况是我们需要
+
+63
+00:02:09,180 --> 00:02:10,810
+a reasonably large number of
+有一个合理的数量比较大的
+
+64
+00:02:11,050 --> 00:02:12,370
+both positive and negative examples.
+正例和反例。
+
+65
+00:02:13,920 --> 00:02:14,970
+And so this is one
+所以,这是一个
+
+66
+00:02:15,070 --> 00:02:16,240
+way to look at your problem
+观察你的问题的方式
+
+67
+00:02:16,770 --> 00:02:17,860
+and decide if you should use
+以决定你是应该使用
+
+68
+00:02:18,240 --> 00:02:20,180
+an anomaly detection algorithm or a supervised learning algorithm.
+一个异常检测算法还是一个监督学习算法。
+
+69
+00:02:21,750 --> 00:02:24,750
+Here is another way people often think about anomaly detection algorithms.
+这有另外一个人们经常思考异常检测算法的方式。
+
+70
+00:02:25,530 --> 00:02:26,890
+So, for anomaly detection applications
+确实如此,对于许多异常检测的应用,
+
+71
+00:02:27,900 --> 00:02:28,890
+often there are many
+经常有许多
+
+72
+00:02:29,040 --> 00:02:30,600
+different types of anomalies.
+不同类型的异常。
+
+73
+00:02:31,280 --> 00:02:31,770
+So think about aircraft engines.
+比如航空发动机。
+
+74
+00:02:32,040 --> 00:02:34,680
+You know there are so many different ways for aircraft engines to go wrong.
+你们知道当发动机异常时,有许多不同的异常形式。
+
+75
+00:02:34,880 --> 00:02:36,980
+Right? There are so many things that could go wrong that could break an aircraft engine.
+对吧?有许多方面可以出现故障,打断航空发动机的正常工作。
+
+76
+00:02:38,830 --> 00:02:40,080
+And so, if that's the
+因此,如果情况就是这样,
+
+77
+00:02:40,120 --> 00:02:40,940
+case and you have a pretty small
+你有很少数量的
+
+78
+00:02:41,140 --> 00:02:43,560
+set of positive examples, then
+正例,那么
+
+79
+00:02:44,430 --> 00:02:46,760
+it can be difficult for
+这将很困难,
+
+80
+00:02:47,580 --> 00:02:48,380
+an algorithm to learn from your small
+如果使用一个算法从你的
+
+81
+00:02:48,740 --> 00:02:50,130
+set of positive examples what the anomalies look like.
+这么少数量的正例中去学习异常是什么。
+
+82
+00:02:50,180 --> 00:02:51,880
+And in particular,
+尤其是
+
+83
+00:02:52,800 --> 00:02:54,050
+you know, future anomalies may look
+你也知道,未来可能出现的异常
+
+84
+00:02:54,220 --> 00:02:55,750
+nothing like the ones you've seen so far.
+可能与你至今见过的异常完全不同。
+
+85
+00:02:56,050 --> 00:02:57,540
+So maybe in your set
+可能在你的
+
+86
+00:02:57,790 --> 00:02:59,030
+of positive examples, maybe you
+正例中,有5个或者10个,或者20个
+
+87
+00:02:59,190 --> 00:02:59,740
+had seen 5 or 10, or 20
+你已经见过的
+
+88
+00:02:59,950 --> 00:03:02,960
+different ways that an aircraft engine could go wrong.
+航空发动机发生故障的不同情况。
+
+89
+00:03:03,780 --> 00:03:05,600
+But maybe tomorrow, you
+但是,可能到了明天,你
+
+90
+00:03:05,780 --> 00:03:07,110
+need to detect a totally
+就需要去检测一个全新
+
+91
+00:03:07,440 --> 00:03:08,870
+new set, a totally new
+的集合,一个全新
+
+92
+00:03:09,250 --> 00:03:10,620
+type of anomaly, a totally
+异常类型, 一个全
+
+93
+00:03:10,820 --> 00:03:12,170
+new way for an aircraft
+新的航空
+
+94
+00:03:12,570 --> 00:03:13,870
+engine to be broken that
+发动机故障的方式,
+
+95
+00:03:14,090 --> 00:03:15,660
+you have just never seen before,
+而且这个你之前从来就没有见过,
+
+96
+00:03:15,950 --> 00:03:17,010
+and if that is the case,
+并且,假如情况就是这样,
+
+97
+00:03:17,550 --> 00:03:18,460
+then it might be more
+那么它将更
+
+98
+00:03:18,650 --> 00:03:20,020
+promising to just model
+有希望,如果去为反例建立
+
+99
+00:03:20,480 --> 00:03:21,770
+the negative examples, with a
+一个模型,
+
+100
+00:03:21,970 --> 00:03:23,620
+sort of a Gaussian model
+模型是一种关于x的高斯
+
+101
+00:03:23,970 --> 00:03:24,950
+P of X. Rather than try
+模型P。而不是很费劲地
+
+102
+00:03:25,290 --> 00:03:26,250
+too hard to model the positive
+去建立模型去拟合
+
+103
+00:03:26,640 --> 00:03:27,850
+examples, because, you know,
+正例,因为,你也知道
+
+104
+00:03:28,040 --> 00:03:29,310
+tomorrow's anomaly may be
+明天的异常可能
+
+105
+00:03:29,420 --> 00:03:32,680
+nothing like the ones you've seen so far.
+与你至今见过的这些完全不同。
+
+106
+00:03:33,140 --> 00:03:34,640
+In contrast, in some other
+与此相反,在一些其它的
+
+107
+00:03:34,790 --> 00:03:36,170
+problems you have enough
+问题中,你有足够的
+
+108
+00:03:36,600 --> 00:03:37,790
+positive examples for an algorithm
+正例使得一个算法
+
+109
+00:03:38,730 --> 00:03:40,850
+to get a sense of what the positive examples are like.
+能够学习到正例是什么样子的。
+
+110
+00:03:40,980 --> 00:03:42,860
+And in particular, if you
+并且尤其是,假如你
+
+111
+00:03:42,960 --> 00:03:44,270
+think that future positive examples
+认为未来的正例
+
+112
+00:03:44,870 --> 00:03:45,690
+are likely to be similar
+很可能与你当前
+
+113
+00:03:46,130 --> 00:03:46,980
+to ones in the training set,
+训练集中的正例很相似。
+
+114
+00:03:47,670 --> 00:03:49,090
+then in that setting it might
+那么,在这种情况下,
+
+115
+00:03:49,230 --> 00:03:51,720
+be more reasonable to have a supervised learning algorithm,
+使用一个监督学习算法将会更合理,
+
+116
+00:03:52,550 --> 00:03:53,390
+that looks at a lot of
+算法通过察看
+
+117
+00:03:53,520 --> 00:03:54,760
+the positive examples, looks at a
+正例,察看很多的
+
+118
+00:03:54,930 --> 00:03:56,530
+lot of the negative examples, and
+反例,通过观察到的这些知识,
+
+119
+00:03:56,650 --> 00:03:58,980
+uses that to try to distinguish between positives and negatives.
+去尝试区分正例和反例。
+
+120
+00:04:01,620 --> 00:04:02,780
+So hopefully this gives you
+希望这些可以让你
+
+121
+00:04:02,870 --> 00:04:04,180
+a sense of if you have
+明白,当你有一个
+
+122
+00:04:04,520 --> 00:04:05,690
+a specific problem you should think
+特定的问题,你能知道
+
+123
+00:04:05,950 --> 00:04:07,800
+about using the anomaly
+你应该使用异常检测算法
+
+124
+00:04:08,110 --> 00:04:09,450
+detection algorithm or a supervised learning algorithm.
+还是监督学习算法。
+
+125
+00:04:11,110 --> 00:04:12,340
+And the key difference really is,
+最关键的不同之处在于
+
+126
+00:04:12,520 --> 00:04:13,870
+that in anomaly detection, after
+在异常检测中,
+
+127
+00:04:14,020 --> 00:04:15,040
+we have such a small
+我们只有很少数量的
+
+128
+00:04:15,330 --> 00:04:17,200
+number of positive examples that it
+正例,以致于对于一个
+
+129
+00:04:17,240 --> 00:04:18,640
+is not possible, for a learning
+学习算法而言,它是不可能
+
+130
+00:04:19,330 --> 00:04:21,810
+algorithm to learn that much from the positive examples.
+从正例中学习到足够的知识。
+
+131
+00:04:22,430 --> 00:04:23,440
+And so what we do instead,
+所以,我们应该做的是
+
+132
+00:04:23,890 --> 00:04:25,050
+is take a large set of
+采用大量的
+
+133
+00:04:25,230 --> 00:04:26,420
+negative examples, and have it just
+反例, 并且让它学习
+
+134
+00:04:27,050 --> 00:04:28,070
+learned a lot, learned p
+很多,学习到
+
+135
+00:04:28,230 --> 00:04:29,300
+of x from just the negative
+关于反例的,关于x的模型p,
+
+136
+00:04:29,500 --> 00:04:31,730
+examples of the normal aircraft engines, say.
+例如这些反例就是正常的航空发动机。
+
+137
+00:04:32,190 --> 00:04:33,480
+And we reserve the small
+我们保留小数量的
+
+138
+00:04:33,640 --> 00:04:36,740
+number of positive examples for evaluating our algorithm
+正例去评估我们的算法,
+
+139
+00:04:37,350 --> 00:04:39,680
+to use in either the cross validation sets or the test sets.
+这一少部分的正例要么用于交叉验证集要么用于测试集。
+
+140
+00:04:41,210 --> 00:04:42,380
+And just as a side comment about
+仅仅是对这
+
+141
+00:04:42,620 --> 00:04:43,970
+these many different types of
+许多不同类型的异常
+
+145
+00:04:44,090 --> 00:04:45,490
+anomalies, you know, in
+的一个方面的说明,你知道的,
+
+142
+00:04:45,790 --> 00:04:46,910
+some earlier videos we talked
+在前几讲的视频中,我们讨论的
+
+143
+00:04:47,050 --> 00:04:49,060
+about the email SPAM examples.
+垃圾邮件的例子。
+
+144
+00:04:50,020 --> 00:04:51,510
+In those examples, there are
+在这些例子中,实际有
+
+145
+00:04:51,910 --> 00:04:53,450
+actually many different types of SPAM email.
+有许多不同类型的垃圾邮件。
+
+146
+00:04:53,930 --> 00:04:54,750
+The SPAM email is trying to
+有种垃圾邮件企图向你
+
+147
+00:04:55,030 --> 00:04:57,650
+sell you things, spam email trying to steal your passwords,
+兜售物品,企图偷取你的密码,
+
+148
+00:04:58,470 --> 00:05:01,060
+these are called phishing emails, and many different types of SPAM emails.
+这种叫做钓鱼邮件,而且有许多种其它类型的垃圾邮件。
+
+149
+00:05:01,820 --> 00:05:03,490
+But for the SPAM problem, we usually
+但是对于垃圾邮件问题,我们通常
+
+150
+00:05:03,930 --> 00:05:05,660
+have enough examples of spam
+有足够的垃圾
+
+151
+00:05:06,000 --> 00:05:07,400
+email to see, you know,
+邮件例子去观察,你也知道,
+
+152
+00:05:07,490 --> 00:05:08,650
+most of these different types of
+对于这许多不同类型的垃圾邮件,
+
+153
+00:05:08,890 --> 00:05:10,200
+SPAM email, because we have a
+因为我们有大量的
+
+154
+00:05:10,410 --> 00:05:11,650
+large set of examples of
+的例子,
+
+155
+00:05:11,860 --> 00:05:13,050
+SPAM, and that's why we
+这就是为什么我们
+
+156
+00:05:13,330 --> 00:05:14,800
+usually think of SPAM as
+通常把垃圾邮件问题看成是
+
+157
+00:05:14,980 --> 00:05:16,510
+a supervised learning setting, even
+一个监督学习问题,
+
+158
+00:05:16,710 --> 00:05:17,390
+though, you know, there may be
+即使我们都知道,有许多
+
+159
+00:05:17,530 --> 00:05:19,230
+many different types of SPAM.
+不同各类的垃圾邮件。
+
+160
+00:05:21,890 --> 00:05:23,170
+And so, if we look at
+因此, 假如我们
+
+161
+00:05:23,310 --> 00:05:24,940
+some applications of anomaly detection
+通过与监督学习相对比,
+
+162
+00:05:25,600 --> 00:05:27,290
+versus supervised learning, we'll find
+观察一些使用异常监测的应用,我们会
+
+163
+00:05:27,480 --> 00:05:29,280
+that, in fraud detection, if
+发现,在欺骗检测中,
+
+164
+00:05:29,410 --> 00:05:31,040
+you have many different types
+如果你有许多不同的方式
+
+165
+00:05:31,450 --> 00:05:32,510
+of ways for people to
+使人们可以
+
+166
+00:05:32,680 --> 00:05:34,120
+try to commit fraud, and a
+尝试欺骗,并且有相对
+
+167
+00:05:34,170 --> 00:05:35,730
+relatively small training set, a
+较少的训练集,
+
+168
+00:05:35,880 --> 00:05:37,500
+small number of fraudulent users
+以及网站上只有很小一部分的欺骗性
+
+169
+00:05:37,920 --> 00:05:40,300
+on your website, then I would use an anomaly detection algorithm.
+用户,那么我将使用一个异常检测算法。
+
+170
+00:05:41,310 --> 00:05:42,520
+I should say, if you
+我应该说,假如你有,
+
+171
+00:05:42,650 --> 00:05:44,560
+have, if you are a
+假如你是一个
+
+172
+00:05:44,700 --> 00:05:46,810
+major online retailer, and
+大的在线零售商,
+
+173
+00:05:46,930 --> 00:05:48,170
+if you actually have had a
+如果实际上
+
+174
+00:05:48,330 --> 00:05:49,230
+lot of people try to commit
+在你的网站上有很多人
+
+175
+00:05:49,390 --> 00:05:50,420
+fraud on your website, so if
+尝试欺骗,那么
+
+176
+00:05:50,480 --> 00:05:51,340
+you actually have a lot of
+你实际上有许多例子,
+
+177
+00:05:51,410 --> 00:05:53,760
+examples where y equals 1, then
+这些例子的y等于1,那么,
+
+178
+00:05:53,970 --> 00:05:55,410
+you know, sometimes fraud detection
+你也知道,有时欺骗检测
+
+179
+00:05:55,700 --> 00:05:58,030
+could actually shift over to the supervised learning column.
+实际上可以转换成监督学习问题。
+
+180
+00:05:58,850 --> 00:06:01,000
+But, if you
+但是,如果你
+
+181
+00:06:01,210 --> 00:06:02,440
+haven't seen that many
+还没有见过许多例子,
+
+182
+00:06:02,940 --> 00:06:04,480
+examples of users doing
+在这些例子中,用户
+
+183
+00:06:04,690 --> 00:06:05,720
+strange things on your website
+在你的网站上做奇怪的事情,
+
+184
+00:06:05,920 --> 00:06:07,970
+then, more frequently, fraud detection
+那么,更常见的是,欺骗检测
+
+185
+00:06:08,510 --> 00:06:09,730
+is actually treated as an
+实际是被异常检测算法来处理了,
+
+186
+00:06:09,990 --> 00:06:12,060
+anomaly detection algorithm, rather than one of the supervised learning algorithm.
+而不是被监督学习算法处理。
+
+187
+00:06:14,140 --> 00:06:15,160
+Other examples, we talked about
+其它的例子,我们讨论过的
+
+188
+00:06:15,310 --> 00:06:16,810
+manufacturing already, hopefully you'll
+生产问题,希望你可以
+
+189
+00:06:16,950 --> 00:06:18,230
+see more normal examples,
+看到更多正常的例子,
+
+190
+00:06:19,110 --> 00:06:19,840
+not that many anomalies.
+而不是那么多异常例子。
+
+191
+00:06:20,520 --> 00:06:21,560
+But then again, for some manufacturing
+但是,话又说回来,对于一些生产
+
+192
+00:06:22,180 --> 00:06:23,900
+processes, if you're
+过程,如果你在
+
+193
+00:06:23,990 --> 00:06:25,690
+manufacturing very large volumes
+进行大量生产,
+
+194
+00:06:25,860 --> 00:06:26,780
+and you've seen a lot
+并且你已经见到了许多
+
+195
+00:06:27,230 --> 00:06:29,220
+of bad examples, maybe manufacturing
+坏的例子,可能生产问题
+
+196
+00:06:29,790 --> 00:06:31,690
+could shift to the supervised learning column as well.
+也转换到了监督学习问题。
+
+197
+00:06:32,630 --> 00:06:33,680
+But, if you haven't seen that
+但是,假如你在过去的产品中
+
+198
+00:06:33,950 --> 00:06:35,640
+many bad examples of
+还没有见过许多坏的例子,
+
+199
+00:06:35,830 --> 00:06:38,140
+the old products, then I'll do this anomaly detection.
+那么,我将做异常检测。
+
+200
+00:06:39,180 --> 00:06:40,290
+Monitoring machines in the
+对于监控数据中心
+
+201
+00:06:40,400 --> 00:06:42,450
+data center, again similar
+里的机器 同样地
+
+202
+00:06:42,880 --> 00:06:44,050
+sorts of arguments apply.
+也是相似的情况。
+
+203
+00:06:45,280 --> 00:06:46,650
+Whereas, email SPAM
+然而,对于
+
+204
+00:06:47,070 --> 00:06:48,950
+classification, weather prediction, and classifying
+垃圾邮件的分类,天气预报,
+
+205
+00:06:49,510 --> 00:06:50,580
+cancers, if you have
+关于癌症的分类这类问题,如果
+
+206
+00:06:51,200 --> 00:06:52,850
+equal numbers of positive and
+你有相同数量的正例
+
+207
+00:06:52,870 --> 00:06:53,920
+negative examples, a lot of you
+和反例,你们有许多,
+
+208
+00:06:54,010 --> 00:06:55,550
+have many examples of your
+这样的例子,在这些例子中,
+
+209
+00:06:55,670 --> 00:06:56,780
+positive and your negative
+你有许多正例和反例,
+
+210
+00:06:57,030 --> 00:06:57,870
+examples, then, we would tend to
+那么,我们会把这些情况
+
+211
+00:06:58,080 --> 00:07:00,630
+treat all of these as supervised learning problems.
+都采用监督学习来处理。
+
+212
+00:07:03,400 --> 00:07:04,500
+So, hopefully, that gives you
+所以,我希望,这些可以
+
+213
+00:07:04,580 --> 00:07:05,600
+a sense of what are the
+让你们明白什么样的
+
+214
+00:07:05,770 --> 00:07:07,050
+properties of a learning
+学习方法的特征
+
+215
+00:07:07,350 --> 00:07:08,980
+problem that would cause you to
+让你选择
+
+216
+00:07:09,420 --> 00:07:10,410
+treat it as an anomaly
+采用异常检测算法去处理,
+
+217
+00:07:10,810 --> 00:07:12,660
+detection problem versus a supervised learning
+什么样的特征让你选择采用监督
+
+218
+00:07:14,250 --> 00:07:14,250
+problem.
+学习算法去处理。
+
+219
+00:07:14,690 --> 00:07:16,020
+And for many of the problems that are
+对于各种各样的技术公司
+
+220
+00:07:16,260 --> 00:07:17,820
+faced by various technology companies
+所面临的问题中,
+
+221
+00:07:18,200 --> 00:07:19,780
+and so on, we actually are
+当我们实际
+
+222
+00:07:19,860 --> 00:07:20,900
+in these settings where we have
+处于这样的情况下,当我们
+
+223
+00:07:21,510 --> 00:07:23,320
+very few or sometimes zero
+有很少甚至是没有
+
+224
+00:07:24,060 --> 00:07:25,090
+positive training examples,
+正训练例子时,
+
+225
+00:07:25,400 --> 00:07:26,830
+maybe there are so many
+可能会有大量的
+
+226
+00:07:26,980 --> 00:07:28,410
+different types of anomalies that we've never
+不同类型的我们从
+
+227
+00:07:28,530 --> 00:07:29,810
+seen them before, and for those
+没有见过的异常,对于这样的
+
+228
+00:07:29,960 --> 00:07:31,900
+sorts of problems, very often,
+问题,通常情况下
+
+229
+00:07:32,440 --> 00:07:33,580
+the algorithm that is used
+使用的算法是
+
+230
+00:07:33,790 --> 00:07:35,170
+is an anomaly detection algorithm.
+一个异常检测算法。
+
diff --git a/srt/15 - 6 - Choosing What Features to Use (12 min).srt b/srt/15 - 6 - Choosing What Features to Use (12 min).srt
new file mode 100644
index 00000000..1d1d2775
--- /dev/null
+++ b/srt/15 - 6 - Choosing What Features to Use (12 min).srt
@@ -0,0 +1,1829 @@
+1
+00:00:00,200 --> 00:00:01,770
+By now you've seen the anomaly
+在此之前 你已经学习了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,250 --> 00:00:03,540
+detection algorithm and we've
+异常检测算法 并且
+
+3
+00:00:03,740 --> 00:00:05,240
+also talked about how to
+我们也讨论了如何
+
+4
+00:00:05,570 --> 00:00:06,870
+evaluate an anomaly detection
+评估一个异常检测算法
+
+5
+00:00:07,330 --> 00:00:08,880
+algorithm. It turns out,
+事实上
+
+6
+00:00:09,530 --> 00:00:10,800
+that when you're applying anomaly
+当你应用异常检测时
+
+7
+00:00:11,170 --> 00:00:12,400
+detection, one of the
+对它的效率
+
+8
+00:00:12,460 --> 00:00:13,290
+things that has a huge
+影响最大的
+
+9
+00:00:13,720 --> 00:00:14,860
+effect on how well it
+因素之一是
+
+10
+00:00:14,940 --> 00:00:16,440
+does, is what features you
+你使用什么特征变量
+
+11
+00:00:16,520 --> 00:00:17,720
+use, and what features you choose,
+你选择什么特征变量
+
+12
+00:00:18,530 --> 00:00:19,910
+to give the anomaly detection algorithm.
+来输入异常检测算法
+
+13
+00:00:20,830 --> 00:00:22,170
+So in this video, what I'd
+那么 在本视频中
+
+14
+00:00:22,280 --> 00:00:23,390
+like to do is say a few
+我将要做的事情就是
+
+15
+00:00:23,480 --> 00:00:24,890
+words, give some suggestions and
+给你们一些建议
+
+16
+00:00:25,000 --> 00:00:26,250
+guidelines for how to
+关于如何设计或选择
+
+17
+00:00:26,370 --> 00:00:27,920
+go about designing or selecting
+异常检测算法的
+
+18
+00:00:28,470 --> 00:00:30,950
+features give to an anomaly detection algorithm.
+特征变量
+
+19
+00:00:33,920 --> 00:00:35,310
+In our anomaly detection algorithm,
+在我们的异常检测算法中
+
+20
+00:00:36,120 --> 00:00:37,270
+one of the things we did was
+我们做的事情之一就是
+
+21
+00:00:37,510 --> 00:00:40,330
+model the features using this sort of Gaussian distribution.
+使用这种正态(高斯)分布来对特征向量建模
+
+22
+00:00:41,180 --> 00:00:42,810
+With x_i ~ N(mu_i,
+就是有 xi 服从正态分布 期望为μi
+
+23
+00:00:43,120 --> 00:00:46,050
+sigma squared i), let's say.
+方差为 σi 平方
+
+24
+00:00:46,550 --> 00:00:47,890
+And so one thing that
+那么 我常做的一件事
+
+25
+00:00:47,950 --> 00:00:49,620
+I often do would be to plot the
+就是画出这些数据
+
+26
+00:00:50,670 --> 00:00:52,260
+data or the histogram of
+或者用直方图表示数据
+
+27
+00:00:52,330 --> 00:00:53,490
+the data, to make sure that
+以确保
+
+28
+00:00:53,940 --> 00:00:55,210
+the data looks vaguely
+这些数据在
+
+29
+00:00:55,540 --> 00:00:57,320
+Gaussian before feeding it
+应用我的异常检测算法前
+
+30
+00:00:57,470 --> 00:00:58,830
+to my anomaly detection algorithm.
+看起来像高斯分布
+
+31
+00:00:59,810 --> 00:01:01,040
+And, it'll usually work okay,
+当然即使你的数据并不是高斯分布
+
+32
+00:01:01,610 --> 00:01:02,820
+even if your data isn't Gaussian,
+它也基本上可以良好地运行
+
+33
+00:01:03,400 --> 00:01:05,700
+but this is sort of a nice sanity check to run.
+但这是一个很好的合理性检查
+
+34
+00:01:05,970 --> 00:01:06,860
+And by the way, in case your data
+如果你的数据
+
+35
+00:01:07,400 --> 00:01:09,540
+looks non-Gaussian, the algorithms will often work just fine.
+看起来不像正态分布 算法也常常可以正常运行
+
+36
+00:01:10,410 --> 00:01:12,070
+But, concretely if I
+但是具体而言
+
+37
+00:01:12,430 --> 00:01:13,510
+plot the data like this,
+我将数据画成这样
+
+38
+00:01:13,850 --> 00:01:15,280
+and if it looks like a histogram like
+如果它的柱状图看起来
+
+39
+00:01:15,370 --> 00:01:16,480
+this, and the way
+像这样 另外说一下
+
+40
+00:01:16,630 --> 00:01:17,800
+to plot a histogram is to
+画柱状图的方法是
+
+41
+00:01:17,950 --> 00:01:19,910
+use the HIST, or the
+使用 hist 命令
+
+42
+00:01:20,130 --> 00:01:21,820
+HIST command in Octave,
+就是 Octave 里面的 hist 命令
+
+43
+00:01:21,910 --> 00:01:22,800
+but it looks like this, this looks
+但看起来好像
+
+44
+00:01:23,010 --> 00:01:24,770
+vaguely Gaussian, so if
+这个图形近似像一个高斯分布
+
+45
+00:01:24,940 --> 00:01:26,200
+my features look like this,
+所以如果我的特征变量是这样的
+
+46
+00:01:26,480 --> 00:01:29,970
+I would be pretty happy feeding into my algorithm.
+那么我可以很高兴地把它们送入我的学习算法了
+
+47
+00:01:30,180 --> 00:01:31,830
+But if i were to plot a histogram of my
+但如果我画出来的
+
+48
+00:01:31,950 --> 00:01:33,070
+data, and it were
+直方图是这样的话
+
+49
+00:01:33,210 --> 00:01:34,800
+to look like this well, this
+好吧
+
+50
+00:01:35,060 --> 00:01:36,090
+doesn't look at all like a
+那么这就看起来完全不像钟形曲线
+
+51
+00:01:36,220 --> 00:01:38,430
+bell shaped curve, this is a very asymmetric distribution,
+这个分布很不对称
+
+52
+00:01:39,410 --> 00:01:40,660
+it has a peak way off to one side.
+它的峰值非常偏向一边
+
+53
+00:01:41,750 --> 00:01:42,660
+If this is what my data
+如果我的数据是这样的话
+
+54
+00:01:42,800 --> 00:01:43,960
+looks like, what I'll often
+通常我要做的事情
+
+55
+00:01:44,190 --> 00:01:45,370
+do is play with different
+是对数据进行一些不同的转换
+
+56
+00:01:45,730 --> 00:01:46,920
+transformations of the data in order
+来确保这些数据
+
+57
+00:01:47,010 --> 00:01:48,850
+to make it look more Gaussian.
+看起来更像高斯分布
+
+58
+00:01:49,480 --> 00:01:51,940
+And again the algorithm will usually work okay, even if you don't.
+虽然通常来说你不这么做 算法也会运行地很好
+
+59
+00:01:52,590 --> 00:01:53,660
+But if you use these transformations
+但如果你使用一些转换方法
+
+60
+00:01:54,630 --> 00:01:56,590
+to make your data more gaussian, it might work a bit better.
+使你的数据更像高斯分布的话 你的算法会工作得更好
+
+61
+00:01:58,030 --> 00:01:59,780
+So given the data set
+所以
+
+62
+00:02:00,140 --> 00:02:01,340
+that looks like this, what I
+如果给我这样的数据集
+
+63
+00:02:01,430 --> 00:02:02,810
+might do is take a
+我通常要做的是
+
+64
+00:02:03,010 --> 00:02:04,520
+log transformation of the
+进行一个求对数的转换
+
+65
+00:02:04,660 --> 00:02:05,930
+data and if i
+如果我这样做的话
+
+66
+00:02:06,060 --> 00:02:07,810
+do that and re-plot the
+重新把直方图画出来
+
+67
+00:02:08,150 --> 00:02:09,110
+histogram, what I end up
+对于这个具体的例子
+
+68
+00:02:09,330 --> 00:02:10,500
+with in this particular example,
+我就会得到
+
+69
+00:02:11,130 --> 00:02:12,400
+is a histogram that looks like this.
+像这样的一个直方图
+
+70
+00:02:12,540 --> 00:02:14,470
+And this looks much more Gaussian, right?
+这样就看起来更像高斯分布了 对吧?
+
+71
+00:02:14,650 --> 00:02:15,720
+This looks much more like the classic
+这看起来就更像
+
+72
+00:02:16,690 --> 00:02:18,020
+bell shaped curve, that we
+典型的钟形曲线
+
+73
+00:02:18,710 --> 00:02:21,000
+can fit with some mean and variance parameter sigma.
+这样我就能拟合出期望和方差参数了
+
+74
+00:02:22,180 --> 00:02:22,940
+So what I mean by taking
+所以这里我说的
+
+75
+00:02:23,230 --> 00:02:24,610
+a log transform, is really that
+进行一个取对数的转换
+
+76
+00:02:24,860 --> 00:02:26,140
+if I have some feature x1 and
+意思是这样的
+
+77
+00:02:26,860 --> 00:02:28,260
+then the histogram of x1 looks
+如果我有一个特征变量 比如 x1
+
+78
+00:02:28,720 --> 00:02:30,500
+like this then I might
+直方图是这样的
+
+79
+00:02:31,070 --> 00:02:32,210
+take my feature x1
+那么我就用 x1 的对数
+
+80
+00:02:32,410 --> 00:02:33,890
+and replace it with log
+log(x1) 来替换掉 x1
+
+81
+00:02:34,800 --> 00:02:36,730
+of x1 and this is
+所以 经过替换
+
+82
+00:02:36,860 --> 00:02:37,880
+my new x1 that I'll plot
+这就是我的新 x1
+
+83
+00:02:38,170 --> 00:02:40,000
+to the histogram over on the right, and this looks much
+我把它的直方图画在右边
+
+84
+00:02:40,430 --> 00:02:42,350
+more Guassian.
+这看起来更像高斯分布了
+
+85
+00:02:44,000 --> 00:02:44,730
+Rather than just a log transform some other things you can
+除了取对数变换之外
+
+86
+00:02:44,920 --> 00:02:46,020
+do, might be, let's say
+还有别的一些方法也可以用
+
+87
+00:02:46,110 --> 00:02:47,720
+I have a different feature x2,
+假如这是另一个特征 x2
+
+88
+00:02:48,690 --> 00:02:49,840
+maybe I'll replace that with
+现在我用 log(x2 + 1) 来取代
+
+89
+00:02:50,120 --> 00:02:52,560
+log of x2 plus 1,
+或者更一般地
+
+90
+00:02:52,630 --> 00:02:54,720
+or more generally with log
+我可以在 x2 后面加上
+
+91
+00:02:56,360 --> 00:02:57,690
+of x2 plus
+某个常数 c
+
+92
+00:02:58,430 --> 00:03:00,350
+some constant c and this
+然后求对数来取代 x2
+
+93
+00:03:00,520 --> 00:03:01,540
+constant could be something
+我会调整这个常数 c 的值
+
+94
+00:03:01,890 --> 00:03:04,390
+that I play with, to try to make it look as Gaussian as possible.
+使得这个分布看起来尽可能地像高斯分布
+
+95
+00:03:05,610 --> 00:03:06,820
+Or for a different feature x3, maybe
+或者对于另一个特征 x3
+
+96
+00:03:07,200 --> 00:03:08,610
+I'll replace it with x3,
+也许我可以用
+
+97
+00:03:09,730 --> 00:03:11,250
+I might take the square root.
+它的平方根来取代
+
+98
+00:03:11,610 --> 00:03:14,180
+The square root is just x3 to the power of one half, right?
+x3 的平方根也就是 x3 的二分之一次方 对吧?
+
+99
+00:03:15,260 --> 00:03:16,660
+And this one half
+而这个 二分之一
+
+100
+00:03:17,130 --> 00:03:19,220
+is another example of a parameter I can play with.
+又是一个可以由我来确定的参数
+
+101
+00:03:19,640 --> 00:03:21,600
+So, I might have x4 and
+所以 或许对另一个特征 x4
+
+102
+00:03:22,450 --> 00:03:23,820
+maybe I might instead replace
+我可以用 x4 的另一个幂次方
+
+103
+00:03:24,410 --> 00:03:25,370
+that with x4 to the power
+来取代 x4
+
+104
+00:03:25,730 --> 00:03:26,790
+of something else, maybe to the
+比如说可以用
+
+105
+00:03:26,890 --> 00:03:28,460
+power of 1/3.
+三分之一次幂
+
+106
+00:03:28,940 --> 00:03:30,830
+And these, all of
+所有这些
+
+107
+00:03:30,900 --> 00:03:32,320
+these, this one, this
+所有这些参数
+
+108
+00:03:32,540 --> 00:03:33,670
+exponent parameter, or the
+这个指数参数
+
+109
+00:03:33,810 --> 00:03:35,110
+C parameter, all of these
+或者这个参数 c
+
+110
+00:03:35,380 --> 00:03:36,880
+are examples of parameters that
+所有这些参数你都可以进行调整
+
+111
+00:03:36,960 --> 00:03:38,110
+you can play with in order
+目的只有一个
+
+112
+00:03:38,460 --> 00:03:40,420
+to make your data look a little bit more Gaussian.
+就是让数据看起来更像高斯分布
+
+113
+00:03:45,180 --> 00:03:46,210
+So, let me show you a live demo
+下面我给你演示一下
+
+114
+00:03:46,740 --> 00:03:48,720
+of how I actually go about
+如何对这些参数进行调整
+
+115
+00:03:49,150 --> 00:03:50,690
+playing with my data to make it look more Gaussian.
+来让我的数据看起来更像高斯分布
+
+116
+00:03:51,650 --> 00:03:52,370
+So, I have already loaded
+这里我已经在 Octave 中
+
+117
+00:03:52,750 --> 00:03:54,730
+in to octave here a set
+加载了一系列特征 x
+
+118
+00:03:54,860 --> 00:03:56,170
+of features x I have a thousand examples
+这里我加载了1000个样本
+
+119
+00:03:57,150 --> 00:03:57,870
+loaded over there.
+这里我加载了1000个样本
+
+120
+00:03:58,580 --> 00:04:00,100
+So let's pull up the histogram of my data.
+所以让我们来画出数据的直方图
+
+121
+00:04:01,560 --> 00:04:02,570
+Use the hist x command.
+使用 hist(x) 命令
+
+122
+00:04:03,190 --> 00:04:04,100
+So there's my histogram.
+这就是我的直方图了
+
+123
+00:04:05,660 --> 00:04:06,580
+By default, I think this
+默认情况下
+
+124
+00:04:06,680 --> 00:04:08,250
+uses 10 bins of histograms,
+直方图有十个柱
+
+125
+00:04:08,610 --> 00:04:10,400
+but I want to see a finer-grained histogram.
+但我想把直方图的分组画得更细一些
+
+126
+00:04:11,330 --> 00:04:12,950
+So we do hist(x, 50),
+我们输入 hist(x, 50)
+
+127
+00:04:13,050 --> 00:04:14,970
+so, this plots it in 50 different bins.
+这样就画出了50个柱
+
+128
+00:04:15,310 --> 00:04:15,660
+Okay, that looks better.
+这样看起来好多了
+
+129
+00:04:16,180 --> 00:04:18,570
+Now, this doesn't look very Gaussian, does it?
+但现在看起来还不够"高斯"
+
+130
+00:04:18,930 --> 00:04:20,720
+So, lets start playing around with the data.
+所以下面我们来调整一下参数
+
+131
+00:04:20,900 --> 00:04:22,310
+Let's try a hist of
+首先试试 x 的0.5次方
+
+132
+00:04:22,610 --> 00:04:24,810
+x to the 0.5.
+也就是说
+
+133
+00:04:25,090 --> 00:04:26,590
+So we take the
+我们对数据取平方根
+
+134
+00:04:26,870 --> 00:04:28,820
+square root of the data, and plot that histogram.
+然后画出直方图
+
+135
+00:04:30,670 --> 00:04:31,680
+And, okay, it looks
+好了 现在看起来
+
+136
+00:04:31,800 --> 00:04:32,870
+a little bit more Gaussian, but not
+有那么一点像高斯分布了
+
+137
+00:04:32,960 --> 00:04:34,550
+quite there, so let's play at the 0.5 parameter.
+但还是不够好 我们再调整一下
+
+138
+00:04:34,790 --> 00:04:35,330
+Let's see.
+让我们看一看
+
+139
+00:04:36,520 --> 00:04:38,110
+Set this to 0.2.
+把0.5减小到0.2试试
+
+140
+00:04:38,280 --> 00:04:39,780
+Looks a little bit more Gaussian.
+又更像高斯分布了一点
+
+141
+00:04:40,930 --> 00:04:43,150
+Let's reduce a little bit more 0.1.
+我们再减小一点 试试0.1
+
+142
+00:04:44,450 --> 00:04:45,220
+Yeah, that looks pretty good.
+耶!好极了
+
+143
+00:04:45,500 --> 00:04:48,440
+I could actually just use 0.1.
+所以我可以使用0.1
+
+144
+00:04:48,880 --> 00:04:50,190
+Well, let's reduce it to 0.05.
+我们再试试更小的 0.05
+
+145
+00:04:50,520 --> 00:04:50,910
+And, you know?
+然后 你看
+
+146
+00:04:51,740 --> 00:04:52,750
+Okay, this looks pretty Gaussian,
+这样看起来更像高斯分布了
+
+147
+00:04:53,230 --> 00:04:54,090
+so I can define a new
+因此 我们可以定义一个
+
+148
+00:04:54,190 --> 00:04:55,510
+feature, which is xNew equals
+新的特征变量 xNew
+
+149
+00:04:56,110 --> 00:04:58,940
+x to the 0.05,
+等于 x 的0.05次方
+
+150
+00:04:59,620 --> 00:05:01,380
+and now my new
+现在我的新特征变量
+
+151
+00:05:01,610 --> 00:05:03,050
+feature xNew looks more
+xNew 比原来的特征变量
+
+152
+00:05:03,250 --> 00:05:04,490
+Gaussian than my previous one
+看起来更具像高斯分布
+
+153
+00:05:04,510 --> 00:05:05,560
+and then I might instead use
+因此我就可以用这个新的
+
+154
+00:05:05,850 --> 00:05:07,070
+this new feature to feed
+特征变量来输入到我的
+
+155
+00:05:07,380 --> 00:05:09,390
+into my anomaly detection algorithm.
+异常检测算法中
+
+156
+00:05:10,150 --> 00:05:12,100
+And of course, there is more than one way to do this.
+当然 实现这一功能的方法不唯一
+
+157
+00:05:12,410 --> 00:05:14,530
+You could also have hist of log of
+你也可以用 hist(log(x), 50)
+
+158
+00:05:14,710 --> 00:05:17,320
+x, that's another example of a transformation you can use.
+这是另一种你可以选择的转换方法
+
+159
+00:05:18,270 --> 00:05:20,410
+And, you know, that also look pretty Gaussian.
+这同样会让你的数据看起来更像高斯分布
+
+160
+00:05:20,870 --> 00:05:22,040
+So, I can also define
+所以 我们也可以
+
+161
+00:05:22,230 --> 00:05:23,760
+xNew equals log of x.
+让 xNew 等于 log(x)
+
+162
+00:05:24,220 --> 00:05:25,120
+and that would be another
+这是另一种可以选用的
+
+163
+00:05:25,300 --> 00:05:26,890
+pretty good choice of a feature to use.
+很好的特征变量
+
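+A rough Python equivalent of the Octave session above; x stands for the loaded feature vector (a skewed stand-in is generated here so the sketch runs on its own), and the 0.05 exponent and the log transform are the two choices tried in the demo.
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+rng = np.random.default_rng(0)
+x = rng.gamma(shape=1.5, scale=1.0, size=1000)   # skewed stand-in for the demo data
+
+fig, axes = plt.subplots(1, 3, figsize=(12, 3))
+axes[0].hist(x, bins=50)                 # like hist(x, 50) in Octave
+axes[0].set_title("x")
+axes[1].hist(x ** 0.05, bins=50)         # try exponents 0.5, 0.2, 0.1, 0.05, ...
+axes[1].set_title("x .^ 0.05")
+axes[2].hist(np.log(x), bins=50)         # or a log transform
+axes[2].set_title("log(x)")
+plt.show()
+
+x_new = x ** 0.05                        # the transformed feature fed to the algorithm
+```
+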
+164
+00:05:28,040 --> 00:05:29,400
+So to summarize, if you
+我们来总结一下
+
+165
+00:05:29,520 --> 00:05:30,580
+plot a histogram with the data,
+如果你画出数据的直方图
+
+166
+00:05:31,000 --> 00:05:31,690
+and find that it looks pretty
+并且发现图形看起来
+
+167
+00:05:31,940 --> 00:05:33,460
+non-Gaussian, it's worth playing
+非常不像正态分布
+
+168
+00:05:33,740 --> 00:05:35,110
+around a little bit with
+那么应该进行一些
+
+169
+00:05:35,280 --> 00:05:37,120
+different transformations like these, to
+不同的转换 就像这些
+
+170
+00:05:37,290 --> 00:05:38,190
+see if you can make
+通过这些方法
+
+171
+00:05:38,300 --> 00:05:39,410
+your data look a little bit more
+来让你的数据看起来
+
+172
+00:05:39,570 --> 00:05:40,520
+Gaussian, before you feed it to
+更具有高斯分布的特点
+
+173
+00:05:40,770 --> 00:05:41,970
+your learning algorithm, although even if
+然后你再把数据输入到学习算法
+
+174
+00:05:42,050 --> 00:05:43,550
+you don't, it might work okay.
+虽然说 你不这么做也可以
+
+175
+00:05:43,850 --> 00:05:45,070
+But I usually do take this step.
+但我通常还是会进行这一步
+
+176
+00:05:45,850 --> 00:05:46,880
+Now, the second thing I want
+下面我想讲第二个问题
+
+177
+00:05:46,970 --> 00:05:48,280
+to talk about is, how do
+那就是你如何得到
+
+178
+00:05:48,400 --> 00:05:51,540
+you come up with features for an anomaly detection algorithm.
+异常检测算法的特征变量
+
+179
+00:05:52,650 --> 00:05:53,780
+And the way I often do
+我通常用的办法是
+
+180
+00:05:53,990 --> 00:05:56,490
+so, is via an error analysis procedure.
+通过一个误差分析步骤
+
+181
+00:05:57,630 --> 00:05:58,590
+So what I mean by that,
+我的意思是
+
+182
+00:05:58,970 --> 00:05:59,960
+is that this is really similar
+这跟我们之前
+
+183
+00:06:00,320 --> 00:06:02,320
+to the error analysis procedure that
+学习监督学习算法时的
+
+184
+00:06:02,450 --> 00:06:04,600
+we have for supervised learning, where
+误差分析步骤是类似的
+
+185
+00:06:04,860 --> 00:06:06,810
+we would train a
+也就是说 我们先完整地训练出
+
+186
+00:06:06,860 --> 00:06:08,220
+complete algorithm, and run the
+一个学习算法
+
+187
+00:06:08,350 --> 00:06:09,980
+algorithm on a cross validation set,
+然后在一组交叉验证集上运行算法
+
+188
+00:06:10,840 --> 00:06:11,870
+and look at the examples it gets
+然后找出那些预测出错的样本
+
+189
+00:06:12,230 --> 00:06:13,500
+wrong, and see if
+然后再看看
+
+190
+00:06:13,580 --> 00:06:14,800
+we can come up with extra features
+我们能否找到一些其他的特征变量
+
+191
+00:06:15,370 --> 00:06:16,440
+to help the algorithm do
+来帮助学习算法
+
+192
+00:06:16,580 --> 00:06:17,870
+better on the examples
+让它在那些交叉验证时
+
+193
+00:06:18,280 --> 00:06:19,850
+that it got wrong in the cross-validation set.
+判断出错的样本中表现更好
+
+194
+00:06:21,060 --> 00:06:23,380
+So lets try
+让我们来用一个例子
+
+195
+00:06:24,040 --> 00:06:25,960
+to reason through an example of this process.
+详细解释一下刚才说的这一过程
+
+196
+00:06:26,950 --> 00:06:28,680
+In anomaly detection, we are
+在异常检测中
+
+197
+00:06:28,880 --> 00:06:29,690
+hoping that p of x will
+我们希望 p(x) 的值
+
+198
+00:06:29,840 --> 00:06:30,910
+be large for the normal examples
+对正常样本来说是比较大的
+
+199
+00:06:31,760 --> 00:06:33,180
+and it will be small for the anomalous examples.
+而对异常样本来说 值是很小的
+
+200
+00:06:34,400 --> 00:06:35,370
+And so a pretty common problem
+因此 一个很常见的问题是
+
+201
+00:06:35,950 --> 00:06:37,780
+would be if p of x is comparable,
+p(x) 是具有可比性的
+
+202
+00:06:38,480 --> 00:06:41,540
+maybe both are large for both the normal and the anomalous examples.
+也许正常样本和异常样本的值都很大
+
+203
+00:06:42,940 --> 00:06:44,380
+Lets look at a specific example of that.
+我们来看一个具体点的例子
+
+204
+00:06:45,150 --> 00:06:46,760
+Let's say that this is my unlabeled data.
+假如说这是我的无标签数据
+
+205
+00:06:47,120 --> 00:06:47,970
+So, here I have just one
+那么 我只有一个特征变量 x1
+
+206
+00:06:48,210 --> 00:06:51,130
+feature, x1 and so I'm gonna fit a Gaussian to this.
+我要用一个高斯分布来拟合它
+
+207
+00:06:52,160 --> 00:06:55,990
+And maybe my Gaussian that I fit to my data looks like that.
+假如我的数据拟合出的高斯分布是这样的
+
+208
+00:06:57,300 --> 00:06:59,130
+And now let's say I have an anomalous example,
+现在假如我有一个异常样本
+
+209
+00:06:59,670 --> 00:07:00,480
+and let's say that my anomalous example
+假如我的异常样本中
+
+210
+00:07:01,080 --> 00:07:02,850
+takes on an x value of 2.5.
+x 的取值为2.5
+
+211
+00:07:03,020 --> 00:07:06,420
+So I plot my anomalous example there.
+因此 我画出我的异常样本
+
+212
+00:07:07,200 --> 00:07:08,120
+And you know, it's kind of buried
+你不难发现
+
+213
+00:07:08,650 --> 00:07:09,730
+in the middle of a bunch
+它看起来就像被淹没在
+
+214
+00:07:09,880 --> 00:07:11,690
+of normal examples, and so,
+一堆正常样本中似的
+
+215
+00:07:13,450 --> 00:07:14,850
+just this anomalous example
+我用绿色画出来的
+
+216
+00:07:15,460 --> 00:07:16,780
+that I've drawn in green, it gets a
+这个异常样本
+
+217
+00:07:16,820 --> 00:07:18,550
+pretty high probability, where it's the
+它的概率值很大
+
+218
+00:07:18,730 --> 00:07:20,000
+height of the blue curve,
+是蓝色曲线的高度
+
+219
+00:07:20,960 --> 00:07:22,280
+and the algorithm fails to
+而我们的算法
+
+220
+00:07:22,390 --> 00:07:23,840
+flag this as an anomalous example.
+没能把这个样本判断为异常
+
+221
+00:07:25,320 --> 00:07:26,600
+Now, if this were maybe aircraft
+现在如果说这代表
+
+222
+00:07:27,000 --> 00:07:29,540
+engine manufacturing or something, what
+飞机引擎的制造或者别的什么
+
+223
+00:07:29,680 --> 00:07:30,490
+I would do is, I would actually
+那么我会做的是
+
+224
+00:07:30,860 --> 00:07:32,370
+look at my training examples and
+我会看看我的训练样本
+
+225
+00:07:32,840 --> 00:07:34,500
+look at what went wrong with
+然后看看到底是
+
+226
+00:07:34,730 --> 00:07:36,920
+that particular aircraft engine, and
+哪一个具体的飞机引擎出错了
+
+227
+00:07:37,030 --> 00:07:38,360
+see, if looking at that
+看看通过这个样本
+
+228
+00:07:38,720 --> 00:07:40,720
+example can inspire me to
+能不能启发我
+
+229
+00:07:40,860 --> 00:07:41,800
+come up with a new feature
+想出一个新的特征 x2
+
+230
+00:07:42,290 --> 00:07:43,890
+x2, that helps to distinguish
+来帮助算法区别出
+
+231
+00:07:44,650 --> 00:07:46,530
+between this bad example, compared
+不好的样本
+
+232
+00:07:46,900 --> 00:07:47,850
+to the rest of my
+和我剩下的正确的样本
+
+233
+00:07:48,530 --> 00:07:49,850
+red examples, compared to all
+也就是那些红色的叉叉
+
+234
+00:07:50,980 --> 00:07:51,600
+of my normal aircraft engines.
+或者说正常的飞机引擎样本
+
+235
+00:07:52,790 --> 00:07:53,840
+And if I managed to do
+如果我这样做的话
+
+236
+00:07:54,000 --> 00:07:54,910
+so, the hope would be then,
+我们的期望是
+
+237
+00:07:55,150 --> 00:07:56,540
+that, if I can create a
+创建一个新的特征
+
+238
+00:07:56,610 --> 00:07:59,360
+new feature, X2, so that
+x2 使得
+
+239
+00:07:59,610 --> 00:08:01,490
+when I re-plot my data, if
+当我重新画数据时
+
+240
+00:08:01,580 --> 00:08:02,530
+I take all my normal examples
+如果我用训练集中的
+
+241
+00:08:02,770 --> 00:08:04,420
+of my training set, hopefully
+所有正常样本
+
+242
+00:08:04,750 --> 00:08:05,560
+I find that all my training
+我应该就会发现
+
+243
+00:08:05,710 --> 00:08:07,380
+examples are these red crosses here.
+所有的训练样本都是这里的红叉了
+
+244
+00:08:08,210 --> 00:08:09,580
+And hopefully, if I find
+我们也希望能看到
+
+245
+00:08:09,860 --> 00:08:11,390
+that for my anomalous example, the
+对于异常样本
+
+246
+00:08:11,480 --> 00:08:13,490
+feature x2 takes on the the unusual value.
+这个新特征变量 x2 的值会看起来是异常的
+
+247
+00:08:14,470 --> 00:08:15,820
+So for my green example
+因此对于我这里的绿色的样本
+
+248
+00:08:16,290 --> 00:08:18,670
+here, this anomaly, right, my
+这是异常的样本 对吧
+
+249
+00:08:18,940 --> 00:08:20,800
+X1 value, is still 2.5.
+我的 x1 值仍然是2.5
+
+250
+00:08:21,260 --> 00:08:22,900
+Then maybe my X2 value, hopefully
+那么我的 x2 很有可能
+
+251
+00:08:23,290 --> 00:08:24,530
+it takes on a very large
+是一个比较大的值
+
+252
+00:08:24,840 --> 00:08:26,710
+value like 3.5 over there,
+比如这里的3.5
+
+253
+00:08:27,940 --> 00:08:28,450
+or a very small value.
+或者一个非常小的值
+
+254
+00:08:29,450 --> 00:08:30,530
+But now, if I model
+现在如果我再来给数据建模
+
+255
+00:08:30,970 --> 00:08:32,480
+my data, I'll find that
+我会发现
+
+256
+00:08:33,050 --> 00:08:34,660
+my anomaly detection algorithm gives
+我的异常检测算法
+
+257
+00:08:35,240 --> 00:08:36,830
+high probability to data
+会在中间区域
+
+258
+00:08:37,190 --> 00:08:39,160
+in the central regions, slightly lower
+给出一个较高的概率
+
+259
+00:08:39,200 --> 00:08:42,470
+probability to that, sightly lower probability to that.
+然后越到外层越小
+
+260
+00:08:42,660 --> 00:08:43,960
+An example that's all the
+到了那个绿色的样本
+
+261
+00:08:44,070 --> 00:08:45,450
+way out there, my algorithm will
+我的异常检测算法
+
+262
+00:08:45,630 --> 00:08:46,720
+now give very low probability
+会给出非常小的概率值
+
+263
+00:08:48,360 --> 00:08:48,360
+会给出非常小的概率值
+
+264
+00:08:48,360 --> 00:08:48,360
+to.
+
+265
+00:08:48,510 --> 00:08:49,170
+And so, the process of this
+所以这个过程
+
+266
+00:08:49,230 --> 00:08:50,320
+is, really look at the
+实际上就是
+
+267
+00:08:51,430 --> 00:08:52,570
+mistakes that it is making.
+看看哪里出了错
+
+268
+00:08:52,830 --> 00:08:54,370
+Look at the anomaly that the algorithm
+看看那些
+
+269
+00:08:54,580 --> 00:08:56,020
+is failing to flag, and see
+算法没能正确标记的异常点
+
+270
+00:08:56,320 --> 00:08:59,100
+if that inspires you to create some new feature.
+看看你能不能得到启发来创造新的特征变量
+
+271
+00:08:59,590 --> 00:09:01,180
+So find something unusual about
+所以也就是说
+
+272
+00:09:01,470 --> 00:09:02,590
+that aircraft engine and use
+找一找飞机引擎中的不寻常的问题
+
+273
+00:09:02,800 --> 00:09:03,640
+that to create a new feature,
+然后来建立一些新特征变量
+
+274
+00:09:04,530 --> 00:09:05,780
+so that with this new
+有了这些新的特征变量
+
+275
+00:09:05,900 --> 00:09:07,140
+feature it becomes easier to
+应该就能更容易
+
+276
+00:09:07,400 --> 00:09:09,250
+distinguish the anomalies from your good examples.
+从正常样本中区别出异常来
+
+277
+00:09:09,880 --> 00:09:11,170
+And so that's the
+这就是误差分析的过程
+
+278
+00:09:11,280 --> 00:09:12,600
+process of error analysis
+这就是误差分析的过程
+
+279
+00:09:14,020 --> 00:09:15,360
+and using that to create
+以及如何为异常检查算法
+
+280
+00:09:15,750 --> 00:09:17,100
+new features for anomaly detection.
+建立新的特征变量
+
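+A possible sketch of that error-analysis step, assuming X_cv, p_cv, y_cv and the chosen threshold epsilon are already available from the fitting and evaluation steps (all of these names are placeholders):
+```python
+import numpy as np
+
+def missed_anomalies(X_cv, p_cv, y_cv, epsilon):
+    """Cross-validation examples the model failed to flag: y = 1 but p(x) >= epsilon."""
+    mask = (y_cv == 1) & (p_cv >= epsilon)
+    return X_cv[mask]
+
+# Inspecting these rows is the error-analysis step: look for a property that makes
+# them unusual and turn it into an extra feature for the model.
+```
+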
+281
+00:09:17,770 --> 00:09:18,980
+Finally, let me share with
+最后 我想与你分享一些
+
+282
+00:09:19,090 --> 00:09:20,440
+you my thinking on how I
+我平时在为异常检查算法
+
+283
+00:09:20,630 --> 00:09:23,190
+usually go about choosing features for anomaly detection.
+选择特征变量时的一些思考
+
+284
+00:09:24,350 --> 00:09:27,700
+So, usually, the way I think about choosing features is
+通常来说 我想到的选择特征变量的方法是
+
+285
+00:09:27,960 --> 00:09:29,160
+I want to choose features that will
+我会选那些取值
+
+286
+00:09:29,270 --> 00:09:30,610
+take on either very, very
+要么会变得特别特别大
+
+287
+00:09:30,860 --> 00:09:32,000
+large values, or very, very
+要么会变得特别特别小的
+
+288
+00:09:32,110 --> 00:09:33,890
+small values, for examples
+那些特征变量
+
+289
+00:09:34,750 --> 00:09:36,420
+that I think might turn out to be anomalies.
+比如说
+
+290
+00:09:37,850 --> 00:09:38,710
+So let's use our example
+我们还是用这个数据中心中
+
+291
+00:09:39,060 --> 00:09:41,820
+again of monitoring the computers in a data center.
+监控计算机的例子
+
+292
+00:09:42,250 --> 00:09:43,560
+And so you have lots of
+比如 在一个数据中心
+
+293
+00:09:43,630 --> 00:09:44,930
+machines, maybe thousands, or tens
+你有很多台电脑
+
+294
+00:09:45,170 --> 00:09:47,830
+of thousands of machines in a data center.
+也许上千 或者上万台
+
+295
+00:09:48,310 --> 00:09:49,410
+And we want to know if one
+我们想要知道的是
+
+296
+00:09:49,580 --> 00:09:50,640
+of the machines, one of our
+是不是有哪一台机器
+
+297
+00:09:50,710 --> 00:09:53,320
+computers is acting up, so doing something strange.
+运作不正常了
+
+298
+00:09:54,180 --> 00:09:56,050
+So here are examples of features you may choose,
+这里给出了几种可选的特征变量
+
+299
+00:09:57,020 --> 00:09:59,630
+maybe memory used, number of disc accesses, CPU load, network traffic.
+包括占用内存 磁盘每秒访问次数 CPU负载 网络流量
+
+300
+00:10:01,040 --> 00:10:01,960
+But now, lets say that I
+现在假如说
+
+301
+00:10:02,220 --> 00:10:03,040
+suspect one of the failure
+我怀疑某个出错的情况
+
+302
+00:10:03,470 --> 00:10:04,580
+cases, let's say that
+假如说 我认为
+
+303
+00:10:05,230 --> 00:10:06,970
+in my data set I think
+在我的数据中
+
+304
+00:10:07,150 --> 00:10:08,460
+that CPU load and network traffic
+我的CPU负载和网络流量
+
+305
+00:10:08,990 --> 00:10:10,820
+tend to grow linearly with each other.
+应该互为线性关系
+
+306
+00:10:11,110 --> 00:10:12,120
+Maybe I'm running a bunch of
+可能我运行了一组
+
+307
+00:10:12,220 --> 00:10:13,370
+web servers, and so, here
+网络服务器
+
+308
+00:10:13,750 --> 00:10:15,050
+if one of my servers is
+如果其中一个服务器
+
+309
+00:10:15,310 --> 00:10:16,530
+serving a lot of users,
+在对许多用户服务
+
+310
+00:10:16,850 --> 00:10:19,050
+I have a very high CPU load, and have a very high network traffic.
+那么我的CPU负载和网络流量都很大
+
+311
+00:10:20,230 --> 00:10:21,360
+But let's say, I think,
+现在假如说
+
+312
+00:10:21,840 --> 00:10:23,280
+let's say I have a suspicion, that
+我怀疑其中一个出错的情形
+
+313
+00:10:23,390 --> 00:10:24,890
+one of the failure cases is
+是我的计算机在执行一个任务时
+
+314
+00:10:25,180 --> 00:10:26,240
+if one of my computers
+进入了一个死循环
+
+315
+00:10:26,530 --> 00:10:29,590
+has a job that gets stuck in some infinite loop.
+因此被卡住了
+
+316
+00:10:29,670 --> 00:10:30,750
+So if I think one of
+意思就是说
+
+317
+00:10:30,800 --> 00:10:32,240
+the failure cases, is one of
+假如我感觉
+
+318
+00:10:32,420 --> 00:10:33,470
+my machines, one of my
+我的其中一台机器
+
+319
+00:10:34,380 --> 00:10:36,020
+web servers--server code--
+或者说其中一台服务器的代码
+
+320
+00:10:36,680 --> 00:10:37,990
+gets stuck in some infinite loop,
+执行到一个死循环卡住了
+
+321
+00:10:38,230 --> 00:10:39,550
+and so the CPU load grows,
+因此CPU负载升高
+
+322
+00:10:40,380 --> 00:10:41,490
+but the network traffic doesn't because
+但网络流量没有升高
+
+323
+00:10:41,560 --> 00:10:42,790
+it's just spinning its
+因为只是CPU执行了
+
+324
+00:10:42,940 --> 00:10:44,570
+wheels and doing a lot of CPU work, you know,
+较多的工作 所以负载较大
+
+325
+00:10:44,870 --> 00:10:46,000
+stuck in some infinite loop.
+卡在了死循环里
+
+326
+00:10:46,930 --> 00:10:47,850
+In that case, to detect
+在这种情况下
+
+327
+00:10:48,240 --> 00:10:49,610
+that type of anomaly, I might
+要检测出异常
+
+328
+00:10:49,780 --> 00:10:52,440
+create a new feature, X5,
+我可以新建一个特征 x5
+
+329
+00:10:53,170 --> 00:10:55,130
+which might be CPU load
+x5 等于 CPU负载
+
+330
+00:10:56,600 --> 00:11:00,120
+divided by network traffic.
+除以网络流量
+
+331
+00:11:01,230 --> 00:11:02,810
+And so here X5 will take
+因此 x5 的值
+
+332
+00:11:03,180 --> 00:11:04,860
+on a unusually large value
+将会变得不寻常地大
+
+333
+00:11:05,700 --> 00:11:06,410
+if one of the machines has a
+如果某一台机器
+
+334
+00:11:06,790 --> 00:11:08,190
+very large CPU load but
+具有较大的CPU负载
+
+335
+00:11:08,470 --> 00:11:09,980
+not that much network traffic and
+但网络流量正常的话
+
+336
+00:11:10,250 --> 00:11:11,030
+so this will be a
+因此 这将成为一个
+
+337
+00:11:11,160 --> 00:11:12,390
+feature that will help your
+很好的特征 能帮助你
+
+338
+00:11:12,490 --> 00:11:14,180
+anomaly detection algorithm capture a certain type of anomaly.
+检测出某种类型的异常情况
+
+339
+00:11:15,000 --> 00:11:16,700
+And you can
+当然 你还可以
+
+340
+00:11:16,840 --> 00:11:19,060
+also get creative and come up with other features as well.
+用同样的方法得到更多其他的特征
+
+341
+00:11:19,230 --> 00:11:20,090
+Like maybe I have a feature
+比如说我可以
+
+342
+00:11:20,570 --> 00:11:22,050
+x6 thats CPU load
+建立一个特征 x6
+
+343
+00:11:22,880 --> 00:11:25,540
+squared divided by network traffic.
+等于 CPU负载的平方除以网络流量
+
+344
+00:11:27,030 --> 00:11:28,280
+And this would be another variant
+这就像是特征 x5 的一个变体
+
+345
+00:11:28,950 --> 00:11:29,910
+of a feature like x5 to try
+实际上它捕捉的异常
+
+346
+00:11:30,020 --> 00:11:32,120
+to capture anomalies where one
+仍然是你的机器
+
+347
+00:11:32,280 --> 00:11:33,650
+of your machines has a very
+是否具有一个比较高的
+
+348
+00:11:33,800 --> 00:11:35,030
+high CPU load, that maybe
+CPU 负载 但没有一个
+
+349
+00:11:35,290 --> 00:11:37,100
+doesn't have a commensurately large network traffic.
+同样很大的网络流量
+
+350
+00:11:38,540 --> 00:11:40,080
+And by creating features like
+通过这样的方法
+
+351
+00:11:40,290 --> 00:11:41,560
+these, you can start to capture
+建立新的特征变量
+
+352
+00:11:42,770 --> 00:11:44,550
+anomalies that correspond to
+你就可以通过不同特征变量的组合
+
+353
+00:11:45,690 --> 00:11:48,270
+unusual combinations of values of the features.
+捕捉到对应的不寻常现象
+
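+A small sketch of building such combination features, assuming X is an (m, n) matrix of machine metrics with known column indices for CPU load and network traffic; the tiny constant added to the denominator is only a guard against division by zero and is not from the lecture:
+```python
+import numpy as np
+
+def add_ratio_features(X, cpu_col, net_col):
+    """Append x5 = CPU load / network traffic and x6 = CPU load^2 / network traffic."""
+    cpu = X[:, cpu_col]
+    net = X[:, net_col] + 1e-8      # guard against division by zero (editorial)
+    x5 = cpu / net
+    x6 = cpu ** 2 / net
+    return np.column_stack([X, x5, x6])
+```
+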
+354
+00:11:50,990 --> 00:11:52,090
+So in this video we
+在这段视频中
+
+355
+00:11:52,260 --> 00:11:53,550
+talked about how to and
+我们介绍了如何选择特征
+
+356
+00:11:53,690 --> 00:11:54,670
+take a feature, and maybe transform
+以及对特征进行一些
+
+357
+00:11:55,120 --> 00:11:56,680
+it a little bit, so that
+小小的转换
+
+358
+00:11:56,830 --> 00:11:57,910
+it becomes a bit more Gaussian,
+让数据更像正态分布
+
+359
+00:11:58,260 --> 00:12:00,480
+before feeding into an anomaly detection algorithm.
+然后再把数据输入异常检测算法
+
+360
+00:12:00,950 --> 00:12:02,110
+And also the error analysis
+同时也介绍了建立特征时
+
+361
+00:12:02,740 --> 00:12:04,220
+in this process of creating features
+进行的误差分析方法
+
+362
+00:12:04,870 --> 00:12:06,710
+to try to capture different types of anomalies.
+来捕捉各种异常的可能
+
+363
+00:12:07,550 --> 00:12:10,300
+And with these sorts of guidelines hopefully that will help you
+希望你通过这些方法
+
+364
+00:12:10,850 --> 00:12:12,180
+to choose good features, to give to
+能够了解如何选择好的特征变量
+
+365
+00:12:12,460 --> 00:12:14,310
+your anomaly detection algorithm, to
+从而帮助你的异常检测算法
+
+366
+00:12:14,430 --> 00:12:15,920
+help it capture all sorts of anomalies.
+捕捉到各种不同的异常情况 【教育无边界字幕组】翻译:所罗门捷列夫 校对:竹二个 审核:小白_远游
+
diff --git a/srt/15 - 7 - Multivariate Gaussian Distribution (Optional) (14 min).srt b/srt/15 - 7 - Multivariate Gaussian Distribution (Optional) (14 min).srt
new file mode 100644
index 00000000..a2321da8
--- /dev/null
+++ b/srt/15 - 7 - Multivariate Gaussian Distribution (Optional) (14 min).srt
@@ -0,0 +1,2046 @@
+1
+00:00:00,500 --> 00:00:01,550
+In this and the next video,
+在这节和下节视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,040 --> 00:00:03,470
+I'd like to tell you about one
+我想给你介绍
+
+3
+00:00:03,760 --> 00:00:05,880
+possible extension to the
+我们目前为止学习的异常检测算法的
+
+4
+00:00:06,140 --> 00:00:08,270
+anomaly detection algorithm that we've developed so far.
+一种可能的延伸
+
+5
+00:00:09,020 --> 00:00:11,970
+This extension uses something called the multivariate
+这个延伸使用到多元高斯分布 (multivariate Gaussian distribution)
+
+6
+00:00:12,100 --> 00:00:13,480
+Gaussian distribution, and it
+这个延伸使用到多元高斯分布 (multivariate Gaussian distribution)
+
+7
+00:00:13,770 --> 00:00:14,970
+has some advantages, and some
+它有一些优势
+
+8
+00:00:15,160 --> 00:00:16,790
+disadvantages, and it can
+也有一些劣势
+
+9
+00:00:17,070 --> 00:00:20,610
+sometimes catch some anomalies that the earlier algorithm didn't.
+它能捕捉到一些之前的算法检测不出来的异常
+
+10
+00:00:21,740 --> 00:00:23,730
+To motivate this, let's start with an example.
+为了理解这个算法 我们先来看看一个例子
+
+11
+00:00:25,620 --> 00:00:28,410
+Let's say that our unlabeled data looks like what I have plotted here.
+假设我们的没有标签的数据看起来像这张图一样
+
+12
+00:00:29,060 --> 00:00:30,190
+And I'm going to use
+我要使用数据中心的监控机的例子
+
+13
+00:00:30,340 --> 00:00:32,320
+the example of monitoring machines
+我要使用数据中心的监控机的例子
+
+14
+00:00:32,890 --> 00:00:34,890
+in the data center, monitoring computers in the data center.
+就是在数据中心监控计算机的例子
+
+15
+00:00:35,290 --> 00:00:36,170
+So my two features are x1
+所以我的两个特征变量
+
+16
+00:00:36,220 --> 00:00:37,070
+which is the CPU load and x2
+x1 是 CPU 的负载和
+
+17
+00:00:37,250 --> 00:00:39,280
+which is maybe the memory use.
+x2 可能是内存使用量
+
+18
+00:00:41,160 --> 00:00:42,160
+So if I take
+所以如果我
+
+19
+00:00:42,340 --> 00:00:43,330
+my two features, x1 and x2,
+把这两个特征变量 x1 和 x2
+
+20
+00:00:43,580 --> 00:00:45,960
+and I model them as Gaussians then
+当做高斯分布来建模
+
+21
+00:00:46,200 --> 00:00:47,430
+here's a plot of
+这个是特征变量
+
+22
+00:00:47,610 --> 00:00:49,040
+my X1 features, here's a
+x1 绘制的图
+
+23
+00:00:49,210 --> 00:00:50,370
+plot of my X2 features,
+这个是特征变量 x2 的图
+
+24
+00:00:50,980 --> 00:00:51,880
+and so if I fit a
+如果我能找到一个
+
+25
+00:00:51,910 --> 00:00:52,640
+Gaussian to that, maybe I'll
+它所符合的高斯分布
+
+26
+00:00:52,760 --> 00:00:56,050
+get a Gaussian like this, so
+我得到的高斯分布可能是这样的
+
+27
+00:00:56,730 --> 00:00:57,750
+here's P of X 1,
+所以这是 p(x1;
+
+28
+00:00:57,860 --> 00:01:00,350
+which depends
+它的参数 μ1
+
+29
+00:01:00,690 --> 00:01:02,130
+on the parameters mu 1, and
+它的参数 μ1
+
+30
+00:01:02,440 --> 00:01:04,740
+sigma squared 1,
+和 σ1 的平方 )
+
+31
+00:01:04,880 --> 00:01:06,120
+and here's my memory used, and,
+然后这是内存使用量
+
+32
+00:01:06,240 --> 00:01:07,020
+you know, maybe I'll get a Gaussian
+可能我会得到这样的一个高斯分布
+
+33
+00:01:07,560 --> 00:01:09,910
+that looks like this, and this is my P of X 2,
+这是 p(x2;
+
+34
+00:01:10,760 --> 00:01:12,500
+which depends on mu 2 and sigma squared 2.
+参数 μ2 和 σ2 的平方 )
+
+35
+00:01:12,590 --> 00:01:14,660
+And so this is
+这就是
+
+36
+00:01:14,870 --> 00:01:16,340
+how the anomaly detection algorithm
+异常检测算法给 x1 和 x2 建模的方法
+
+37
+00:01:16,790 --> 00:01:17,850
+models X1 and X2.
+异常检测算法给 x1 和 x2 建模的方法
+
+38
+00:01:19,900 --> 00:01:21,160
+Now let's say that in the
+现在假如说
+
+39
+00:01:21,260 --> 00:01:22,330
+test sets I have an
+在测试集中
+
+40
+00:01:22,410 --> 00:01:24,010
+example that looks like this.
+有一个这样的样本
+
+41
+00:01:25,540 --> 00:01:26,600
+The location of that green
+在这个绿色叉的位置
+
+42
+00:01:27,310 --> 00:01:29,160
+cross, so the value of
+它的 x1 的值是 0.4 左右
+
+43
+00:01:29,360 --> 00:01:31,220
+X 1 is about 0.4, and the value of X 2 is about 1.5.
+x2 的值是 1.5 左右
+
+44
+00:01:31,300 --> 00:01:34,430
+Now, if you look at
+现在 如果你看这些数据
+
+45
+00:01:34,660 --> 00:01:35,780
+the data, it looks like,
+看起来它们大部分
+
+46
+00:01:35,960 --> 00:01:36,780
+yeah, most of the data
+看起来它们大部分
+
+47
+00:01:37,140 --> 00:01:38,800
+lies in this region, and
+都在这个范围内
+
+48
+00:01:38,940 --> 00:01:40,400
+so that green cross
+所以这个绿色叉
+
+49
+00:01:41,110 --> 00:01:43,510
+is pretty far away from any of the data I've seen.
+离这里看到的任何数据都很远
+
+50
+00:01:43,840 --> 00:01:44,870
+It looks like that should be raised
+看起来它应该被当做
+
+51
+00:01:45,210 --> 00:01:46,790
+as an anomaly. So, in my
+一个异常数据
+
+52
+00:01:46,970 --> 00:01:48,660
+data, in my, in the
+所以我的
+
+53
+00:01:48,790 --> 00:01:49,930
+data of my good examples,
+好的样本的数据
+
+54
+00:01:50,320 --> 00:01:51,430
+it looks like, you know, the
+看起来
+
+55
+00:01:51,510 --> 00:01:52,680
+CPU load, and the
+CPU 负载和内存使用量
+
+56
+00:01:52,770 --> 00:01:54,330
+memory use, they sort
+CPU 负载和内存使用量
+
+57
+00:01:54,680 --> 00:01:56,100
+of grow linearly with each other.
+是彼此线性增长的关系
+
+58
+00:01:56,560 --> 00:01:57,720
+So if I have a
+所以 如果我有一台机器
+
+59
+00:01:57,940 --> 00:01:59,000
+machine using lots of CPU,
+CPU 使用量很高
+
+60
+00:01:59,150 --> 00:02:00,460
+you know memory use
+那么你就知道
+
+61
+00:02:00,830 --> 00:02:02,930
+will also be high, whereas this
+内存使用量也会很高
+
+62
+00:02:03,320 --> 00:02:05,910
+example, this green example it looks like
+但是这个绿色样本
+
+63
+00:02:06,040 --> 00:02:07,140
+here, the CPU load is
+看起来 CPU 负载很低
+
+64
+00:02:07,280 --> 00:02:08,280
+very low, but the memory use
+但是内存使用量很高
+
+65
+00:02:08,490 --> 00:02:09,310
+is very high, and I just
+我以前从没在训练集中见过这样的
+
+66
+00:02:09,430 --> 00:02:10,820
+have not seen that before in my training set.
+我以前从没在训练集中见过这样的
+
+67
+00:02:10,980 --> 00:02:12,150
+It looks like that should be an anomaly.
+看起来它应该是异常的
+
+68
+00:02:13,190 --> 00:02:15,300
+But let's see what the anomaly detection algorithm will do.
+但是我们来看一下异常检测算法会怎么做
+
+69
+00:02:15,570 --> 00:02:16,750
+Well, for the CPU load, it
+对于 CPU 负载
+
+70
+00:02:16,850 --> 00:02:17,990
+puts it at around there
+这个绿色叉差不多在 0.5 这里
+
+71
+00:02:18,280 --> 00:02:20,700
+0.5 and this reasonably high
+有相当高的可能性
+
+72
+00:02:20,900 --> 00:02:21,910
+probability is not that
+它离看到的其它样本不远
+
+73
+00:02:22,120 --> 00:02:23,350
+far from other examples we've seen,
+它离看到的其它样本不远
+
+74
+00:02:23,650 --> 00:02:25,230
+maybe, whereas, for the
+相对的 对于内存使用量
+
+75
+00:02:26,160 --> 00:02:28,320
+memory use, this point, 0.5,
+这个点是 0.5
+
+76
+00:02:29,030 --> 00:02:29,900
+whereas for the memory
+而对于内存使用量
+
+77
+00:02:30,030 --> 00:02:32,340
+use, it's about 1.5, which is there. Again,
+它是差不多 1.5 在那里
+
+78
+00:02:32,680 --> 00:02:34,600
+you know, it's all to
+它在这个高斯分布的尾部
+
+79
+00:02:34,730 --> 00:02:35,850
+us, it's not terribly Gaussian, but
+它在这个高斯分布的尾部
+
+80
+00:02:35,980 --> 00:02:37,310
+the value here and the value
+但是这里的值和这里的值
+
+81
+00:02:37,550 --> 00:02:38,830
+here is not that different
+与看到的其他那些样本
+
+82
+00:02:39,210 --> 00:02:41,180
+from many other examples we've
+没有太大差别
+
+83
+00:02:41,430 --> 00:02:43,020
+seen, and so P of
+所以 p(x1) 会很高
+
+84
+00:02:43,210 --> 00:02:44,530
+X 1, will be pretty high,
+所以 p(x1) 会很高
+
+85
+00:02:45,550 --> 00:02:46,030
+reasonably high.
+会比较高
+
+86
+00:02:46,290 --> 00:02:47,730
+P of X 2 reasonably high.
+p(x2) 也会比较高
+
+87
+00:02:47,980 --> 00:02:49,030
+I mean, if you look at this
+我的意思是 如果你看这幅图
+
+88
+00:02:49,910 --> 00:02:51,230
+plot right, this point here,
+这里这个点
+
+89
+00:02:51,410 --> 00:02:52,530
+it doesn't look that bad, and
+看起来它并没那么差
+
+90
+00:02:52,830 --> 00:02:54,440
+if you look at this plot, you
+然后如果你看这幅图
+
+91
+00:02:54,720 --> 00:02:56,690
+know, the cross here, doesn't look that bad.
+这个叉 看起来也不那么差
+
+92
+00:02:57,050 --> 00:02:58,780
+I mean, I have had examples with
+我的意思是 有的样本
+
+93
+00:02:58,980 --> 00:03:00,730
+even greater memory used, or
+内存使用量更高
+
+94
+00:03:01,030 --> 00:03:02,270
+with even less CPU use,
+或者 CPU 使用量更低
+
+95
+00:03:02,860 --> 00:03:04,780
+and so this example doesn't look that anomalous.
+所以这个点看起来不是很异常
+
+96
+00:03:05,940 --> 00:03:07,380
+And so, an anomaly detection algorithm
+所以 一个异常检测算法
+
+97
+00:03:07,680 --> 00:03:10,090
+will fail to flag this point as an anomaly.
+不会将这个点标记为异常
+
+98
+00:03:10,550 --> 00:03:12,220
+And it turns out what
+可以看出来
+
+99
+00:03:12,360 --> 00:03:13,610
+our anomaly detection algorithm is
+我们的异常检测算法
+
+100
+00:03:13,880 --> 00:03:15,070
+doing is that it is
+不能察觉到
+
+101
+00:03:15,200 --> 00:03:16,700
+not realizing that this blue
+不能察觉到
+
+102
+00:03:16,900 --> 00:03:18,060
+ellipse shows the high
+这个蓝色椭圆所表示的好样本概率高的范围
+
+103
+00:03:18,210 --> 00:03:19,380
+probability region. Instead, what
+这个蓝色椭圆所表示的好样本概率高的范围
+
+104
+00:03:19,490 --> 00:03:21,290
+it thinks is that examples here
+它所做的是
+
+105
+00:03:21,720 --> 00:03:23,430
+have a high probability, and the
+这部分样本是高概率的
+
+106
+00:03:23,680 --> 00:03:24,980
+examples in the next circle
+外面一些的圈里面的样本
+
+107
+00:03:26,170 --> 00:03:27,280
+out have a lower probability, and
+是好样本的概率低一些
+
+108
+00:03:27,370 --> 00:03:28,950
+examples here are even
+而这里的样本概率更低
+
+109
+00:03:29,220 --> 00:03:31,040
+lower probability, and somehow, it
+然后事情就变成了
+
+110
+00:03:31,150 --> 00:03:32,070
+thinks that the green cross
+那里的绿色叉
+
+111
+00:03:32,420 --> 00:03:33,430
+there is pretty high probability,
+是好样本的概率挺高
+
+112
+00:03:34,490 --> 00:03:35,510
+and in particular, it tends to think
+具体来说
+
+113
+00:03:35,990 --> 00:03:37,740
+that, you know, everything in this
+它倾向于认为所有在这区域中的
+
+114
+00:03:38,000 --> 00:03:40,400
+region, everything on the
+在我画的这个圈上的样本
+
+115
+00:03:40,580 --> 00:03:43,390
+line that I'm circling over, has, you know, about equal probability,
+都具有相同的概率
+
+116
+00:03:44,160 --> 00:03:45,810
+and it doesn't realize that something
+它并不能意识到
+
+117
+00:03:46,790 --> 00:03:50,910
+out here actually has
+这边的其实比那边的
+
+118
+00:03:51,080 --> 00:03:53,130
+much lower probability than something over there.
+概率要低得多
+
+119
+00:03:55,060 --> 00:03:56,080
+So, in order to fix
+所以 为了解决这个问题
+
+120
+00:03:56,270 --> 00:03:57,300
+this, we can, we're going to
+我们要开发一种
+
+121
+00:03:57,580 --> 00:03:58,930
+develop a modified version of
+改良版的异常检测算法
+
+122
+00:03:58,990 --> 00:04:01,030
+the anomaly detection algorithm, using
+改良版的异常检测算法
+
+123
+00:04:01,430 --> 00:04:02,520
+something called the multivariate
+要用到一种
+
+124
+00:04:02,580 --> 00:04:05,880
+Gaussian distribution also called the multivariate normal distribution.
+叫做多元高斯分布或者多元正态分布的东西
+
+125
+00:04:07,330 --> 00:04:08,120
+So here's what we're going to
+所以这是我们要做的
+
+126
+00:04:08,810 --> 00:04:10,270
+do. We have features x
+我们有特征变量 x
+
+127
+00:04:10,470 --> 00:04:11,680
+which are in Rn and
+它属于 Rn
+
+128
+00:04:11,910 --> 00:04:14,180
+instead of P of X 1, P of X 2, separately,
+我们不要把 p(x1) p(x2) 分开
+
+129
+00:04:14,570 --> 00:04:15,630
+we're going to model P of
+而要建立一个 p(x) 整体的模型
+
+130
+00:04:15,800 --> 00:04:16,840
+X, all in one go,
+而要建立一个 p(x) 整体的模型
+
+131
+00:04:17,010 --> 00:04:18,970
+so model P of X, you know, all at the same time.
+就是一次性建立 p(x) 的模型
+
+132
+00:04:20,300 --> 00:04:21,550
+So the parameters of the
+多元高斯分布的参数是向量 μ
+
+133
+00:04:21,830 --> 00:04:24,140
+multivariate Gaussian distribution are mu,
+多元高斯分布的参数是向量 μ
+
+134
+00:04:24,630 --> 00:04:25,770
+which is a vector, and sigma,
+和一个 n 乘 n 矩阵 Σ
+
+135
+00:04:26,490 --> 00:04:28,450
+which is an n by n matrix, called a covariance matrix,
+Σ 被称为协方差矩阵
+
+136
+00:04:29,640 --> 00:04:30,870
+and this is similar to the
+它类似于我们之前
+
+137
+00:04:31,010 --> 00:04:32,220
+covariance matrix that we
+在学习 PCA
+
+138
+00:04:32,430 --> 00:04:33,560
+saw when we were working
+也就是主成分分析的时候
+
+139
+00:04:34,080 --> 00:04:35,200
+with the PCA, with the
+也就是主成分分析的时候
+
+140
+00:04:35,280 --> 00:04:36,700
+principal components analysis algorithm.
+所见到的协方差矩阵
+
+141
+00:04:37,860 --> 00:04:38,970
+For the sake of completeness, let
+我们来试着完成这个
+
+142
+00:04:39,070 --> 00:04:39,880
+me just write out the formula
+让我来写出
+
+143
+00:04:40,930 --> 00:04:42,390
+for the multivariate Gaussian distribution.
+多元高斯分布的公式
+
+144
+00:04:42,820 --> 00:04:44,030
+So we say that probability of
+我们说 x 的概率
+
+145
+00:04:44,140 --> 00:04:45,100
+X, and this is parameterized
+确定它的参数是
+
+146
+00:04:46,090 --> 00:04:47,500
+by my parameters mu and
+我的参数 μ 和 Σ
+
+147
+00:04:47,640 --> 00:04:49,280
+sigma that the
+我的参数 μ 和 Σ
+
+148
+00:04:49,360 --> 00:04:50,100
+probability of x is equal
+x 的概率等于
+
+149
+00:04:50,430 --> 00:04:52,260
+to once again
+再说一次
+
+150
+00:04:52,580 --> 00:04:54,810
+there's absolutely no need to memorize this formula.
+完全没有必要去记这个公式
+
+151
+00:04:56,030 --> 00:04:56,780
+You know, you can look it up
+因为你可以在
+
+152
+00:04:57,010 --> 00:04:58,160
+whenever you need to use
+任何需要用它的时候找到它
+
+153
+00:04:58,340 --> 00:04:59,130
+it, but this is what
+但是 x 的概率
+
+154
+00:04:59,690 --> 00:05:01,230
+the probability of X looks like.
+看起来是这样的
+
+155
+00:05:03,000 --> 00:05:04,680
+Transpose, sigma inverse, X
+转置 Σ 的逆
+
+156
+00:05:05,220 --> 00:05:06,300
+minus mu.
+x-μ
+
+157
+00:05:07,400 --> 00:05:08,850
+And this thing here,
+这边这个东西
+
+158
+00:05:10,390 --> 00:05:11,510
+the absolute value of sigma, this
+Σ 的绝对值
+
+159
+00:05:11,680 --> 00:05:13,140
+thing here when you write
+当我们写这个符号的时候
+
+160
+00:05:13,410 --> 00:05:14,430
+this symbol, this is called
+这个东西叫做 Σ 的行列式 (determinant)
+
+161
+00:05:14,600 --> 00:05:17,220
+the determinant of sigma
+这个东西叫做 Σ 的行列式 (determinant)
+
+162
+00:05:18,150 --> 00:05:19,620
+and this is a mathematical function
+它是一个矩阵的数学函数
+
+163
+00:05:20,210 --> 00:05:21,740
+of a matrix and you really
+你不需要知道
+
+164
+00:05:21,960 --> 00:05:22,820
+don't need to know what the
+你不需要知道
+
+165
+00:05:23,240 --> 00:05:24,250
+determinant of a matrix is,
+矩阵的行列式是什么
+
+166
+00:05:24,780 --> 00:05:25,770
+but really all you need to
+你真的需要知道的
+
+167
+00:05:25,860 --> 00:05:27,180
+know is that you can
+就是可以在 octave 里
+
+168
+00:05:27,320 --> 00:05:29,380
+compute it in octave by using
+使用 octave 命令
+
+169
+00:05:29,760 --> 00:05:31,820
+the octave command DET of
+det(Sigma) 来计算它
+
+170
+00:05:33,570 --> 00:05:33,570
+sigma.
+det(Sigma) 来计算它
+
+171
+00:05:34,010 --> 00:05:36,210
+Okay, and again, just be clear, alright?
+好 再确认一次
+
+172
+00:05:36,300 --> 00:05:38,240
+In this expression, these sigmas
+在这个表达式中
+
+173
+00:05:38,730 --> 00:05:41,250
+here, these are just an n by n matrix.
+这些 Σ 是 n 乘 n 矩阵
+
+174
+00:05:41,850 --> 00:05:43,150
+This is not a summation and
+这不是一个和号
+
+175
+00:05:43,260 --> 00:05:45,680
+you know, the sigma there is an n by n matrix.
+这个 Σ 是一个 n 乘 n 矩阵
+
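+A minimal Octave sketch of the density just written out (Octave is the tool the lecture points to for computing det(Sigma)). It assumes x and mu are n-by-1 column vectors and Sigma is the n-by-n covariance matrix; the function name is only illustrative and not taken from the course materials.
+
+% Sketch: the multivariate Gaussian density p(x; mu, Sigma).
+function p = multivariateGaussian(x, mu, Sigma)
+  n = length(mu);
+  d = x - mu;                                % x minus mu
+  % pinv(Sigma) plays the role of Sigma^-1 when Sigma is invertible
+  p = (2 * pi)^(-n / 2) * det(Sigma)^(-0.5) * ...
+      exp(-0.5 * d' * pinv(Sigma) * d);
+end
+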
+176
+00:05:46,710 --> 00:05:47,780
+So that's the formula for P
+所以那就是 p(x) 的公式
+
+177
+00:05:48,010 --> 00:05:50,500
+of X, but it's
+但是更有趣
+
+178
+00:05:50,820 --> 00:05:52,030
+more interestingly, or more importantly,
+或者说更重要的是
+
+179
+00:05:53,940 --> 00:05:55,610
+what does P of X actually looks like?
+p(x) 到底什么样子
+
+180
+00:05:56,190 --> 00:05:57,450
+Lets look at some examples of
+我们来看一些
+
+181
+00:05:58,020 --> 00:06:00,690
+multivariate Gaussian distributions.
+多元高斯分布的例子
+
+182
+00:06:02,350 --> 00:06:03,380
+So let's take a
+我们来看一个二维的例子
+
+183
+00:06:03,500 --> 00:06:04,700
+two dimensional example, say if
+我们来看一个二维的例子
+
+184
+00:06:04,820 --> 00:06:06,550
+I have N equals 2, I
+如果我有 n 等于 2
+
+185
+00:06:06,710 --> 00:06:08,160
+have two features, X 1 and X 2.
+两个特征变量 x1 和 x2
+
+186
+00:06:09,250 --> 00:06:10,540
+Lets say I set MU to
+如果说我让 μ 等于 0
+
+187
+00:06:10,650 --> 00:06:11,800
+be equal to 0 and sigma
+如果说我让 μ 等于 0
+
+188
+00:06:12,330 --> 00:06:14,030
+to be equal to this matrix here.
+让 Σ 等于这个矩阵
+
+189
+00:06:14,200 --> 00:06:16,710
+With 1s on the diagonals and 0s on the off-diagonals,
+让对角线上的值等于 1 非对角线上的值等于 0
+
+190
+00:06:17,600 --> 00:06:19,980
+this matrix is sometimes also called the identity matrix.
+这个矩阵有时会被叫做单位矩阵 (identity matrix)
+
+191
+00:06:21,350 --> 00:06:22,470
+In that case, p of
+在这个情况下
+
+192
+00:06:22,590 --> 00:06:24,950
+x will look like
+p(x) 看起来会是这样
+
+193
+00:06:25,240 --> 00:06:27,430
+this, and what
+我在这个图里展示的是
+
+194
+00:06:27,600 --> 00:06:29,380
+I'm showing in this figure is, you know,
+我在这个图里展示的是
+
+195
+00:06:29,500 --> 00:06:30,900
+for a specific value of X1
+对于一个特定的 x1 的值
+
+196
+00:06:31,240 --> 00:06:32,860
+and for a specific value of
+和一个特定的 x2 的值
+
+197
+00:06:32,970 --> 00:06:34,680
+X2, the height of
+这个面的高度
+
+198
+00:06:34,810 --> 00:06:36,470
+this surface is the value
+这个面的高度
+
+199
+00:06:36,970 --> 00:06:38,330
+of p of x. And
+就是 p(x) 的值
+
+200
+00:06:38,470 --> 00:06:39,520
+so with this setting the parameters
+所以在这个参数设定下
+
+201
+00:06:40,610 --> 00:06:42,100
+p of x is highest when
+p(x) 在 x1 和 x2 都等于 0 时最高
+
+202
+00:06:42,300 --> 00:06:43,620
+X1 and X2 equal zero 0,
+p(x) 在 x1 和 x2 都等于 0 时最高
+
+203
+00:06:44,010 --> 00:06:45,710
+so that's the peak of this Gaussian distribution,
+那就是高斯分布的峰值
+
+204
+00:06:46,950 --> 00:06:48,760
+and the probability falls off with this
+然后这个概率
+
+205
+00:06:48,970 --> 00:06:51,330
+sort of two dimensional Gaussian or
+这个二元高斯分布
+
+206
+00:06:51,510 --> 00:06:53,590
+this bell shaped two dimensional bell-shaped surface.
+随着这个二维钟形的面衰减
+
+207
+00:06:55,080 --> 00:06:56,400
+Down below is the same
+下面这个图和上面是一样的
+
+208
+00:06:56,610 --> 00:06:58,230
+thing but plotted using a
+但它是用等高线
+
+209
+00:06:58,330 --> 00:07:00,970
+contour plot instead, or using different colors,
+或者说不同颜色画的图
+
+210
+00:07:01,150 --> 00:07:02,020
+and so this
+所以中间这里这个
+
+211
+00:07:02,530 --> 00:07:04,210
+heavy intense red in the
+很强烈的暗红色
+
+212
+00:07:04,280 --> 00:07:06,260
+middle, corresponds to the highest values,
+对应的是最高值
+
+213
+00:07:06,850 --> 00:07:08,230
+and then the values decrease
+然后这个值降低
+
+214
+00:07:08,790 --> 00:07:10,470
+with the yellow being slightly lower
+黄色表示低一点儿的值
+
+215
+00:07:10,700 --> 00:07:11,830
+values the cyan being
+青色表示更低一些的值
+
+216
+00:07:12,060 --> 00:07:13,230
+lower values and this deep
+这里的深蓝色
+
+217
+00:07:14,000 --> 00:07:15,440
+blue being the lowest
+表示的是最低的值
+
+218
+00:07:15,450 --> 00:07:17,010
+values so this is really the same figure but plotted
+所以这个其实是同一张图
+
+219
+00:07:17,240 --> 00:07:19,410
+viewed from the top instead, using colors instead.
+就是才用俯视的角度 并且使用了颜色
+
+220
+00:07:21,390 --> 00:07:22,510
+And so, with this distribution,
+所以 从这个分布
+
+221
+00:07:23,830 --> 00:07:25,010
+you see that it places most
+你可以看出来
+
+222
+00:07:25,300 --> 00:07:27,440
+of the probability near 0,0
+大部分概率都在 0 0 附近
+
+223
+00:07:27,600 --> 00:07:28,630
+and then as you go out
+然后 随着从 0 0 这个点往外延伸
+
+224
+00:07:28,710 --> 00:07:32,450
+from 0,0 the probability of X1 and X2 goes down.
+x1 和 x2 的概率下降
+
+225
+00:07:36,000 --> 00:07:37,220
+Now lets try varying some
+现在我们来试试
+
+226
+00:07:37,310 --> 00:07:38,630
+of the parameters and see
+改变一些参数
+
+227
+00:07:38,770 --> 00:07:40,150
+what happens. So let's
+然后看看会发生什么
+
+228
+00:07:40,940 --> 00:07:42,420
+take sigma and change it
+我们来改变一下 Σ
+
+229
+00:07:42,590 --> 00:07:44,720
+so let's say sigma shrinks a
+假如说缩小一下 Σ
+
+230
+00:07:44,870 --> 00:07:46,350
+little bit. Sigma is a
+Σ 是一个协方差矩阵
+
+231
+00:07:46,580 --> 00:07:47,710
+covariance matrix and so it
+所以它衡量的是方差
+
+232
+00:07:47,820 --> 00:07:49,030
+measures the variance or the
+所以它衡量的是方差
+
+233
+00:07:49,120 --> 00:07:50,640
+variability of the features X1 X2.
+或者说特征变量 x1 和 x2 的变化量
+
+234
+00:07:50,720 --> 00:07:52,080
+So if you shrink
+所以如果缩小 Σ
+
+235
+00:07:52,400 --> 00:07:53,430
+sigma then what you get
+那么你的得到的是
+
+236
+00:07:53,780 --> 00:07:54,290
+is what you get is that the
+那么你的得到的是
+
+237
+00:07:54,400 --> 00:07:56,320
+width of this bump diminishes
+这个鼓包的宽度会减小
+
+238
+00:07:57,760 --> 00:07:59,310
+and the height also
+高度会增加一点
+
+239
+00:07:59,550 --> 00:08:00,620
+increases a bit, because the
+因为在这个面以下的
+
+240
+00:08:01,090 --> 00:08:03,080
+area under the surface is equal to 1.
+区域等于 1
+
+241
+00:08:03,130 --> 00:08:04,400
+So the integral of the
+所以这个面以下的
+
+242
+00:08:04,950 --> 00:08:06,230
+volume under the surface is
+体积的积分等于 1
+
+243
+00:08:06,580 --> 00:08:08,000
+equal to 1, because probability
+因为概率分布的积分
+
+244
+00:08:08,690 --> 00:08:10,080
+distribution must integrate to one.
+必须等于一
+
+245
+00:08:10,800 --> 00:08:11,650
+But, if you shrink the variance,
+但是 如果你缩小方差
+
+246
+00:08:12,660 --> 00:08:14,290
+it's kinda like shrinking
+相当于缩小 Σ 的平方
+
+247
+00:08:14,810 --> 00:08:15,870
+sigma squared,
+相当于缩小 Σ 的平方
+
+248
+00:08:16,740 --> 00:08:20,080
+you end up with a narrower distribution, and one that's a little bit taller.
+你会得到一个窄一些 高一些的分布
+
+249
+00:08:20,860 --> 00:08:22,150
+And so you see here also the
+在这儿你也看到
+
+250
+00:08:22,580 --> 00:08:27,200
+concentric ellipses have shrunk a little bit.
+这些同心椭圆也缩小了一些
+
+251
+00:08:27,340 --> 00:08:28,730
+Whereas in contrast if you were to increase sigma
+而相对的 如果你
+
+252
+00:08:29,770 --> 00:08:31,000
+to 2 2 on the
+增加 Σ 对角线上的值到 2 2
+
+253
+00:08:31,110 --> 00:08:32,020
+diagonals, so it is now two
+所以它现在是单位矩阵的二倍
+
+254
+00:08:32,220 --> 00:08:34,370
+times the identity then you end up with a
+那么最后你会得到
+
+255
+00:08:34,510 --> 00:08:35,880
+much wider and much flatter Gaussian.
+一个更宽更扁的高斯分布
+
+256
+00:08:36,150 --> 00:08:38,190
+And so the width of this is much wider.
+所以这个的宽度会更宽
+
+257
+00:08:38,930 --> 00:08:39,800
+This is hard to see but this
+虽然很难看出来
+
+258
+00:08:40,020 --> 00:08:41,090
+is still a bell shaped bump,
+但这还是一个钟形的鼓包
+
+259
+00:08:41,210 --> 00:08:42,540
+it's just flattened down a lot,
+它只是扁平了很多
+
+260
+00:08:42,620 --> 00:08:44,470
+it has become much wider and
+它变得更宽了
+
+261
+00:08:44,590 --> 00:08:45,720
+so the variance or the
+所以 x1 和 x2 的方差或者变化量变大了
+
+262
+00:08:45,830 --> 00:08:48,690
+variability of X1 and X2 just becomes wider.
+所以 x1 和 x2 的方差或者变化量变大了
+
+263
+00:08:50,520 --> 00:08:50,980
+Here are a few more examples.
+下面再举几个例子
+
+264
+00:08:51,670 --> 00:08:53,930
+Now lets try varying
+现在我们试一下
+
+265
+00:08:54,070 --> 00:08:55,490
+one of the elements of sigma at the time.
+一次改变 Σ 的一个元素
+
+266
+00:08:55,840 --> 00:08:58,080
+Let's say I send sigma to
+假如说我把 Σ
+
+267
+00:08:58,140 --> 00:09:00,020
+0.6 there, and 1 over there.
+改为这里是 0.6 那里是 1
+
+268
+00:09:01,340 --> 00:09:02,380
+What this does, is this
+它所做事情的是
+
+269
+00:09:02,610 --> 00:09:04,240
+reduces the variance of
+减小第一个特征变量 x1 的方差
+
+270
+00:09:05,780 --> 00:09:06,960
+the first feature, X 1, while
+减小第一个特征变量 x1 的方差
+
+271
+00:09:07,770 --> 00:09:08,890
+keeping the variance of the
+同时保持第二个特征变量
+
+272
+00:09:08,960 --> 00:09:11,530
+second feature X 2, the same.
+x2 的方差不变
+
+273
+00:09:12,160 --> 00:09:15,150
+And so with this setting of parameters, you can model things like that.
+在这个参数设置下 就可以给这样的东西建模
+
+274
+00:09:15,670 --> 00:09:16,910
+X 1 has smaller variance, and
+x1 有小一些的方差
+
+275
+00:09:17,580 --> 00:09:19,120
+X 2 has larger variance.
+而 x2 有大一些的方差
+
+276
+00:09:20,080 --> 00:09:20,800
+Whereas if I do this,
+然而如果我这样做
+
+277
+00:09:21,120 --> 00:09:22,900
+if I set this
+把这个矩阵设置为 2 1
+
+278
+00:09:23,090 --> 00:09:24,390
+matrix to 2, 1
+把这个矩阵设置为 2 1
+
+279
+00:09:24,560 --> 00:09:25,900
+then you can also model
+那么你也可以
+
+280
+00:09:26,230 --> 00:09:27,470
+examples where you know here
+建立这样的模型
+
+281
+00:09:28,850 --> 00:09:30,590
+we'll say X1 can take
+x1 的变化范围比较大
+
+282
+00:09:30,830 --> 00:09:31,930
+on a large range of values
+x1 的变化范围比较大
+
+283
+00:09:32,220 --> 00:09:34,870
+whereas X2 takes on a relatively narrower range of values.
+而 x2 的变化范围则窄一些
+
+284
+00:09:35,070 --> 00:09:37,060
+And that's reflected in this
+它也被反映到这张图上了
+
+285
+00:09:37,270 --> 00:09:38,040
+figure as well, you know where,
+它也被反映到这张图上了
+
+286
+00:09:38,750 --> 00:09:40,530
+the distribution falls off
+这个分布随着 x1 远离 0
+
+287
+00:09:40,830 --> 00:09:42,670
+more slowly as X 1
+这个分布随着 x1 远离 0
+
+288
+00:09:42,820 --> 00:09:43,940
+moves away from 0,
+下降得更缓慢
+
+289
+00:09:44,180 --> 00:09:45,380
+and falls off very
+而随着 x2 远离 0
+
+290
+00:09:45,640 --> 00:09:48,080
+rapidly as X 2 moves away from 0.
+下降得非常快
+
+291
+00:09:49,190 --> 00:09:50,710
+And similarly if
+类似地
+
+292
+00:09:50,800 --> 00:09:52,320
+we were to modify
+如果我们改变
+
+293
+00:09:53,010 --> 00:09:54,490
+this element of the
+矩阵的这个元素
+
+294
+00:09:54,660 --> 00:09:55,570
+matrix instead, then similar to the previous
+那么会类似于上一页
+
+295
+00:09:57,390 --> 00:09:58,860
+slide, except that here where
+除了在这儿
+
+296
+00:09:59,450 --> 00:10:00,900
+you know playing around here saying
+我们说 x2 的变化区间非常小
+
+297
+00:10:01,240 --> 00:10:03,010
+that X2 can take on
+我们说 x2 的变化区间非常小
+
+298
+00:10:03,170 --> 00:10:04,460
+a very small range of values
+我们说 x2 的变化区间非常小
+
+299
+00:10:05,190 --> 00:10:06,840
+and so here if this
+所以在这里 如果这个是 0.6
+
+300
+00:10:07,200 --> 00:10:08,740
+is 0.6, we notice now X2
+我们发现现在 x2 的变化区间
+
+301
+00:10:09,810 --> 00:10:10,610
+tends to take on a much
+我们发现现在 x2 的变化区间
+
+302
+00:10:10,760 --> 00:10:12,930
+smaller range of values than the original example,
+比原来的例子要小很多
+
+303
+00:10:14,010 --> 00:10:15,310
+whereas if we were to
+然而如果我要让 Σ 等于 2
+
+304
+00:10:15,680 --> 00:10:17,320
+set sigma to be equal to 2 then
+然而如果我要让 Σ 等于 2
+
+305
+00:10:17,410 --> 00:10:20,580
+that's like saying X2 you know, has a much larger range of values.
+这就是说让 x2 有大一些的变化区间
+
+306
+00:10:22,780 --> 00:10:23,570
+Now, one of the cool
+现在 多元高斯分布的
+
+307
+00:10:23,880 --> 00:10:24,950
+things about the multivariate
+现在 多元高斯分布的
+
+308
+00:10:25,190 --> 00:10:26,690
+Gaussian distribution is that
+一个很棒的事情是
+
+309
+00:10:26,880 --> 00:10:28,050
+you can also use it to
+你可以用它给数据的
+
+310
+00:10:28,330 --> 00:10:30,230
+model correlations between the data.
+相关性建立模型
+
+311
+00:10:30,410 --> 00:10:31,930
+That is we can use it to
+我们可以用它
+
+312
+00:10:32,060 --> 00:10:33,510
+model the fact that
+来给 x1 和 x2 高度相关的情况建立模型
+
+313
+00:10:33,610 --> 00:10:34,940
+X1 and X2 tend to be
+来给 x1 和 x2 高度相关的情况建立模型
+
+314
+00:10:35,070 --> 00:10:36,760
+highly correlated with each other for example.
+来给 x1 和 x2 高度相关的情况建立模型
+
+315
+00:10:37,640 --> 00:10:38,880
+So specifically if you start
+所以具体来说
+
+316
+00:10:39,540 --> 00:10:40,720
+to change the off diagonal
+如果你改变协方差矩阵
+
+317
+00:10:41,340 --> 00:10:42,390
+entries of this covariance
+非对角线上的元素
+
+318
+00:10:42,950 --> 00:10:45,250
+matrix you can get a different type of Gaussian distribution.
+你会得到一种不同的高斯分布
+
+319
+00:10:46,610 --> 00:10:48,250
+And so as I
+所以当我将非对角线的元素
+
+320
+00:10:48,340 --> 00:10:49,590
+increase the off-diagonal entries
+所以当我将非对角线的元素
+
+321
+00:10:50,090 --> 00:10:51,300
+from .5 to .8, what
+从 0.5 增加到 0.8 时
+
+322
+00:10:51,580 --> 00:10:53,080
+I get is this distribution that
+我会得到一个
+
+323
+00:10:53,380 --> 00:10:54,590
+is more and more thinly peaked
+更加窄和高的
+
+324
+00:10:55,100 --> 00:10:57,480
+along this sort of x equals y line.
+沿着 x=y 这条线的分布
+
+325
+00:10:57,700 --> 00:10:59,100
+And so here the
+然后这个等高线图告诉我们
+
+326
+00:10:59,160 --> 00:11:00,610
+contour says that x and
+x 和 y 看起来是一起增加的
+
+327
+00:11:00,730 --> 00:11:03,010
+y tend to grow together and
+x 和 y 看起来是一起增加的
+
+328
+00:11:03,290 --> 00:11:04,500
+the things that are with
+概率高的地方是这样的
+
+329
+00:11:04,640 --> 00:11:06,550
+large probability are if
+概率高的地方是这样的
+
+330
+00:11:06,790 --> 00:11:08,140
+either X1 is large and
+要么 x1 很大 x2 也很大
+
+331
+00:11:08,260 --> 00:11:09,560
+X2 is large or X1
+要么 x1 很大 x2 也很大
+
+332
+00:11:09,890 --> 00:11:11,160
+is small and X2 is small.
+或者 x1 很小 x2 也很小
+
+333
+00:11:11,490 --> 00:11:12,480
+Or somewhere in between.
+或者是这两者之间
+
+334
+00:11:13,110 --> 00:11:14,700
+And as this entry,
+然后随着这个值
+
+335
+00:11:15,130 --> 00:11:16,280
+0.8 gets large, you get
+0.8 增大
+
+336
+00:11:16,490 --> 00:11:18,410
+a Gaussian distribution, that's sort of
+你会得到这样一个高斯分布
+
+337
+00:11:18,660 --> 00:11:20,570
+where all the probability lies on
+差不多全部的概率都在一个很窄的范围内
+
+338
+00:11:20,770 --> 00:11:22,870
+this sort of narrow region,
+差不多全部的概率都在一个很窄的范围内
+
+339
+00:11:24,350 --> 00:11:26,200
+where x is approximately equal to
+也就是 x 几乎等于 y
+
+340
+00:11:26,420 --> 00:11:27,530
+y. This is a very
+它是一个非常高
+
+341
+00:11:28,020 --> 00:11:30,290
+tall, thin distribution you know
+而且非常薄的分布
+
+342
+00:11:30,670 --> 00:11:32,570
+line mostly along this line
+几乎完全在 x 非常接近于 y 的这样一个
+
+343
+00:11:33,860 --> 00:11:34,940
+central region where x is
+几乎完全在 x 非常接近于 y 的这样一个
+
+344
+00:11:35,010 --> 00:11:36,860
+close to y. So this
+非常窄的范围内
+
+345
+00:11:37,130 --> 00:11:38,350
+is if we set these
+这是当我们把这些元素
+
+346
+00:11:38,810 --> 00:11:40,360
+entries to be positive entries.
+设置为正数时的情况
+
+347
+00:11:40,970 --> 00:11:42,120
+In contrast if we set
+相对地 如果我们
+
+348
+00:11:42,460 --> 00:11:43,530
+these to negative values, as
+将它们设置为负数
+
+349
+00:11:44,350 --> 00:11:46,340
+I decrease it from -0.5
+随着我把它从 -0.5
+
+350
+00:11:46,380 --> 00:11:47,920
+down to -.8, then
+减小到 -0.8
+
+351
+00:11:48,060 --> 00:11:49,360
+what we get is a model where
+那么我得到的模型是
+
+352
+00:11:49,870 --> 00:11:50,930
+we put most of the probability
+大部分的概率都在
+
+353
+00:11:51,620 --> 00:11:53,930
+in this sort of negative X1
+x1 和 x2 负相关的这样一个区域内
+
+354
+00:11:54,010 --> 00:11:55,420
+and X2 correlation region,
+x1 和 x2 负相关的这样一个区域内
+
+355
+00:11:55,710 --> 00:11:57,330
+and so, most of the
+那么大部分的概率
+
+356
+00:11:57,480 --> 00:11:58,420
+probability now lies in this region,
+几乎都落在 x1 和 -x2 差不多相等的区间内
+
+357
+00:11:58,810 --> 00:11:59,910
+where X 1 is about equal
+几乎都落在 x1 和 -x2 差不多相等的区间内
+
+358
+00:12:00,190 --> 00:12:01,700
+to -X 2, rather than X
+而不是 x1 等于 x2 的区间
+
+359
+00:12:01,890 --> 00:12:03,370
+1 equals X 2.
+而不是 x1 等于 x2 的区间
+
+360
+00:12:04,180 --> 00:12:05,460
+And so this captures a sort
+所以这个捕捉到了
+
+361
+00:12:05,610 --> 00:12:08,050
+of negative correlation between x1
+x1 和 x2 的负相关性
+
+362
+00:12:10,300 --> 00:12:10,650
+and x2.
+x1 和 x2 的负相关性
+
+363
+00:12:11,010 --> 00:12:12,550
+And so,
+希望这些例子
+
+364
+00:12:12,750 --> 00:12:13,640
+hopefully this gives you a sense of the
+能让你体会到
+
+365
+00:12:13,750 --> 00:12:15,230
+different distributions that the
+多元高斯分布所能展现的不同的分布
+
+366
+00:12:15,650 --> 00:12:17,400
+multivariate Gaussian distribution can capture.
+多元高斯分布所能展现的不同的分布
+
+367
+00:12:18,680 --> 00:12:20,430
+So that was varying the
+到目前为止
+
+368
+00:12:20,730 --> 00:12:22,200
+covariance matrix sigma, the other
+我一直在改变协方差矩阵 Σ
+
+369
+00:12:22,910 --> 00:12:23,880
+thing you can do is
+你还可以做的事情是
+
+370
+00:12:24,030 --> 00:12:26,090
+also, vary the mean
+改变平均值参数 μ
+
+371
+00:12:26,300 --> 00:12:27,730
+parameter mu, and so
+改变平均值参数 μ
+
+372
+00:12:28,370 --> 00:12:29,740
+originally, we had mu
+我们的 μ 本来是等于 0 0 的
+
+373
+00:12:30,270 --> 00:12:31,190
+equal 0 0, and so the
+我们的 μ 本来是等于 0 0 的
+
+374
+00:12:31,250 --> 00:12:32,820
+distribution was centered around
+所以分布才会集中在
+
+375
+00:12:33,270 --> 00:12:34,650
+X 1 equals 0, X2 equals 0,
+x1=0 x2=0 这个点周围
+
+376
+00:12:35,050 --> 00:12:35,980
+so the peak of the
+所以这个分布的峰值在这里
+
+377
+00:12:36,070 --> 00:12:38,530
+distribution is here, whereas,
+所以这个分布的峰值在这里
+
+378
+00:12:38,950 --> 00:12:40,430
+if we vary the values of
+而如果我们改变 μ 的值
+
+379
+00:12:40,610 --> 00:12:42,120
+mu, then that varies the
+它就会改变
+
+380
+00:12:42,360 --> 00:12:43,700
+peak of the distribution and so,
+这个分布的峰值
+
+381
+00:12:43,910 --> 00:12:45,770
+if mu equals 0, 0.5,
+所以如果 μ 等于 0 0.5
+
+382
+00:12:45,920 --> 00:12:47,100
+the peak is at, you know,
+这个峰值就在
+
+383
+00:12:47,270 --> 00:12:49,470
+X1 equals zero, and X2
+x1=0 x2=0.5 这里
+
+384
+00:12:49,810 --> 00:12:51,430
+equals 0.5, and so the
+所以这个分布的
+
+385
+00:12:51,980 --> 00:12:53,400
+peak or the center of
+峰值或者说中心
+
+386
+00:12:53,710 --> 00:12:55,260
+this distribution has shifted,
+就会被移动
+
+387
+00:12:56,470 --> 00:12:57,770
+and if mu was 1.5
+如果 μ 等于 1.5 -0.5
+
+388
+00:12:58,340 --> 00:13:00,050
+minus 0.5 then OK,
+那么还是同样地
+
+389
+00:13:01,170 --> 00:13:03,350
+and similarly the peak
+现在分布的峰值
+
+390
+00:13:03,890 --> 00:13:05,490
+of the distribution has now
+就会被移动到另一个地方
+
+391
+00:13:05,620 --> 00:13:06,750
+shifted to a different location,
+就会被移动到另一个地方
+
+392
+00:13:07,670 --> 00:13:09,710
+corresponding to where, you know,
+这个新地方就对应
+
+393
+00:13:09,910 --> 00:13:11,020
+X1 is 1.5 and X2
+x1=1.5 x2=-0.5 这个点
+
+394
+00:13:11,350 --> 00:13:12,710
+is -0.5, and so
+所以改变参数 μ
+
+395
+00:13:13,290 --> 00:13:15,180
+varying the mu parameter, just shifts
+就是在移动
+
+396
+00:13:15,730 --> 00:13:17,840
+around the center of this whole distribution.
+这整个分布的中心
+
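+For readers who want to reproduce surfaces and contour plots like the ones described here, a small Octave sketch (all numbers are illustrative) that evaluates the two-feature density over a grid and can be re-run with different choices of mu and Sigma:
+
+% Sketch: evaluate p(x1, x2; mu, Sigma) on a grid and plot it.
+mu = [0; 0];
+Sigma = [1 0.8; 0.8 1];            % try eye(2), [0.6 0; 0 1], [1 -0.8; -0.8 1], ...
+[X1, X2] = meshgrid(-3:0.1:3, -3:0.1:3);
+Z = zeros(size(X1));
+for i = 1:numel(X1)
+  d = [X1(i); X2(i)] - mu;
+  Z(i) = (2 * pi)^(-1) * det(Sigma)^(-0.5) * exp(-0.5 * d' * pinv(Sigma) * d);
+end
+surf(X1, X2, Z);                   % the bell-shaped surface
+figure; contour(X1, X2, Z);        % the same density viewed from above as contours
+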
+397
+00:13:18,450 --> 00:13:19,670
+So, hopefully, looking at
+所以 希望这些不同的图片
+
+398
+00:13:19,780 --> 00:13:21,270
+all these different pictures gives you
+所以 希望这些不同的图片
+
+399
+00:13:21,410 --> 00:13:22,440
+a sense of the sort
+能够帮助你了解一下多元高斯分布
+
+400
+00:13:22,700 --> 00:13:24,850
+of probability distributions that
+能够帮助你了解一下多元高斯分布
+
+401
+00:13:25,070 --> 00:13:28,000
+the Multivariate Gaussian Distribution allows you to capture.
+所能描述的概率分布是什么样的
+
+402
+00:13:28,800 --> 00:13:29,800
+And the key advantage of it
+它最重要的优势
+
+403
+00:13:29,990 --> 00:13:30,930
+is it allows you to
+就是它可以让你
+
+404
+00:13:31,130 --> 00:13:32,240
+capture, when you'd expect
+能够描述当两个特征变量之间
+
+405
+00:13:32,750 --> 00:13:33,840
+two different features to be
+可能存在正相关
+
+406
+00:13:33,970 --> 00:13:36,560
+positively correlated, or maybe negatively correlated.
+或者是负相关关系的情况
+
+407
+00:13:37,790 --> 00:13:39,030
+In the next video, we'll take
+在接下来的视频中
+
+408
+00:13:39,260 --> 00:13:40,760
+this multivariate Gaussian distribution
+我们要把这个多元高斯分布
+
+409
+00:13:41,670 --> 00:13:43,290
+and apply it to anomaly detection.
+应用到异常检测中
+
diff --git a/srt/15 - 8 - Anomaly Detection using the Multivariate Gaussian Distribution (Optional) (14 min).srt b/srt/15 - 8 - Anomaly Detection using the Multivariate Gaussian Distribution (Optional) (14 min).srt
new file mode 100644
index 00000000..5907de38
--- /dev/null
+++ b/srt/15 - 8 - Anomaly Detection using the Multivariate Gaussian Distribution (Optional) (14 min).srt
@@ -0,0 +1,1815 @@
+1
+00:00:00,330 --> 00:00:01,420
+In the last video we talked
+在我们谈到的上一个视频(字幕翻译:中国海洋大学 黄海广 haiguang2000@qq.com)
+
+2
+00:00:01,750 --> 00:00:03,510
+about the Multivariate Gaussian Distribution
+关于多元高斯分布
+
+3
+00:00:04,720 --> 00:00:06,990
+and saw some examples of the
+,看到的一些
+
+4
+00:00:07,230 --> 00:00:08,830
+sorts of distributions you can model, as
+建立的各种分布模型,
+
+5
+00:00:08,960 --> 00:00:10,880
+you vary the parameters, mu and sigma.
+当你改变参数,mu和sigma。
+
+6
+00:00:11,830 --> 00:00:13,190
+In this video, let's take those
+在这段视频中,让我们这些
+
+7
+00:00:13,420 --> 00:00:14,690
+ideas, and apply them
+想法,并应用它们
+
+8
+00:00:14,890 --> 00:00:17,550
+to develop a different anomaly detection algorithm.
+制定一个不同的异常检测算法。
+
+9
+00:00:19,880 --> 00:00:21,890
+To recap, the multivariate Gaussian
+要回顾一下多元高斯
+
+10
+00:00:22,270 --> 00:00:23,080
+distribution, or the multivariate normal
+分布和多元正态分布
+
+11
+00:00:23,770 --> 00:00:26,640
+distribution has two parameters, mu and sigma.
+分布有两个参数,mu和sigma。
+
+12
+00:00:27,210 --> 00:00:28,850
+Where mu is an n
+其中mu这一个n
+
+13
+00:00:28,990 --> 00:00:31,110
+dimensional vector and sigma,
+维向量和sigma,
+
+14
+00:00:32,110 --> 00:00:34,430
+the covariance matrix, is an
+的协方差矩阵,是一种
+
+15
+00:00:34,810 --> 00:00:36,110
+n by n matrix.
+n乘n的矩阵。
+
+16
+00:00:37,330 --> 00:00:38,710
+And here's the formula for
+而这里的公式
+
+17
+00:00:38,740 --> 00:00:39,780
+the probability of X, as
+X的概率,如
+
+18
+00:00:40,480 --> 00:00:41,870
+parameterized by mu and
+按mu和参数化
+
+19
+00:00:42,240 --> 00:00:43,770
+sigma, and as you
+sigma,和你的
+
+20
+00:00:43,890 --> 00:00:45,010
+vary mu and sigma, you
+变量mu和sigma,你
+
+21
+00:00:45,100 --> 00:00:45,830
+can get a range of different
+可以得到一个范围的不同
+
+22
+00:00:46,240 --> 00:00:47,700
+distributions, like, you know,
+分布一样,你知道的,
+
+23
+00:00:47,760 --> 00:00:48,990
+these are three examples of the
+这些都是三个样本
+
+24
+00:00:49,060 --> 00:00:50,660
+ones that we saw in the previous video.
+那些我们在以前的视频看了。
+
+25
+00:00:51,800 --> 00:00:53,100
+So let's talk about the
+因此,让我们谈谈
+
+26
+00:00:53,260 --> 00:00:54,600
+parameter fitting or the
+参数拟合或
+
+27
+00:00:54,670 --> 00:00:56,260
+parameter estimation problem. The
+参数估计问题。该
+
+28
+00:00:56,800 --> 00:00:58,480
+question, as usual, is if
+问题,像往常一样,如果是
+
+29
+00:00:58,610 --> 00:00:59,890
+I have a set of examples
+我有一组样本
+
+30
+00:01:00,500 --> 00:01:02,140
+X1 through XM and here each
+X1到XM并且这里的每个
+
+31
+00:01:02,410 --> 00:01:03,750
+of these examples is an
+这些样本是一个
+
+32
+00:01:04,420 --> 00:01:05,820
+n dimensional vector and I think my
+n维向量,我想我的
+
+33
+00:01:06,000 --> 00:01:08,280
+examples come from a multivariate Gaussian distribution.
+样本来自一个多元高斯分布。
+
+34
+00:01:09,470 --> 00:01:12,450
+How do I try to estimate my parameters mu and sigma?
+我如何尝试估计我的参数mu和sigma?
+
+35
+00:01:13,440 --> 00:01:15,070
+Well the standard formulas for
+以及标准公式的
+
+36
+00:01:15,270 --> 00:01:17,170
+estimating them is you
+估计是:你
+
+37
+00:01:17,330 --> 00:01:18,270
+set mu to be just
+设置mu是
+
+38
+00:01:18,580 --> 00:01:19,960
+the average of your training examples.
+你的训练样本的平均值。
+
+39
+00:01:21,010 --> 00:01:22,770
+And you set sigma to be equal to this.
+并设置sigma等于这一点。
+
+40
+00:01:23,130 --> 00:01:24,120
+And this is actually just
+这其实只是
+
+41
+00:01:24,250 --> 00:01:25,200
+like the sigma that we had
+像我们有sigma
+
+42
+00:01:25,490 --> 00:01:26,860
+written out, when we were
+写出来,当我们
+
+43
+00:01:27,150 --> 00:01:28,740
+using the PCA or
+使用PCA即
+
+44
+00:01:28,850 --> 00:01:30,750
+the Principal Components Analysis algorithm.
+主成分分析算法。
+
+45
+00:01:31,820 --> 00:01:32,730
+So you just plug in these
+所以你只需插入这
+
+46
+00:01:32,850 --> 00:01:34,290
+two formulas and this
+两个公式,这
+
+47
+00:01:34,570 --> 00:01:36,720
+would give you your estimated parameter
+会给你你估计的参数
+
+48
+00:01:37,160 --> 00:01:39,440
+mu and your estimated parameter sigma.
+mu和你估计的参数sigma。
+
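+A minimal sketch of these two estimates in Octave, assuming the unlabeled training examples are stored as the rows of an m-by-n matrix X:
+
+% Sketch: fit mu and Sigma to a training matrix X (one example per row).
+[m, n] = size(X);
+mu = mean(X, 1)';                  % n-by-1 vector of per-feature means
+Xc = X - repmat(mu', m, 1);        % subtract mu from every example
+Sigma = (1 / m) * (Xc' * Xc);      % n-by-n covariance matrix
+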
+49
+00:01:41,580 --> 00:01:43,860
+So given the data set here is how you estimate mu and sigma.
+所以,这里给出的数据集是你如何估计mu和sigma。
+
+50
+00:01:44,270 --> 00:01:45,350
+Let's take this method
+让我们以这种方法
+
+51
+00:01:46,020 --> 00:01:47,410
+and just plug it
+而只需将其插入
+
+52
+00:01:47,610 --> 00:01:49,130
+into an anomaly detection algorithm.
+到异常检测算法。
+
+53
+00:01:50,050 --> 00:01:51,020
+So how do we
+那么,我们如何
+
+54
+00:01:51,080 --> 00:01:52,200
+put all of this together to
+把所有这一切共同
+
+55
+00:01:52,420 --> 00:01:54,160
+develop an anomaly detection algorithm?
+开发一个异常检测算法?
+
+56
+00:01:54,640 --> 00:01:55,780
+Here's what we do.
+下面是我们做什么。
+
+57
+00:01:56,580 --> 00:01:57,480
+First we take our training
+首先,我们把我们的训练
+
+58
+00:01:57,960 --> 00:01:59,110
+set, and we fit the
+集,和我们的拟合
+
+59
+00:01:59,170 --> 00:02:00,210
+model, we fit P
+模型,我们计算P
+
+60
+00:02:00,390 --> 00:02:01,640
+of X, by, you know, setting
+的X,要知道,设定
+
+61
+00:02:02,100 --> 00:02:02,720
+mu and sigma as described
+mu和描述的一样sigma
+
+62
+00:02:03,780 --> 00:02:05,410
+on the previous slide.
+在上一张幻灯片。
+
+63
+00:02:07,370 --> 00:02:08,510
+Next when you are given
+您将得到下一个
+
+64
+00:02:08,720 --> 00:02:10,170
+a new example X. So
+一个新的样本X,所以
+
+65
+00:02:10,510 --> 00:02:11,430
+if you are given a test example,
+如果给你一个测试的样本,
+
+66
+00:02:12,450 --> 00:02:15,240
+let's take an earlier example; say we have a new example out here.
+让作为一个早期的样本有一个新的样本在这里。
+
+67
+00:02:15,880 --> 00:02:16,790
+And that is my test example.
+那是我的测试样本。
+
+68
+00:02:18,220 --> 00:02:19,670
+Given the new example X, what
+鉴于新的样本X,
+
+69
+00:02:19,810 --> 00:02:21,220
+we are going to do is compute
+我们要做的是计算
+
+70
+00:02:21,770 --> 00:02:23,420
+P of X, using this
+p(x),用这
+
+71
+00:02:23,690 --> 00:02:26,250
+formula for the multivariate Gaussian distribution.
+式为多元高斯分布。
+
+72
+00:02:27,720 --> 00:02:29,220
+And then, if P of
+然后,如果p(x)
+
+73
+00:02:29,470 --> 00:02:30,840
+X is very small, then we
+是非常小的,那么我们
+
+74
+00:02:30,950 --> 00:02:31,800
+flagged it as an anomaly,
+把它当作一个异常,
+
+75
+00:02:32,440 --> 00:02:33,570
+whereas, if P of X is greater
+然而,如果p(x)是远大于
+
+76
+00:02:33,750 --> 00:02:35,520
+than that parameter epsilon, then
+参数epsilon,则
+
+77
+00:02:35,670 --> 00:02:39,190
+we don't flag it as an anomaly.
+我们不会将其标记为异常。
+
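+As a sketch of that detection step in Octave (epsilon here is just an illustrative value; in practice it would be chosen, for example, on a cross-validation set), assuming mu and Sigma were fitted as above and x_test is a new n-by-1 example:
+
+% Sketch: flag x_test as an anomaly when its density falls below epsilon.
+n = length(mu);
+d = x_test - mu;
+p_test = (2 * pi)^(-n / 2) * det(Sigma)^(-0.5) * exp(-0.5 * d' * pinv(Sigma) * d);
+epsilon = 0.02;                    % illustrative threshold
+if p_test < epsilon
+  disp('flag as anomaly');
+else
+  disp('do not flag');
+end
+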
+78
+00:02:39,400 --> 00:02:42,240
+So it turns out, if we were to fit a multivariate Gaussian distribution to this data set,
+所以,事实证明,如果我们要拟合多元高斯分布到这组数据,
+
+79
+00:02:42,560 --> 00:02:44,220
+so just the red crosses, not the green example,
+所以只是图中的红叉,不是绿的样本,
+
+80
+00:02:45,190 --> 00:02:46,100
+you end up with a Gaussian
+你完成了一个高斯
+
+81
+00:02:46,300 --> 00:02:48,080
+distribution that places lots
+分布的地方很多
+
+82
+00:02:48,350 --> 00:02:49,690
+of probability in the central
+在中央的概率
+
+83
+00:02:49,910 --> 00:02:51,840
+region, slightly less probability here,
+区域,这里概率稍微小,
+
+84
+00:02:52,440 --> 00:02:53,360
+slightly less probability here,
+在这里概率略少,
+
+85
+00:02:54,110 --> 00:02:55,010
+slightly less probability here,
+在这里概率略少,
+
+86
+00:02:56,020 --> 00:02:59,280
+and very low probability at the point that is way out here.
+并在该点是在这里的概率非常低。
+
+87
+00:03:01,260 --> 00:03:02,350
+And so, if you apply
+所以,如果你应用
+
+88
+00:03:02,840 --> 00:03:04,730
+the multivariate Gaussian distribution to
+多元高斯分布
+
+89
+00:03:04,920 --> 00:03:06,530
+this example, it will actually
+本例中,将实际
+
+90
+00:03:06,930 --> 00:03:08,610
+correctly flag that example.
+正确地标记样本。
+
+91
+00:03:09,520 --> 00:03:09,920
+as an anomaly.
+作为一个异常。
+
+92
+00:03:16,860 --> 00:03:18,080
+Finally it's worth saying
+最后,值得一说
+
+93
+00:03:18,430 --> 00:03:19,640
+a few words about what is
+简要描述
+
+94
+00:03:19,760 --> 00:03:21,900
+the relationship between the
+他们之间的关系:
+
+95
+00:03:21,950 --> 00:03:23,810
+multivariate Gaussian distribution model, and
+多元高斯分布模型和
+
+96
+00:03:24,030 --> 00:03:25,440
+the original model, where we
+原始模型,在那里我们
+
+97
+00:03:25,500 --> 00:03:26,870
+were modeling P(x)
+被建模的P
+
+98
+00:03:26,940 --> 00:03:28,000
+as a product of this
+作为该商品
+
+99
+00:03:28,110 --> 00:03:28,890
+P of X1, P of X2,
+P(X1),P(X2),
+
+100
+00:03:29,150 --> 00:03:31,420
+up to P of Xn.
+到P(Xn)。
+
+101
+00:03:32,750 --> 00:03:33,890
+It turns out that you can
+事实证明,你可以
+
+102
+00:03:34,090 --> 00:03:35,310
+prove mathematically, I'm not
+数学上证明,我不是
+
+103
+00:03:35,590 --> 00:03:36,470
+going to do the proof here, but
+要在这里做了证明,但
+
+104
+00:03:36,540 --> 00:03:38,120
+you can prove mathematically that this
+你能证明在数学上,这
+
+105
+00:03:38,300 --> 00:03:40,610
+relationship, between the
+关系,之间的
+
+106
+00:03:40,650 --> 00:03:42,240
+multivariate Gaussian model and
+多元高斯模型和
+
+107
+00:03:42,400 --> 00:03:44,030
+this original one. And in
+这种原始模型。而且
+
+108
+00:03:44,110 --> 00:03:45,420
+particular, it turns out
+特别是,它原来
+
+109
+00:03:45,660 --> 00:03:47,500
+that the original model corresponds
+原模型对应
+
+110
+00:03:48,440 --> 00:03:50,330
+to multivariate Gaussians, where
+以多变量高斯,其中
+
+111
+00:03:50,660 --> 00:03:51,980
+the contours of the
+的轮廓
+
+112
+00:03:52,040 --> 00:03:54,060
+Gaussian are always axis aligned.
+高斯总是轴线对齐。
+
+113
+00:03:55,410 --> 00:03:57,350
+So all three of
+因此,所有三个
+
+114
+00:03:57,470 --> 00:03:59,390
+these are examples of
+这些样本
+
+115
+00:03:59,510 --> 00:04:01,300
+Gaussian distributions that you
+高斯分布,你
+
+116
+00:04:01,480 --> 00:04:02,930
+can fit using the original model.
+可以适合使用原始模型。
+
+117
+00:04:03,190 --> 00:04:04,090
+It turns out that that corresponds
+原来,对应
+
+118
+00:04:05,040 --> 00:04:06,920
+to multivariate Gaussian, where, you
+以多元高斯,在那里,你
+
+119
+00:04:07,300 --> 00:04:09,830
+know, the ellipses here, the contours
+知道,这里的省略号,轮廓
+
+120
+00:04:10,730 --> 00:04:13,600
+of this distribution--it
+这种分布的 - 它
+
+121
+00:04:13,800 --> 00:04:15,190
+turns out that this model actually
+事实证明,这种模式实际上
+
+122
+00:04:15,470 --> 00:04:17,030
+corresponds to a special
+对应于一个特殊的
+
+123
+00:04:17,490 --> 00:04:19,160
+case of a multivariate Gaussian distribution.
+情况下的多元高斯分布。
+
+124
+00:04:19,740 --> 00:04:21,110
+And in particular, this special
+特别是,这个特殊的
+
+125
+00:04:21,410 --> 00:04:22,930
+case is defined by constraining
+例子通过约束定义
+
+126
+00:04:24,460 --> 00:04:25,710
+the distribution of p
+p(x)分布
+
+127
+00:04:25,880 --> 00:04:27,110
+of x, the multivariate a Gaussian
+,多元高斯
+
+128
+00:04:27,270 --> 00:04:28,070
+distribution of p of x,
+分布的p(x),
+
+129
+00:04:28,980 --> 00:04:30,740
+so that the contours of
+这个
+
+130
+00:04:30,920 --> 00:04:32,340
+the probability density function, of
+概率密度函数的轮廓,
+
+131
+00:04:32,440 --> 00:04:35,010
+the probability distribution function, are axis aligned.
+这个概率分布函数,是轴对齐。
+
+132
+00:04:35,700 --> 00:04:37,400
+And so you can get a p
+所以你可以得到p(x)
+
+133
+00:04:37,940 --> 00:04:39,550
+of x with a
+是一个
+
+134
+00:04:39,860 --> 00:04:41,430
+multivariate Gaussian that looks like
+多元高斯分布,看起来像
+
+135
+00:04:41,630 --> 00:04:43,850
+this, or like this, or like this.
+这样,或者这样,或者像这样。
+
+136
+00:04:44,050 --> 00:04:44,990
+And you notice, that in all
+而且你注意到,在所有
+
+137
+00:04:45,210 --> 00:04:47,820
+3 of these examples, these ellipses,
+第三个样本中,这些椭圆
+
+138
+00:04:48,740 --> 00:04:50,490
+or these ovals that I'm drawing, have
+或者,这些椭圆形的我画,有
+
+139
+00:04:50,690 --> 00:04:53,190
+their axes aligned with the X1 X2 axes.
+其轴线对准X1 X2轴。
+
+140
+00:04:54,260 --> 00:04:55,920
+And what we do not have, is
+而我们没有,是
+
+141
+00:04:56,200 --> 00:04:57,310
+a set of contours
+一组轮廓
+
+142
+00:04:58,050 --> 00:05:00,450
+that are at an angle, right?
+这是一个角度,对不对?
+
+143
+00:05:00,790 --> 00:05:02,620
+And this corresponded to examples where
+与此相对应的样本在那里
+
+144
+00:05:02,790 --> 00:05:06,780
+sigma is equal to 1 1, 0.8, 0.8.
+sigma等于1 1,0.8,0.8。
+
+145
+00:05:06,840 --> 00:05:08,790
+Let's say, with non-0 elements on the
+比方说,对非0元素
+
+146
+00:05:09,070 --> 00:05:10,780
+off diagonals.
+关闭对角线。
+
+147
+00:05:11,180 --> 00:05:11,970
+So, it turns out that
+所以,事实证明,
+
+148
+00:05:12,380 --> 00:05:13,980
+it's possible to show mathematically that
+它是可能的数学证明
+
+149
+00:05:14,260 --> 00:05:16,400
+this model actually is the
+这种模式实际上是
+
+150
+00:05:16,480 --> 00:05:18,300
+same as a multivariate Gaussian
+同样作为多元高斯
+
+151
+00:05:18,750 --> 00:05:20,570
+distribution but with a constraint.
+分布但有限制。
+
+152
+00:05:21,250 --> 00:05:24,400
+And the constraint is that the
+约束是
+
+153
+00:05:24,480 --> 00:05:26,710
+covariance matrix sigma must
+协方差矩阵sigma必须
+
+154
+00:05:27,240 --> 00:05:28,900
+have 0's on the off diagonal elements.
+有0的对非对角元素。
+
+155
+00:05:29,360 --> 00:05:30,830
+In particular, the covariance matrix sigma,
+特别地,协方差矩阵Σ和
+
+156
+00:05:31,240 --> 00:05:32,450
+this thing here, it would
+这个东西在这里,它会
+
+157
+00:05:32,550 --> 00:05:33,940
+be sigma squared 1, sigma
+被sigma平方1,sigma
+
+158
+00:05:34,190 --> 00:05:36,050
+squared 2, down to sigma
+平方2,下至sigma
+
+159
+00:05:36,350 --> 00:05:38,660
+squared n, and then
+平方N,然后
+
+160
+00:05:39,530 --> 00:05:40,550
+everything on the off diagonal
+一切都在对角线关闭
+
+161
+00:05:41,020 --> 00:05:42,210
+entries, all of these elements
+条目,所有这些元素
+
+162
+00:05:43,640 --> 00:05:45,110
+above and below the diagonal of the matrix,
+上面和下面的对角矩阵,
+
+163
+00:05:45,640 --> 00:05:46,850
+all of those are going to be zero.
+所有这些都将是零。
+
+164
+00:05:47,900 --> 00:05:49,380
+And in fact if you take
+而事实上,如果你拿
+
+165
+00:05:49,680 --> 00:05:50,980
+these values of sigma, sigma
+那些sigma的值,sigma
+
+166
+00:05:51,330 --> 00:05:52,280
+squared 1, sigma squared 2,
+平方1,sigma平方2,
+
+167
+00:05:52,320 --> 00:05:53,380
+down to sigma squared n,
+下降到sigma平方N,
+
+168
+00:05:53,930 --> 00:05:56,370
+and plug them into here, and
+并将其插入在这里,
+
+169
+00:05:56,690 --> 00:05:57,640
+you know, plug them into this
+你知道,将它们插入此
+
+170
+00:05:57,760 --> 00:05:59,580
+covariance matrix, then the
+协方差矩阵,则
+
+171
+00:05:59,990 --> 00:06:01,130
+two models are actually identical.
+两个模型实际上是相同的。
+
+172
+00:06:01,630 --> 00:06:02,560
+That is, this new model,
+也就是说,这种新的模型,
+
+173
+00:06:06,210 --> 00:06:07,530
+using a multivariate Gaussian distribution,
+使用多变量高斯分布,
+
+174
+00:06:08,820 --> 00:06:10,340
+corresponds exactly to the
+完全对应
+
+175
+00:06:10,400 --> 00:06:11,510
+old model, if the covariance
+旧的模式,如果协方差
+
+176
+00:06:12,040 --> 00:06:13,700
+matrix sigma, has only
+矩阵的标准差,只有
+
+177
+00:06:14,230 --> 00:06:15,490
+0 elements off the diagonals,
+0元折对角线,
+
+178
+00:06:15,580 --> 00:06:17,700
+and in pictures that
+并且在图片的
+
+179
+00:06:18,180 --> 00:06:19,570
+corresponds to having Gaussian distributions,
+对应于具有高斯分布,
+
+180
+00:06:20,720 --> 00:06:22,260
+where the contours of this
+其中该轮廓
+
+181
+00:06:22,950 --> 00:06:25,620
+distribution function are axis aligned.
+分布函数轴线对齐。
+
+182
+00:06:25,940 --> 00:06:28,500
+So you aren't allowed to model the correlations between the different features.
+所以你不允许模型的不同特征之间的相关性。
+
+183
+00:06:30,990 --> 00:06:32,520
+So in that sense the original model
+所以在这个意义上的原始模型
+
+184
+00:06:33,030 --> 00:06:35,840
+is actually a special case of this multivariate Gaussian model.
+其实这个多元高斯模型的一个特例。
+
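+One way to see this special case numerically (a sketch with made-up numbers): build a diagonal Sigma from the per-feature variances and compare the multivariate density against the product of the univariate Gaussians; the two agree up to floating-point error.
+
+% Sketch: a diagonal Sigma makes p(x; mu, Sigma) equal the product p(x1) * p(x2).
+mu = [2; 3];
+sigma2 = [0.5; 1.5];               % per-feature variances sigma_j^2
+Sigma = diag(sigma2);              % zeros on the off-diagonal entries
+x = [2.4; 2.1];
+n = length(mu);
+d = x - mu;
+p_multi = (2 * pi)^(-n / 2) * det(Sigma)^(-0.5) * exp(-0.5 * d' * pinv(Sigma) * d);
+p_orig  = prod((2 * pi * sigma2).^(-0.5) .* exp(-(x - mu).^2 ./ (2 * sigma2)));
+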
+185
+00:06:38,370 --> 00:06:40,370
+So when would you use each of these two models?
+你什么时候会使用这两类模型?
+
+186
+00:06:40,830 --> 00:06:41,750
+So when would you use the original
+所以,当你的原始
+
+187
+00:06:42,070 --> 00:06:42,880
+model and when would you use
+模型,你会用时
+
+188
+00:06:43,040 --> 00:06:45,170
+the multivariate Gaussian model?
+多变量高斯模型?
+
+189
+00:06:52,110 --> 00:06:53,670
+The original model is probably
+原始模型可能是
+
+190
+00:06:54,240 --> 00:06:55,840
+used somewhat more often,
+使用较为频繁,
+
+191
+00:06:58,800 --> 00:07:03,160
+and whereas the multivariate Gaussian
+而多元高斯
+
+192
+00:07:03,160 --> 00:07:04,470
+distribution is used somewhat
+分布用的有点
+
+193
+00:07:04,800 --> 00:07:06,670
+less but it has the advantage of being
+少,但它具有的优势在于
+
+194
+00:07:07,000 --> 00:07:08,290
+able to capture correlations between features. So
+能够捕捉功能之间的相关性。所以
+
+195
+00:07:10,490 --> 00:07:11,600
+suppose you want to
+假设你想
+
+196
+00:07:11,770 --> 00:07:13,100
+capture anomalies where you
+捕捉异常,你
+
+197
+00:07:13,210 --> 00:07:14,430
+have different features say where
+有不同的特征如
+
+198
+00:07:14,640 --> 00:07:16,560
+features x1, x2 take
+特征为x1,x2等
+
+199
+00:07:16,790 --> 00:07:19,760
+on unusual combinations of values
+不同的值的组合。
+
+200
+00:07:20,070 --> 00:07:21,320
+so in the earlier
+因此,在早期的
+
+201
+00:07:21,730 --> 00:07:27,320
+example, we had that example where the anomaly was with the CPU load and the memory use taking on unusual combinations of values, if
+例如,我们有这样的样本,其中的异常是与CPU的负载,并采取对价值观的不同寻常的组合,内存使用,如果
+
+202
+00:07:30,240 --> 00:07:31,220
+you want to use the original
+要使用原始
+
+203
+00:07:31,490 --> 00:07:33,500
+model to capture that, then what you
+模型捕捉到这一点,那么你
+
+204
+00:07:33,650 --> 00:07:34,650
+need to do is create an
+需要做的是创建一个
+
+205
+00:07:34,790 --> 00:07:36,780
+extra feature, such as X3
+额外的特征,如X3
+
+206
+00:07:37,020 --> 00:07:40,710
+equals X1/X2, you know
+等于X1/X2,你知道
+
+207
+00:07:40,860 --> 00:07:46,480
+equals maybe the CPU load divided by the memory used, or something, and you
+也许等于CPU的负载所使用的内存,还是分了,你
+
+208
+00:07:47,910 --> 00:07:49,030
+need to create extra features
+需要创建额外的特征
+
+209
+00:07:49,550 --> 00:07:51,440
+if there's unusual combinations of values
+如果有的同的值组合
+
+210
+00:07:51,540 --> 00:07:52,930
+where X1 and X2 take
+其中X1和X2取
+
+211
+00:07:53,220 --> 00:07:54,900
+on an unusual combination of
+在一个不同的组合
+
+212
+00:07:55,000 --> 00:07:56,360
+values even though X1 by
+值,即使通过X1
+
+213
+00:07:56,530 --> 00:07:58,610
+itself and X2 by itself
+本身和X2本身
+
+214
+00:07:59,850 --> 00:08:03,530
+looks like it's taking a perfectly normal value. But if you're willing to spend the time to manually
+看起来像它采取了完全正常的值。但是,如果你愿意花时间去手动
+
+215
+00:08:04,030 --> 00:08:05,240
+create an extra feature like this,
+创建这样一个额外的特征,
+
+216
+00:08:05,920 --> 00:08:07,670
+then the original model will work
+那么原始模型将工作
+
+217
+00:08:07,890 --> 00:08:14,170
+fine. Whereas in contrast, the multivariate Gaussian model can automatically capture
+得很好。而与此相反,多元高斯模型可以自动捕获
+
+218
+00:08:14,780 --> 00:08:23,360
+correlations between different features. But the original model has some other more significant advantages, too, and one huge
+不同特征之间的相关性。但原始模型有一些其他更显著的优势,比如
+
+219
+00:08:23,740 --> 00:08:24,990
+advantage of the original model
+利用原始模型
+
+220
+00:08:28,200 --> 00:08:29,400
+is that it is computationally cheaper, and another view on this
+在于,它在计算上是便宜的,并且另一方面
+
+221
+00:08:29,650 --> 00:08:31,170
+is that is scales better to
+是尺度比较好
+
+222
+00:08:31,290 --> 00:08:32,720
+very large values of n
+当n是个非常大的值
+
+223
+00:08:32,800 --> 00:08:34,270
+and very large numbers of
+和非常大的数字
+
+224
+00:08:34,460 --> 00:08:36,260
+features, and so even
+特征,并且因此即使
+
+225
+00:08:36,510 --> 00:08:38,090
+if n were ten thousand,
+如果n分别为10000,
+
+226
+00:08:39,470 --> 00:08:40,350
+or even if n were equal
+甚至如果n是等于
+
+227
+00:08:40,990 --> 00:08:42,600
+to a hundred thousand, the original
+十万,
+
+228
+00:08:42,820 --> 00:08:47,120
+model will usually work just fine.
+原始模型通常会工作得很好。
+
+229
+00:08:47,940 --> 00:08:48,930
+Whereas in contrast for the multivariate Gaussian model notice here, for
+而与此相反,多元高斯模型在这里,
+
+230
+00:08:49,070 --> 00:08:49,940
+example, that we need to
+例如,我们需要
+
+231
+00:08:50,440 --> 00:08:51,730
+compute the inverse of the matrix
+计算该矩阵的逆矩阵
+
+232
+00:08:52,110 --> 00:08:53,760
+sigma where sigma is
+sigma是
+
+233
+00:08:54,100 --> 00:08:55,230
+an n by n matrix
+一个n×n矩阵
+
+234
+00:08:56,300 --> 00:08:57,830
+and so computing sigma if
+所以要计算出sigma,如果
+
+235
+00:08:58,160 --> 00:08:59,950
+sigma is a hundred thousand by a
+sigma是一个十万了
+
+236
+00:09:00,190 --> 00:09:02,910
+hundred thousand matrix that is going to be very computationally expensive.
+十万矩阵,该矩阵将是非常耗时的计算。
+
+237
+00:09:03,440 --> 00:09:04,650
+And so the multivariate Gaussian model
+这样一来,多元高斯模型
+
+238
+00:09:05,350 --> 00:09:06,900
+scales less well to large
+尺度不那么大
+
+239
+00:09:07,180 --> 00:09:09,210
+values of N. And
+N的和值
+
+240
+00:09:09,490 --> 00:09:11,030
+finally for the original
+最后,关于
+
+241
+00:09:11,250 --> 00:09:12,630
+model, it turns out
+原始模型,事实证明
+
+242
+00:09:12,770 --> 00:09:13,850
+to work out ok even if
+就ok了工作,即使
+
+243
+00:09:14,090 --> 00:09:15,520
+you have a relatively small training
+你有一个比较小的训练
+
+244
+00:09:15,960 --> 00:09:17,010
+set this is the small unlabeled
+集这是小的未标记
+
+245
+00:09:17,410 --> 00:09:19,130
+examples that we use to model p of x
+样本我们用来对模型P(x)来建模
+
+246
+00:09:20,410 --> 00:09:21,600
+of course, and this works fine, even if
+当然,这工作得很好,即使
+
+247
+00:09:21,720 --> 00:09:23,400
+M is, you
+M的值是,你
+
+248
+00:09:24,530 --> 00:09:25,150
+know, maybe 50, 100, works fine.
+知道,也许50,100,工作得很好。
+
+249
+00:09:25,860 --> 00:09:27,740
+Whereas for the multivariate Gaussian, it
+而对于多变量高斯分布,它
+
+250
+00:09:27,890 --> 00:09:29,340
+is sort of a mathematical property
+是一种数学性质
+
+251
+00:09:29,980 --> 00:09:31,230
+of the algorithm that you must have m
+你必须有m
+
+252
+00:09:32,100 --> 00:09:38,810
+greater than n, so that the number of examples is greater than the number of features you have. And there's a mathematical property of the way we estimate the parameters
+大于n,因此,样本的数目是大于的特征数的。还有的,我们估计参数的方法的数学属性
+
+253
+00:09:41,840 --> 00:09:43,850
+that if this is not true, so if m is less than or equal to n,
+如果这是不正确的,因此,如果m小于或等于n,
+
+254
+00:09:44,730 --> 00:09:51,610
+then this matrix isn't even invertible, that is this matrix is singular, and so you can't even use the
+那么这个矩阵甚至不是可逆的,即该矩阵是奇异的,所以你甚至不能使用
+
+255
+00:09:51,810 --> 00:09:53,230
+multivariate Gaussian model unless you make some changes to it. But a
+多元高斯模型,除非你做一些修改。但
+
+256
+00:09:54,630 --> 00:09:55,820
+typical rule of thumb
+这个典型的规则
+
+257
+00:09:56,030 --> 00:09:58,760
+that I use is, I will use the
+我用的就是,我将使用
+
+258
+00:09:58,860 --> 00:10:00,500
+multivariate Gaussian model only if m is much greater than n, so this is sort of the
+多元高斯模型仅当m是大于n大得多,所以这是一种
+
+259
+00:10:04,050 --> 00:10:05,750
+narrow mathematical requirement, but
+狭隘的数学要求,但
+
+260
+00:10:05,900 --> 00:10:07,300
+in practice, I would use
+在实践中,我会用
+
+261
+00:10:07,480 --> 00:10:08,910
+the multivariate Gaussian model, only
+多变量高斯模型,只
+
+262
+00:10:09,280 --> 00:10:10,420
+if m were quite a bit
+如果m是有点点
+
+263
+00:10:10,750 --> 00:10:11,870
+bigger than n. So if
+大于n大。所以,如果
+
+264
+00:10:12,040 --> 00:10:13,320
+m were greater than or
+m大于或
+
+265
+00:10:13,470 --> 00:10:14,780
+equal to 10 times n, let's
+等于10 n次,让我们
+
+266
+00:10:14,990 --> 00:10:16,460
+say, might be a reasonable rule of thumb, and if
+比方说,可能还是一个合理的经验法则,如果
+
+267
+00:10:18,970 --> 00:10:20,890
+it doesn't satisfy this, then the multivariate Gaussian model
+它不满足这一点,那么多元高斯模型
+
+268
+00:10:21,300 --> 00:10:23,320
+has a lot
+有很多
+
+269
+00:10:23,700 --> 00:10:25,850
+of parameters, right, so this covariance matrix sigma is an n by n matrix,
+的参数,所以这个协方差矩阵sigma是一个n×n矩阵,
+
+270
+00:10:26,510 --> 00:10:27,590
+so it has, you know, roughly
+所以,你知道的,大概
+
+271
+00:10:27,820 --> 00:10:31,230
+n squared parameters, because it's a symmetric matrix,
+n平方的参数,因为它是一个对称矩阵,
+
+272
+00:10:31,710 --> 00:10:33,040
+it's actually closer to n squared
+它实际上更接近到n的平方
+
+273
+00:10:33,430 --> 00:10:35,230
+over 2 parameters, but this is a
+以上2个参数,但是这是一个
+
+274
+00:10:35,670 --> 00:10:37,220
+lot of parameters, so you need
+很多的参数,所以你需要
+
+275
+00:10:37,600 --> 00:10:38,720
+make sure you have a fairly
+确保你有一个相当
+
+276
+00:10:38,930 --> 00:10:48,350
+large value for m, make sure you have enough data to fit all these parameters. And m greater than
+大的值给m,请确保您有足够的数据,以适应所有这些参数。m大于
+
+277
+00:10:49,010 --> 00:10:52,220
+or equal to 10 n would be a reasonable rule of thumb to make sure that you can estimate this covariance matrix sigma reasonably well.
+或等于10 N将是一个经验合理的规则,以确保您能够估计这个协方差矩阵sigma还算不错。
+
+278
+00:10:55,090 --> 00:10:56,240
+So in practice the original
+因此在实践中,原始模型
+
+279
+00:10:56,750 --> 00:10:58,940
+model shown on the left that is used more often.
+显示上左边的更常用。
+
+280
+00:10:59,520 --> 00:11:00,840
+And if you suspect that you
+如果您怀疑自己
+
+281
+00:11:01,060 --> 00:11:02,680
+need to capture correlations between features
+需要捕捉特征之间的相关性
+
+282
+00:11:03,450 --> 00:11:08,150
+what people will often do is just manually design extra features like these to capture specific
+人们会经常做的只是手动设计像这些额外的特征来捕捉特定
+
+283
+00:11:08,780 --> 00:11:13,020
+unusual combinations of values. But in problems where you
+值的不同寻常的组合。但问题在那里你
+
+284
+00:11:13,120 --> 00:11:15,310
+have a very large training set or m is very large and n is
+有一个非常大的训练集或m是非常大的,n是
+
+285
+00:11:17,700 --> 00:11:20,160
+not too large, then the multivariate Gaussian
+不要过大,那么多元高斯
+
+286
+00:11:20,520 --> 00:11:21,720
+model is well worth considering and may work better as well, and can
+模型是非常值得考虑,可能更好地工作为好,并能
+
+287
+00:11:24,360 --> 00:11:25,960
+save you from having to
+节省您不必
+
+288
+00:11:26,070 --> 00:11:27,400
+spend your time to manually
+花你的时间来手动
+
+289
+00:11:28,070 --> 00:11:30,350
+create extra features in case
+在情况下创建额外的特征
+
+290
+00:11:31,380 --> 00:11:33,520
+the anomalies turn out to be captured by unusual
+异常变成了异常被捕获
+
+291
+00:11:34,040 --> 00:11:35,790
+combinations of values of the features.
+的特征值的组合。
+
+292
+00:11:37,430 --> 00:11:38,330
+Finally I just want to
+最后,我只是想
+
+293
+00:11:38,600 --> 00:11:40,220
+briefly mention one somewhat technical
+简要地提一个有点技术
+
+294
+00:11:40,770 --> 00:11:42,200
+property, but if you're
+属性,但如果你
+
+295
+00:11:42,370 --> 00:11:43,210
+fitting multivariate Gaussian
+拟合多元高斯
+
+296
+00:11:43,690 --> 00:11:44,590
+model, and if you find
+模型,如果你发现
+
+297
+00:11:44,910 --> 00:11:46,340
+that the covariance matrix sigma
+该协方差矩阵sigma
+
+298
+00:11:47,150 --> 00:11:48,160
+is singular, or you find
+是奇异的,或您发现
+
+299
+00:11:48,340 --> 00:11:51,340
+it's non-invertible, there are usually 2 cases for this.
+它是不可逆的,他们通常是两种情况。
+
+300
+00:11:51,680 --> 00:11:52,990
+One is if it's failing to
+其一是如果它未能
+
+301
+00:11:53,100 --> 00:11:54,270
+satisfy this m greater than
+满足这个m大于
+
+302
+00:11:54,640 --> 00:11:56,200
+n condition, and the
+n的条件,以及
+
+303
+00:11:56,570 --> 00:11:58,970
+second case is if you have redundant features.
+第二种情况是,如果你有多余的特征。
+
+304
+00:11:59,570 --> 00:12:00,980
+So by redundant features, I mean,
+因此,通过多余的特征,我的意思是,
+
+305
+00:12:01,520 --> 00:12:02,760
+if you have 2 features that are the same.
+如果你有2个特征是相同的。
+
+306
+00:12:02,980 --> 00:12:04,700
+Somehow you accidentally made two
+不知怎的,你不小心做了两个
+
+307
+00:12:04,830 --> 00:12:11,220
+copies of the feature, so your x1 is just equal to x2. Or if you have redundant features like maybe
+该特征的拷贝,所以你的X1正好等于2倍。或者如果你有多余的特征,如可能
+
+308
+00:12:12,860 --> 00:12:14,920
+your features X3 is equal to feature X4, plus feature X5.
+你的特征X3是等于X4的特征,再加上特征X5。
+
+309
+00:12:15,870 --> 00:12:16,960
+Okay, so if you have highly
+好了,所以如果你有高度
+
+310
+00:12:17,250 --> 00:12:18,500
+redundant features like these, you
+多余的特征,如这些,你
+
+311
+00:12:18,680 --> 00:12:20,110
+know, where if X3 is equal
+知道,其中如果X 3是相等
+
+312
+00:12:20,380 --> 00:12:21,840
+to X4 plus X5, well X3
+到X4加X5,X3以及
+
+313
+00:12:22,350 --> 00:12:24,420
+doesn't contain any extra information, right?
+不包含任何额外的信息,对不对?
+
+314
+00:12:24,590 --> 00:12:26,460
+You just take these 2 other features, and add them together.
+你只需要这2个其他的特征,并把它们相加。
+
+315
+00:12:27,590 --> 00:12:28,900
+And if you have this
+如果你有这样的
+
+316
+00:12:29,030 --> 00:12:30,960
+sort of redundant features, duplicated features,
+多余特征,重复特征,
+
+317
+00:12:31,520 --> 00:12:34,030
+or this sort of features, than sigma may be non-invertible.
+或这样的特征,sigma可以是不可逆的。
+
+318
+00:12:35,060 --> 00:12:37,000
+And so there's a debugging step--
+所以有一个调试设置 -
+
+319
+00:12:37,340 --> 00:12:38,270
+this should very rarely happen,
+这应该很少发生,
+
+320
+00:12:38,750 --> 00:12:40,190
+so you probably won't run into this,
+所以你可能不会遇到这个,
+
+321
+00:12:40,250 --> 00:12:42,780
+it is very unlikely that you have to worry about this--
+它是不太可能,你不必担心这一点 -
+
+322
+00:12:42,940 --> 00:12:44,480
+but in case you implement a
+但如果你实现一个
+
+323
+00:12:44,780 --> 00:12:46,850
+multivariate Gaussian model you find that sigma is non-invertible.
+多元高斯模型,你发现,sigma是不可逆的。
+
+324
+00:12:48,240 --> 00:12:49,350
+What I would do is first
+我会做的是第一
+
+325
+00:12:49,880 --> 00:12:51,300
+make sure that M is quite
+确保M比
+
+326
+00:12:51,540 --> 00:12:53,520
+a bit bigger than N, and if it
+N大很多,如果它
+
+327
+00:12:53,670 --> 00:12:54,640
+is then, the second thing I
+然后,第二件事情我
+
+328
+00:12:54,760 --> 00:12:56,560
+do, is just check for redundant features.
+这样做,只是检查多余特征。
+
+329
+00:12:57,360 --> 00:12:58,070
+And so if there are 2 features
+所以,如果有2个特点
+
+330
+00:12:58,150 --> 00:12:59,360
+that are equal, just get rid
+这是相等的,只是去掉
+
+331
+00:12:59,480 --> 00:13:00,590
+of one of them, or if
+其中之一,或者如果
+
+332
+00:13:00,970 --> 00:13:02,580
+you have redundant features where
+你有多余的,如果这些
+
+333
+00:13:02,880 --> 00:13:04,100
+X3 equals X4 plus X5,
+,X3等于X4加X5,
+
+334
+00:13:04,490 --> 00:13:05,160
+just get rid of the redundant
+刚刚去掉多余的
+
+335
+00:13:05,720 --> 00:13:08,650
+feature, and then it should work fine again.
+特征,那么它应该正常工作。
+
+336
+00:13:08,840 --> 00:13:09,610
+As an aside for those of you
+顺便说一句,对于那些你们
+
+337
+00:13:09,840 --> 00:13:11,210
+who are experts in linear algebra,
+线性代数专家,
+
+338
+00:13:11,810 --> 00:13:13,280
+by redundant features, what I
+通过多余的特征,我
+
+339
+00:13:13,410 --> 00:13:14,970
+mean is the formal term is
+意思是先前的方程的
+
+340
+00:13:15,300 --> 00:13:17,680
+features that are linearly dependent.
+特征是线性相关的。
+
+341
+00:13:18,460 --> 00:13:19,180
+But in practice what that really means
+但在实践中是什么真正含义
+
+342
+00:13:19,620 --> 00:13:21,710
+is one of these problems tripping
+是这些问题的难为
+
+343
+00:13:22,040 --> 00:13:24,130
+up the algorithm. If you just make your features non-redundant,
+你的算法,如果你只是让你的特征非多余。,
+
+344
+00:13:24,790 --> 00:13:26,390
+that should solve the problem of sigma being non-invertible.
+应该解决的sigma是不可逆的问题。
+
+345
+00:13:26,720 --> 00:13:29,100
+But once again
+但再次
+
+346
+00:13:29,530 --> 00:13:30,630
+the odds of your running into this
+你的遇到这个的概率
+
+347
+00:13:30,850 --> 00:13:32,190
+at all are pretty low so
+在所有都相当低,因此
+
+348
+00:13:32,540 --> 00:13:33,800
+chances are, you can
+这样的机会,你可以
+
+349
+00:13:34,130 --> 00:13:35,460
+just apply the multivariate Gaussian
+只是采用了多元高斯
+
+350
+00:13:35,990 --> 00:13:37,240
+model, without having to
+模型中,而不必
+
+351
+00:13:37,450 --> 00:13:39,150
+worry about sigma being non-invertible,
+担心sigma是不可逆的,
+
+352
+00:13:40,090 --> 00:13:41,180
+so long as m is greater
+只要m为大于
+
+353
+00:13:41,470 --> 00:13:42,780
+than or equal to n. So
+大于或等于n。所以
+
+354
+00:13:43,200 --> 00:13:45,180
+that's it for anomaly detection,
+以上就是异常检测,
+
+355
+00:13:45,810 --> 00:13:47,230
+with the multivariate Gaussian distribution.
+用多元高斯分布。
+
+356
+00:13:48,220 --> 00:13:49,240
+And if you apply this method
+如果你将这个方法
+
+357
+00:13:49,950 --> 00:13:51,160
+you would be able to have an
+你将能够有一个
+
+358
+00:13:51,310 --> 00:13:53,250
+anomaly detection algorithm that automatically
+异常检测算法,自动
+
+359
+00:13:54,010 --> 00:13:55,430
+captures positive and negative
+捕获正和负
+
+360
+00:13:55,610 --> 00:13:56,690
+correlations between your different
+您的不同特征之间的相关性
+
+361
+00:13:57,030 --> 00:13:58,520
+features and flags an anomaly
+和标志异常
+
+362
+00:13:59,450 --> 00:14:00,820
+if it sees an unusual combination
+如果它认为是不寻常的组合
+
+363
+00:14:01,630 --> 00:14:02,770
+of the values of the features.
+的特征的值。
+
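Editor's note (not part of the original transcript): the checks described above — making sure m is quite a bit bigger than n and that no features are duplicated or linearly dependent before fitting Sigma — can be sketched in a few lines of NumPy. This is a minimal, illustrative sketch; the function and variable names are not from the course materials.

```python
import numpy as np

def fit_multivariate_gaussian(X):
    """Fit mu and Sigma for the multivariate Gaussian model.
    X is an (m, n) array: m training examples (rows), n features (columns)."""
    m, n = X.shape
    # As in the lecture: make sure m is quite a bit bigger than n,
    # otherwise Sigma is likely to be singular (non-invertible).
    if m <= n:
        raise ValueError("Need more examples than features (m > n) to fit Sigma reliably.")
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / m
    # Redundant features (duplicated columns, or e.g. x3 = x4 + x5) are linearly
    # dependent and leave Sigma rank-deficient; flag that so they can be removed.
    if np.linalg.matrix_rank(Sigma) < n:
        raise ValueError("Sigma is singular: remove duplicated or linearly dependent features.")
    return mu, Sigma

def p(x, mu, Sigma):
    """Multivariate Gaussian density at a single example x (length-n vector)."""
    n = mu.size
    diff = x - mu
    norm = 1.0 / (np.power(2.0 * np.pi, n / 2.0) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
```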
diff --git a/srt/16 - 1 - Problem Formulation (8 min).srt b/srt/16 - 1 - Problem Formulation (8 min).srt
new file mode 100644
index 00000000..bf7d9b84
--- /dev/null
+++ b/srt/16 - 1 - Problem Formulation (8 min).srt
@@ -0,0 +1,1207 @@
+1
+00:00:00,080 --> 00:00:01,060
+In this next set of
+在接下来的视频中,(字幕翻译:仇利克,中国海洋大学)
+
+2
+00:00:01,180 --> 00:00:01,970
+videos, I would like to
+我想
+
+3
+00:00:02,300 --> 00:00:03,700
+tell you about recommender systems.
+讲一下推荐系统。
+
+4
+00:00:04,730 --> 00:00:05,810
+There are two reasons, I had
+有两个原因,我想讲
+
+5
+00:00:05,940 --> 00:00:08,590
+two motivations for why I wanted to talk about recommender systems.
+推荐系统有两个原因
+
+6
+00:00:09,640 --> 00:00:10,670
+The first is just that it
+第一,仅仅因为它是
+
+7
+00:00:10,830 --> 00:00:13,830
+is an important application of machine learning.
+机器学习中的一个重要的应用。
+
+8
+00:00:14,160 --> 00:00:15,680
+Over the last few years, occasionally I
+在过去几年,我偶尔
+
+9
+00:00:15,810 --> 00:00:16,840
+visit different, you know, technology
+访问硅谷不同的技术公司,
+
+10
+00:00:17,510 --> 00:00:18,720
+companies here in Silicon Valley
+
+11
+00:00:18,820 --> 00:00:20,040
+and I often talk to people
+我常和工作在这儿
+
+12
+00:00:20,390 --> 00:00:21,270
+working on machine learning applications there
+致力于机器学习应用的人们聊天,
+
+13
+00:00:21,370 --> 00:00:23,010
+and so I've asked
+我常问
+
+14
+00:00:23,200 --> 00:00:24,120
+people what are the most
+他们,最重要的
+
+15
+00:00:24,260 --> 00:00:26,840
+important applications of machine
+机器学习的应用是什么,
+
+16
+00:00:27,450 --> 00:00:28,530
+learning or what are the machine
+或者,你最想改进的
+
+17
+00:00:28,550 --> 00:00:29,520
+learning applications that you would most like to get
+机器学习应用
+
+18
+00:00:29,790 --> 00:00:31,130
+an improvement in the performance of.
+有哪些。
+
+19
+00:00:31,890 --> 00:00:32,690
+And one of the most frequent
+我最常听到的
+
+20
+00:00:33,020 --> 00:00:34,240
+answers I heard was that
+答案是
+
+21
+00:00:34,590 --> 00:00:35,710
+there are many groups out in Silicon
+现在,在硅谷有很多团体
+
+22
+00:00:36,020 --> 00:00:38,250
+Valley now, trying to build better recommender systems.
+试图建立很好的推荐系统。
+
+23
+00:00:39,570 --> 00:00:40,460
+So, if you think about
+因此,如果你考虑
+
+24
+00:00:40,800 --> 00:00:42,390
+what the websites are
+网站是什么
+
+25
+00:00:42,540 --> 00:00:44,100
+like Amazon, or what Netflix
+像亚马逊,或网飞公司
+
+26
+00:00:44,840 --> 00:00:46,100
+or what eBay, or what
+或易趣,或
+
+27
+00:00:46,830 --> 00:00:48,230
+iTunes Genius, made by Apple
+iTunes Genius,苹果开发的,
+
+28
+00:00:48,480 --> 00:00:49,450
+does, there are many websites
+有很多的网站
+
+29
+00:00:50,050 --> 00:00:51,520
+or systems that try to
+或系统试图
+
+30
+00:00:51,670 --> 00:00:53,140
+recommend new products to use.
+推荐新产品给用户。
+
+31
+00:00:53,360 --> 00:00:54,380
+So, Amazon recommends new books
+因此,亚马逊推荐新书
+
+32
+00:00:54,630 --> 00:00:55,840
+to you, Netflix try to recommend
+给你,网飞公司试图推荐
+
+33
+00:00:56,230 --> 00:00:58,090
+new movies to you, and so on.
+新电影给你,等等。
+
+34
+00:00:58,430 --> 00:00:59,560
+And these sorts of recommender systems,
+这些推荐系统,
+
+35
+00:01:00,130 --> 00:01:01,480
+that look at what books you
+浏览你过去
+
+36
+00:01:01,560 --> 00:01:02,430
+may have purchased in the past,
+买过什么书,
+
+37
+00:01:02,890 --> 00:01:03,820
+or what movies you have rated
+或过去评价过什么电影,
+
+38
+00:01:04,010 --> 00:01:05,100
+in the past, but these are
+但是,
+
+39
+00:01:05,140 --> 00:01:06,390
+the systems that are responsible
+这些系统会带来
+
+40
+00:01:07,470 --> 00:01:09,410
+for today, a substantial fraction of
+很大一部分收入
+
+41
+00:01:09,620 --> 00:01:10,630
+Amazon's revenue and for a
+为亚马逊和
+
+42
+00:01:10,710 --> 00:01:12,520
+company like Netflix, the recommendations
+像网飞这样的公司,
+
+43
+00:01:13,950 --> 00:01:14,710
+that they make to the users
+给用户推荐的电影
+
+44
+00:01:15,180 --> 00:01:16,610
+is also responsible for a
+也占据了
+
+45
+00:01:16,830 --> 00:01:18,250
+substantial fraction of the movies
+用户所看电影的一大部分。
+
+46
+00:01:18,520 --> 00:01:20,700
+watched by their users.
+And so an
+因此,
+
+47
+00:01:20,780 --> 00:01:22,410
+improvement in performance of
+对推荐系统性能的改善
+
+48
+00:01:22,520 --> 00:01:24,070
+a recommender system can have
+
+49
+00:01:24,680 --> 00:01:26,340
+a substantial and immediate
+将对这些企业的
+
+50
+00:01:26,880 --> 00:01:28,010
+impact on the bottom line of
+底线有实质性和直接的影响。
+
+51
+00:01:28,370 --> 00:01:31,380
+many of these
+
+52
+00:01:31,710 --> 00:01:32,660
+companies. Recommender systems is kind of a funny
+推荐系统是个有趣的问题,
+
+53
+00:01:32,870 --> 00:01:34,530
+problem, within academic machine
+在学术机器学习中
+
+54
+00:01:34,870 --> 00:01:35,890
+learning so that we could
+因此,我们
+
+55
+00:01:35,980 --> 00:01:37,230
+go to an academic machine learning conference,
+可以去参加一个学术机器学习会议,
+
+56
+00:01:38,430 --> 00:01:39,950
+the problem of recommender systems,
+推荐系统问题
+
+57
+00:01:40,190 --> 00:01:41,560
+actually receives relatively little attention,
+实际上受到很少的关注,
+
+58
+00:01:42,430 --> 00:01:43,680
+or at least it's sort of a smaller
+或者,至少在学术界
+
+59
+00:01:43,960 --> 00:01:46,200
+fraction of what goes on within Academia.
+它占了很小的份额。
+
+60
+00:01:47,140 --> 00:01:48,010
+But if you look at what's happening,
+但是,如果你看正在发生的事情,
+
+61
+00:01:48,570 --> 00:01:50,200
+many technology companies, the ability
+许多科技公司中,构建这些系统的能力
+
+62
+00:01:50,700 --> 00:01:53,500
+to build these systems seems to be a high priority for many companies.
+似乎是很多公司的高度优先事项。
+
+63
+00:01:54,430 --> 00:01:56,460
+And that's one of the reasons why I want to talk about them in this class.
+这是我为什么在这节课讨论它的原因之一。
+
+64
+00:01:58,280 --> 00:01:59,420
+The second reason that I
+我想讨论推荐系统的第二个原因是
+
+65
+00:01:59,520 --> 00:02:00,570
+want to talk about recommender systems
+
+66
+00:02:01,170 --> 00:02:02,460
+is that as we approach
+
+67
+00:02:02,910 --> 00:02:04,080
+the last few sets of videos
+本课程视频的最后几组
+
+68
+00:02:05,120 --> 00:02:06,300
+of this class I wanted to talk about
+我想讨论
+
+69
+00:02:06,700 --> 00:02:07,740
+a few of the big ideas
+机器学习中的一些大思想
+
+70
+00:02:08,410 --> 00:02:09,410
+in machine learning and share with you,
+并和大家分享,
+
+71
+00:02:09,510 --> 00:02:11,560
+you know, some of the big ideas in machine learning.
+你知道的,机器学习中的一些大思想。
+
+72
+00:02:12,400 --> 00:02:13,840
+And we've already seen
+这节课我们也看到了,
+
+73
+00:02:14,070 --> 00:02:15,870
+in this class that features are
+对机器学习来说,特征是
+
+74
+00:02:15,990 --> 00:02:17,000
+important for machine learning, the
+很重要的,
+
+75
+00:02:17,640 --> 00:02:19,170
+features you choose will have
+你所选择的特征
+
+76
+00:02:19,400 --> 00:02:22,340
+a big effect on the performance of your learning algorithm.
+将对你学习算法的性能有很大的影响。
+
+77
+00:02:23,290 --> 00:02:24,320
+So there's this big idea in machine
+因此,在机器学习中有一种大思想,
+
+78
+00:02:24,620 --> 00:02:25,890
+learning, which is that for
+它针对一些问题,
+
+79
+00:02:25,990 --> 00:02:27,630
+some problems, maybe not
+可能并不是
+
+80
+00:02:27,790 --> 00:02:29,690
+all problems, but some problems, there
+所有的问题,而是一些问题,
+
+81
+00:02:29,910 --> 00:02:31,610
+are algorithms that can try
+有算法可以为你
+
+82
+00:02:31,950 --> 00:02:34,860
+to automatically learn a good set of features for you.
+自动学习一套好的特征。
+
+83
+00:02:35,210 --> 00:02:35,970
+So rather than trying to hand
+因此,不要试图手动设计,
+
+84
+00:02:36,660 --> 00:02:37,840
+design, or hand code the
+或手写代码
+
+85
+00:02:38,100 --> 00:02:39,120
+features, which is mostly what we've
+这是目前为止我们常干的,
+
+86
+00:02:39,340 --> 00:02:40,350
+been doing so far, there are a
+
+87
+00:02:40,430 --> 00:02:41,790
+few settings where you might
+有一些设置,你可以
+
+88
+00:02:42,050 --> 00:02:42,650
+be able to have an
+有一个算法,
+
+89
+00:02:42,770 --> 00:02:43,780
+algorithm, just to learn what feature to
+仅仅学习其使用的特征,
+
+90
+00:02:43,920 --> 00:02:45,200
+use, and the recommender
+推荐系统
+
+91
+00:02:45,580 --> 00:02:47,690
+systems is just one example of that sort of setting.
+就是这类情形的一个例子。
+
+92
+00:02:47,880 --> 00:02:49,250
+There are many others, but engraved
+还有很多其它的,但是
+
+93
+00:02:49,690 --> 00:02:51,150
+through recommender systems, will be
+通过推荐系统,我们将
+
+94
+00:02:51,320 --> 00:02:52,770
+able to go a little
+领略一小部分
+
+95
+00:02:53,090 --> 00:02:54,380
+bit into this idea of learning
+特征学习的思想,
+
+96
+00:02:54,710 --> 00:02:56,450
+the features and you'll be
+至少,你将能够
+
+97
+00:02:56,540 --> 00:02:57,320
+able to see at least one example
+了解到这方面的一个例子,
+
+98
+00:02:58,170 --> 00:03:00,120
+of this, I think, big idea in machine learning as well.
+我认为,这也是机器学习中的一个重要思想。
+
+99
+00:03:01,220 --> 00:03:02,800
+So, without further ado, let's
+因此,让我们
+
+100
+00:03:02,990 --> 00:03:04,220
+get started, and talk
+开始讨论
+
+101
+00:03:04,400 --> 00:03:06,120
+about the recommender system problem formulation.
+推荐系统问题公式化。
+
+102
+00:03:08,110 --> 00:03:09,690
+As my running example, I'm
+接下来的例子,我将
+
+103
+00:03:09,870 --> 00:03:11,210
+going to use the
+用
+
+104
+00:03:11,390 --> 00:03:13,230
+modern problem of predicting movie ratings.
+预测电影评分这一现代问题。
+
+105
+00:03:14,150 --> 00:03:14,640
+So, here's a problem.
+因此,这是一个问题。
+
+106
+00:03:15,100 --> 00:03:16,520
+Imagine that you're a
+假设你是一个
+
+107
+00:03:16,660 --> 00:03:18,140
+website or a company that
+网站或者公司
+
+108
+00:03:18,910 --> 00:03:21,340
+sells or rents out movies, or what have you.
+出售或者出租电影,或者诸如此类。
+
+109
+00:03:21,560 --> 00:03:22,880
+And so, you know, Amazon, and Netflix, and
+因此,你知道,亚马逊、网飞公司和
+
+110
+00:03:23,610 --> 00:03:24,930
+I think iTunes are all examples
+iTunes都是做这个的
+
+111
+00:03:26,540 --> 00:03:28,180
+of companies that do this,
+公司。
+
+112
+00:03:28,750 --> 00:03:30,450
+and let's say you let
+比方说,你让
+
+113
+00:03:30,930 --> 00:03:32,610
+your users rate different movies,
+你的用户评价不同的电影,
+
+114
+00:03:33,560 --> 00:03:34,130
+using a 1 to 5 star rating.
+用1到5星级评价。
+
+115
+00:03:34,560 --> 00:03:36,300
+So, users may, you know,
+因此,用户可能,你知道,
+
+116
+00:03:36,400 --> 00:03:39,070
+something one, two, three, four or five stars.
+评定一星、二星、三星、四星或五星。
+
+117
+00:03:40,420 --> 00:03:41,440
+In order to make this example
+为了让这个例子
+
+118
+00:03:41,980 --> 00:03:43,170
+just a little bit nicer, I'm
+更完善一点,我
+
+119
+00:03:43,360 --> 00:03:44,860
+going to allow 0 to
+将允许0到
+
+120
+00:03:45,180 --> 00:03:46,720
+5 stars as well,
+5星级,
+
+121
+00:03:47,300 --> 00:03:49,170
+because that just makes some of the math come out just nicer.
+这只是为了让一些数学计算更简洁。
+
+122
+00:03:49,360 --> 00:03:51,580
+Although most of these websites use the 1 to 5 star scale.
+虽然大多数网站使用1到5星级评价。
+
+123
+00:03:53,000 --> 00:03:54,520
+So here, I have 5 movies.
+这里,我有5部电影。
+
+124
+00:03:55,110 --> 00:03:56,600
+You know, Love That
+你们知道的,Love That
+
+125
+00:03:56,710 --> 00:03:58,050
+Lasts, Romance Forever, Cute Puppies of
+Lasts, Romance Forever, Cute Puppies of
+
+126
+00:03:58,160 --> 00:04:00,230
+Love, Nonstop Car Chases,
+Love, Nonstop Car Chases,
+
+127
+00:04:01,040 --> 00:04:03,340
+and Swords vs. Karate.
+and Swords vs. Karate.
+
+128
+00:04:03,550 --> 00:04:04,380
+And we have 4 users, which,
+我们有4个用户,
+
+129
+00:04:05,020 --> 00:04:06,190
+calling, you know, Alice, Bob, Carol,
+他们分别是Alice, Bob, Carol,
+
+130
+00:04:06,410 --> 00:04:07,610
+and Dave, with initials A, B,
+和Dave,名字首字母分别是A, B,
+
+131
+00:04:07,690 --> 00:04:09,790
+C, and D, we'll call them users 1, 2, 3, and 4.
+C和D,我们称呼他们用户1,2,3和用户4.
+
+132
+00:04:10,390 --> 00:04:11,940
+So, let's say Alice really
+比如说Alice喜欢
+
+133
+00:04:12,190 --> 00:04:13,360
+likes Love That Lasts and
+Love That Lasts并
+
+134
+00:04:13,460 --> 00:04:15,680
+rates that 5 stars, likes Romance
+给其评价5星,Romance
+
+135
+00:04:16,070 --> 00:04:17,220
+Forever, rates it 5 stars.
+Forever,评价5星。
+
+136
+00:04:18,060 --> 00:04:19,050
+She did not watch Cute Puppies
+她并没有看过Cute Puppies
+
+137
+00:04:19,370 --> 00:04:20,810
+of Love, and did not rate it, so we
+of Love,没有进行评价,因此,我们
+
+138
+00:04:20,950 --> 00:04:22,190
+don't have a rating for that,
+并没有这部电影的星级评价,
+
+139
+00:04:23,120 --> 00:04:24,400
+and Alice really did not
+Alice 并不喜欢
+
+140
+00:04:24,590 --> 00:04:27,170
+like Nonstop Car Chases or
+Nonstop Car Chases或者
+
+141
+00:04:27,240 --> 00:04:29,330
+Swords vs. Karate. And a different user
+Swords vs. Karate. 另一个用户
+
+142
+00:04:29,720 --> 00:04:31,390
+Bob, user two, maybe rated
+Bob,用户2,可能评级
+
+143
+00:04:31,630 --> 00:04:32,680
+a different set of movies, maybe
+一些不同的电影,可能
+
+144
+00:04:32,850 --> 00:04:33,580
+she likes Love at Last,
+他喜欢Love at Last,
+
+145
+00:04:34,300 --> 00:04:35,520
+did not to watch Romance Forever,
+并没有看过Romance Forever,
+
+146
+00:04:36,130 --> 00:04:37,960
+just have a rating of 4, a 0,
+仅仅评了一个4星,一个0星,
+
+147
+00:04:38,360 --> 00:04:42,530
+a 0, and maybe our 3rd user,
+可能第三个用户,
+
+148
+00:04:43,170 --> 00:04:44,280
+rates this 0, did not watch
+评价它为0星,并没有看过
+
+149
+00:04:44,550 --> 00:04:45,610
+that one, 0, 5, 5, and, you know, let's just
+那部电影,0, 5, 5,你知道的,让我们仅仅
+
+150
+00:04:45,980 --> 00:04:48,150
+fill in some of the numbers.
+用一些数字填满。
+
+151
+00:04:52,150 --> 00:04:53,910
+And so just to introduce a
+下面介绍一些
+
+152
+00:04:53,970 --> 00:04:55,090
+bit of notation, this notation
+符号,这些符号
+
+153
+00:04:55,600 --> 00:04:57,200
+that we'll be using throughout, I'm going
+我们将一直使用,我将
+
+154
+00:04:57,400 --> 00:04:59,650
+to use NU to denote the number of users.
+用NU表示用户的数量。
+
+155
+00:05:00,260 --> 00:05:02,800
+So in this example, NU will be equal to 4.
+因此,在这个例子中,NU=4。
+
+156
+00:05:03,550 --> 00:05:04,750
+So the u-subscript stands for
+u的下标表示
+
+157
+00:05:05,040 --> 00:05:07,290
+users and Nm,
+用户数,Nm
+
+158
+00:05:07,770 --> 00:05:08,900
+going to use to denote the number
+用来表示电影的数量,
+
+159
+00:05:09,090 --> 00:05:11,210
+of movies, so here I have five movies
+这里我有5部电影,
+
+160
+00:05:11,610 --> 00:05:12,960
+so Nm equals 5.
+因此Nm=5.
+
+161
+00:05:13,320 --> 00:05:15,320
+And you know for this example, I have
+这个例子你知道的,
+
+162
+00:05:15,950 --> 00:05:17,640
+for this example, I have loosely
+我有
+
+163
+00:05:18,920 --> 00:05:20,440
+3 maybe romantic or
+3部浪漫或
+
+164
+00:05:20,700 --> 00:05:24,020
+romantic comedy movies and 2
+浪漫喜剧和2部
+
+165
+00:05:24,260 --> 00:05:25,750
+action movies and you know, if
+动作片,
+
+166
+00:05:25,960 --> 00:05:27,460
+you look at this small example, it
+你看这个小例子,它
+
+167
+00:05:27,580 --> 00:05:29,430
+looks like Alice and Bob are
+看起来像是Alice和 Bob
+
+168
+00:05:29,550 --> 00:05:31,360
+giving high ratings to these
+评了高星级给这些
+
+169
+00:05:32,170 --> 00:05:33,650
+romantic comedies or movies
+浪漫喜剧或者关于爱情的电影,
+
+170
+00:05:33,960 --> 00:05:34,850
+about love, and giving very
+给动作片非常
+
+171
+00:05:35,140 --> 00:05:36,790
+low ratings about the action
+低的评价,
+
+172
+00:05:37,060 --> 00:05:39,470
+movies, and for Carol and Dave, it's the opposite, right?
+Carol 和 Dave,他们的评价是相反的,对吗?
+
+173
+00:05:39,620 --> 00:05:40,800
+Carol and Dave, users three
+Carol 和 Dave,用户3和4,
+
+174
+00:05:41,010 --> 00:05:42,170
+and four, really like the
+喜欢
+
+175
+00:05:42,350 --> 00:05:43,390
+action movies and give them
+动作片并给它们
+
+176
+00:05:43,490 --> 00:05:45,020
+high ratings, but don't like
+高星级,但是不喜欢
+
+177
+00:05:45,510 --> 00:05:46,910
+the romance and love-
+浪漫剧和
+
+178
+00:05:47,060 --> 00:05:48,440
+type movies as much.
+爱情剧。
+
+179
+00:05:50,330 --> 00:05:51,720
+Specifically, in the recommender system
+尤其在推荐系统问题中,
+
+180
+00:05:52,120 --> 00:05:54,170
+problem, we are given the following data.
+我们给定下面数据,
+
+181
+00:05:54,700 --> 00:05:56,230
+Our data comprises the following:
+我们的数据组成如下:
+
+182
+00:05:56,390 --> 00:05:58,780
+we have these values r(i, j), and
+我们有值r(i, j),
+
+183
+00:05:58,910 --> 00:06:00,080
+r(i, j) is 1 if user
+r(i, j)=1,如果用户
+
+184
+00:06:00,350 --> 00:06:01,580
+J has rated movie I.
+j给电影i进行了评价。
+
+185
+00:06:01,950 --> 00:06:02,920
+So our users rate only
+因此,用户仅仅给
+
+186
+00:06:03,180 --> 00:06:04,200
+some of the movies, and so,
+其中一些电影进行了评价,因此,
+
+187
+00:06:04,820 --> 00:06:06,050
+you know, we don't have
+你知道,我们没有
+
+188
+00:06:06,190 --> 00:06:08,140
+ratings for those movies.
+对这些电影进行评价。
+
+189
+00:06:08,310 --> 00:06:09,890
+And whenever r(i, j) is equal
+r(i, j)等于1,仅当
+
+190
+00:06:10,450 --> 00:06:11,790
+to 1, whenever user j has
+用户j
+
+191
+00:06:11,980 --> 00:06:13,150
+rated movie i, we also
+给电影i进行了评价。我们也
+
+192
+00:06:13,660 --> 00:06:15,310
+get this number y(i, j),
+得到值y(i, j),
+
+193
+00:06:16,090 --> 00:06:17,520
+which is the rating given by
+它是用户j给电影i的评级。
+
+194
+00:06:17,740 --> 00:06:18,870
+user j to movie i. And
+
+195
+00:06:19,030 --> 00:06:20,370
+so, y(i, j) would be
+因此,y(i, j)是一个
+
+196
+00:06:20,540 --> 00:06:22,890
+a number from zero to
+从0到5的数字,
+
+197
+00:06:23,090 --> 00:06:24,360
+five, depending on the star
+依赖星级评定,
+
+198
+00:06:24,790 --> 00:06:25,810
+rating, zero to five
+用户给
+
+199
+00:06:26,160 --> 00:06:28,470
+stars that user gave that particular movie.
+特定电影评价0到5,五个星级。
+
+200
+00:06:30,240 --> 00:06:31,700
+So, the recommender system problem
+因此,推荐系统问题
+
+201
+00:06:32,200 --> 00:06:33,540
+is given this data
+给出了这个数据
+
+202
+00:06:33,900 --> 00:06:35,210
+that has give these r(i, j)'s
+这些r(i, j)和
+
+203
+00:06:35,440 --> 00:06:38,540
+and the y(i, j)'s to look
+y(i, j)数据,
+
+204
+00:06:38,820 --> 00:06:40,150
+through the data and
+浏览数据
+
+205
+00:06:40,320 --> 00:06:41,700
+look at all the movie ratings that
+查找所有未被评价的电影,
+
+206
+00:06:41,860 --> 00:06:42,940
+are missing and to try
+并试图
+
+207
+00:06:43,220 --> 00:06:44,590
+to predict what these values
+预测这些电影的评价星级。
+
+208
+00:06:45,110 --> 00:06:47,290
+of the question marks should be.
+
+209
+00:06:47,520 --> 00:06:48,710
+In the particular example, I have
+在这个特殊的例子中,我有
+
+210
+00:06:48,840 --> 00:06:49,920
+a very small number of movies
+非常少的电影数量
+
+211
+00:06:50,210 --> 00:06:51,250
+and a very small number of users
+和用户数量,
+
+212
+00:06:51,620 --> 00:06:52,790
+and so most users have rated most
+因此,大多数用户都对大多数电影进行了评价。
+
+213
+00:06:53,020 --> 00:06:53,820
+movies but in the realistic
+但在现实情况中,
+
+214
+00:06:54,190 --> 00:06:55,870
+settings your users each
+
+215
+00:06:56,040 --> 00:06:57,120
+of your users may have rated
+每个用户可能仅评价
+
+216
+00:06:57,600 --> 00:06:58,940
+only a minuscule fraction of your
+你的电影中极小的一部分,
+
+217
+00:06:59,200 --> 00:07:00,170
+movies but looking at this
+但看这些数据,
+
+218
+00:07:00,310 --> 00:07:01,430
+data, you know, if Alice and Bob
+你知道的,如果Alice和Bob
+
+219
+00:07:01,730 --> 00:07:02,990
+both like the romantic movies
+都喜欢浪漫剧
+
+220
+00:07:03,740 --> 00:07:05,810
+maybe we think that Alice would have given this a five.
+我们可能认为Alice将给电影评价5星。
+
+221
+00:07:06,160 --> 00:07:07,290
+Maybe we think Bob would have
+Bob或许给电影评价
+
+222
+00:07:07,430 --> 00:07:08,570
+given this a 4.5
+4.5星
+
+223
+00:07:08,750 --> 00:07:10,560
+or some high value, as we
+或更高的星级,
+
+224
+00:07:10,690 --> 00:07:11,710
+think maybe Carol and Dave
+Carol和Dave
+
+225
+00:07:12,590 --> 00:07:15,050
+would have given these very low ratings.
+将给浪漫剧非常低的评价。
+
+226
+00:07:15,610 --> 00:07:16,520
+And Dave, well, if Dave really likes action movies,
+如果Dave真的喜欢动作片,
+
+227
+00:07:16,740 --> 00:07:17,790
+maybe he would have given
+他可能给
+
+228
+00:07:18,490 --> 00:07:19,540
+Swords and Karate a 4
+Swords and Karate评价4星
+
+229
+00:07:20,020 --> 00:07:22,080
+rating or maybe a 5 rating, okay?
+或者5星。
+
+230
+00:07:22,590 --> 00:07:23,700
+And so, our job in developing
+因此,如果我们想开发
+
+231
+00:07:24,330 --> 00:07:25,950
+a recommender system is to
+一个推荐系统,
+
+232
+00:07:26,460 --> 00:07:28,120
+come up with a learning
+那我们的工作是想出一个学习算法,
+
+233
+00:07:28,360 --> 00:07:29,440
+algorithm that can automatically
+一个能自动
+
+234
+00:07:30,380 --> 00:07:31,490
+go fill in these missing values
+为我们填补这些缺失值的算法,
+
+235
+00:07:31,880 --> 00:07:33,260
+for us so that we
+因此,我们
+
+236
+00:07:33,390 --> 00:07:34,380
+can look at, say, the
+可以看看,
+
+237
+00:07:34,490 --> 00:07:35,630
+movies that the user has
+该用户还没有看
+
+238
+00:07:35,870 --> 00:07:37,370
+not yet watched, and recommend
+电影,并推荐
+
+239
+00:07:38,230 --> 00:07:39,570
+new movies to that user to watch.
+新电影给该用户看。
+
+240
+00:07:39,860 --> 00:07:42,500
+You try to predict what else might be interesting to a user.
+你试图预测一个用户感兴趣的内容。
+
+241
+00:07:45,210 --> 00:07:47,890
+So that's the formalism of the recommender system problem.
+这就是推荐系统问题的形式化描述。
+
+242
+00:07:49,290 --> 00:07:50,450
+In the next video we'll start
+下一段视频我们将
+
+243
+00:07:50,770 --> 00:07:53,360
+to develop a learning algorithm to address this problem.
+开发一个学习算法解决这个问题。
+
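Editor's note (not part of the transcript): the notation just introduced — n_u users, n_m movies, r(i, j) and y(i, j) — fits in two small matrices. Below is a hedged NumPy sketch using values loosely matching the lecture's 5-movie, 4-user example; the exact numbers are illustrative placeholders, not the definitive data set.

```python
import numpy as np

# y(i, j): rating of movie i by user j on a 0-5 scale; np.nan marks "not rated".
# Rows are the 5 movies, columns are the 4 users (Alice, Bob, Carol, Dave).
Y = np.array([
    [5.0,    5.0,    0.0,    0.0],     # Love at Last
    [5.0,    np.nan, np.nan, 0.0],     # Romance Forever
    [np.nan, 4.0,    0.0,    np.nan],  # Cute Puppies of Love
    [0.0,    0.0,    5.0,    4.0],     # Nonstop Car Chases
    [0.0,    0.0,    5.0,    np.nan],  # Swords vs. Karate
])

R = (~np.isnan(Y)).astype(int)  # r(i, j) = 1 exactly when user j has rated movie i
n_m, n_u = Y.shape              # number of movies (5) and number of users (4)

# The recommender system problem: predict the entries where r(i, j) = 0.
print(n_m, n_u, R.sum())
```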
diff --git a/srt/16 - 2 - Content Based Recommendations (15 min).srt b/srt/16 - 2 - Content Based Recommendations (15 min).srt
new file mode 100644
index 00000000..ea567005
--- /dev/null
+++ b/srt/16 - 2 - Content Based Recommendations (15 min).srt
@@ -0,0 +1,2114 @@
+1
+00:00:01,370 --> 00:00:02,420
+In the last video, we talked
+在上一段视频中,我们讨论到(字幕翻译:中国海洋大学,郭帅)
+
+2
+00:00:02,740 --> 00:00:04,200
+about the recommender system problem,
+关于推荐系统的问题
+
+3
+00:00:05,030 --> 00:00:06,270
+where for example, you may
+例如,你可能
+
+4
+00:00:06,380 --> 00:00:07,810
+have a set of movies and you
+有一些电影并且你
+
+5
+00:00:07,940 --> 00:00:09,140
+may have a set of users,
+还可能有一些用户
+
+6
+00:00:09,810 --> 00:00:10,960
+each of whom has rated
+每个用户评价了
+
+7
+00:00:11,670 --> 00:00:13,170
+some subset of the movies,
+其中的一部分电影
+
+8
+00:00:13,370 --> 00:00:14,340
+rated the movies 1 to
+评价电影1星到
+
+9
+00:00:14,500 --> 00:00:15,460
+5 stars or 0 to 5
+5星或者从0星到5星
+
+10
+00:00:15,630 --> 00:00:16,830
+stars, and what I would like
+我想要
+
+11
+00:00:17,200 --> 00:00:18,170
+to do, is look at
+做的是,去看看
+
+12
+00:00:18,240 --> 00:00:19,720
+these users and predict how
+这些用户并且预测
+
+13
+00:00:19,910 --> 00:00:22,540
+they would have rated other movies that they have not yet rated.
+他们是怎样评价那些他们还没有评价的电影的
+
+14
+00:00:23,530 --> 00:00:24,540
+In this video, I would like
+在这段视频中,我想
+
+15
+00:00:24,600 --> 00:00:25,950
+to talk about our first approach
+讲一下我们的第一个方法
+
+16
+00:00:26,430 --> 00:00:28,190
+to building a recommender system, this
+来建立推荐系统
+
+17
+00:00:28,360 --> 00:00:30,100
+approach is called content based recommendations.
+这个方法被叫做基于内容的推荐
+
+18
+00:00:31,460 --> 00:00:32,690
+Here's our data set from before,
+这是我们之前的一个数据集
+
+19
+00:00:33,310 --> 00:00:34,470
+and just to remind you of a
+仅仅是为了让你回想起
+
+20
+00:00:34,550 --> 00:00:35,780
+bit of notation, I was using
+一些符号,我使用
+
+21
+00:00:36,690 --> 00:00:37,870
+Nu to denote the number
+nu 来表示用户数量
+
+22
+00:00:38,030 --> 00:00:39,110
+of users, and so that's equal
+这里它的值是4
+
+23
+00:00:39,290 --> 00:00:40,990
+to 4, and Nm
+用nm
+
+24
+00:00:41,990 --> 00:00:44,780
+to denote the number of movies, I have five movies.
+来表示电影的数量,我有5部电影
+
+25
+00:00:47,230 --> 00:00:48,140
+So, how do I predict
+所以,我要怎样预测
+
+26
+00:00:48,960 --> 00:00:50,950
+what these missing values would be?
+这些缺失的值是多少?
+
+27
+00:00:52,490 --> 00:00:53,520
+Let's suppose that for each
+我们假设对于每一部
+
+28
+00:00:53,700 --> 00:00:55,500
+of these movies, I have a
+电影,我有一个
+
+29
+00:00:55,540 --> 00:00:57,460
+set of features for them.
+它们的特征集
+
+30
+00:00:57,910 --> 00:00:58,990
+In particular, lets say that
+尤其是,我们规定
+
+31
+00:00:59,690 --> 00:01:00,850
+for each of the movies I have two features,
+对于每一个电影都有两个特征
+
+32
+00:01:01,920 --> 00:01:03,500
+which I'm going to denote X1 and
+这里用x1和x2表示
+
+33
+00:01:04,080 --> 00:01:05,700
+X2, where X1 measures the degree
+其中x1衡量
+
+34
+00:01:06,130 --> 00:01:07,450
+to which a movie is a
+一部电影为爱情片的程度
+
+35
+00:01:07,650 --> 00:01:09,270
+romantic movie and X2 measures
+用x2来衡量
+
+36
+00:01:09,810 --> 00:01:12,080
+the degree to which a movie is an action movie.
+一部电影为动作片的程度
+
+37
+00:01:12,840 --> 00:01:13,700
+So if you take a movie
+所以如果选了love at last这部电影
+
+38
+00:01:14,470 --> 00:01:16,490
+Love at last, you know,
+可以看出
+
+39
+00:01:16,800 --> 00:01:17,960
+0.9 rating on the
+0.9程度
+
+40
+00:01:18,030 --> 00:01:19,190
+romance scale, it is a
+在爱情片范围内,这是一部
+
+41
+00:01:19,260 --> 00:01:20,850
+highly romantic movie but zero on
+纯爱情片,在动作片范围内的程度是0
+
+42
+00:01:20,920 --> 00:01:22,400
+the action scale, so almost no
+所以几乎没有
+
+43
+00:01:22,520 --> 00:01:24,390
+action in that
+动作成分在这个
+
+44
+00:01:24,540 --> 00:01:25,860
+movie. Romance forever was 1.0,
+电影里。电影Romance forever程度是1.0
+
+45
+00:01:26,230 --> 00:01:27,610
+lot of romance and 0.01 action,
+有很多爱情成分并且动作成分只有0.01
+
+46
+00:01:27,860 --> 00:01:29,790
+I don't know maybe
+我不知道可能
+
+47
+00:01:30,700 --> 00:01:32,650
+there's a minor car crash in
+有一起小型车祸
+
+48
+00:01:33,630 --> 00:01:35,580
+that movie or something, so little bit of action.
+在这部电影里或者是别的,动作成分太少了。
+
+49
+00:01:35,610 --> 00:01:36,760
+Skipping one let's do
+跳过一个我们
+
+50
+00:01:37,860 --> 00:01:39,630
+Swords vs,. karate, maybe that
+来看swords vs karate这部电影,或许
+
+51
+00:01:39,870 --> 00:01:41,110
+has a zero romance rating
+它的爱情片程度为0
+
+52
+00:01:41,520 --> 00:01:42,780
+and no romance at all in that
+没有一点爱情成分
+
+53
+00:01:43,250 --> 00:01:46,040
+but plenty of action and you know, non-stop car chases.
+但是有大量动作成分,不停的上演飞车追逐
+
+54
+00:01:46,300 --> 00:01:47,120
+Maybe again there is
+或许这部电影里有
+
+55
+00:01:47,220 --> 00:01:48,390
+tiny bit of romance in
+一点点爱情成分
+
+56
+00:01:48,500 --> 00:01:49,800
+that movie, but mainly action,
+但是主要的是动作成分
+
+57
+00:01:50,460 --> 00:01:51,560
+and and Cute puppies of
+电影cute puppies of love
+
+58
+00:01:51,680 --> 00:01:52,730
+love again but mainly a romance
+又是一部爱情片,
+
+59
+00:01:53,510 --> 00:01:54,410
+movie with no action at all.
+没有动作成分
+
+60
+00:01:55,990 --> 00:01:57,150
+So if we have features
+所以如果我们有
+
+61
+00:01:57,550 --> 00:01:59,220
+like these then each movie
+一些像这样的特征,那么每部电影
+
+62
+00:01:59,800 --> 00:02:01,510
+can be represented with a feature vector.
+可以用一个特征向量来表示
+
+63
+00:02:02,380 --> 00:02:03,810
+Let's take movie 1, so just
+比如说第一个电影,
+
+64
+00:02:04,020 --> 00:02:06,210
+call these movies you know, movies 1 2, 3, 4 and 5.
+我们将这5部电影叫做电影1,2,3,4,5
+
+65
+00:02:06,630 --> 00:02:08,180
+For my first movie,
+对于电影1
+
+66
+00:02:08,520 --> 00:02:09,810
+Love at last, I have
+love at last,我有
+
+67
+00:02:10,170 --> 00:02:11,710
+my two features, 0.9 and
+两个特征,0.9和
+
+68
+00:02:12,180 --> 00:02:12,950
+0, and so these are features
+0,这两个特征
+
+69
+00:02:13,380 --> 00:02:16,170
+X1 and X2, and
+就是x1和x2的值
+
+70
+00:02:16,340 --> 00:02:17,270
+let's add an extra feature
+我们加一个额外的特征
+
+71
+00:02:17,790 --> 00:02:18,780
+as usual, which is my
+像往常一样,
+
+72
+00:02:19,350 --> 00:02:21,640
+intercept feature X0, which is equal to 1
+用x0来标示,它的值是1
+
+73
+00:02:22,680 --> 00:02:23,810
+and so, putting these together,
+把这些放在一起
+
+74
+00:02:24,700 --> 00:02:26,150
+I would then have a feature X1,
+然后我有一个特征x1
+
+75
+00:02:26,970 --> 00:02:28,420
+the superscript 1 denotes it's
+上标是1标示
+
+76
+00:02:28,510 --> 00:02:29,430
+the feature vector for my first
+它是电影1的特征向量
+
+77
+00:02:29,770 --> 00:02:30,720
+movie, and this feature
+这个特征
+
+78
+00:02:30,980 --> 00:02:32,520
+vector is equal to one.
+向量等于1
+
+79
+00:02:33,190 --> 00:02:34,880
+The first one there is this intercept term,
+第一个元素是这个截距项,
+
+80
+00:02:35,740 --> 00:02:37,010
+and then my two features 0.9, 0,
+然后是我的两个特征0.9 和0
+
+81
+00:02:37,260 --> 00:02:39,330
+like so.
+就像这样
+
+82
+00:02:40,370 --> 00:02:41,360
+So, for Love at last, I
+对于电影1
+
+83
+00:02:41,550 --> 00:02:43,470
+would have a feature vector X1,
+我们有一个特征向量x1
+
+84
+00:02:44,480 --> 00:02:46,220
+for the movie Romance Forever, we
+对于电影2
+
+85
+00:02:46,340 --> 00:02:47,510
+have the separate feature vector
+我们有一个特征向量
+
+86
+00:02:47,800 --> 00:02:49,310
+X2 and so on, and
+x2 并且
+
+87
+00:02:49,380 --> 00:02:50,780
+for Swords vs. karate I would
+对于电影5
+
+88
+00:02:51,510 --> 00:02:54,050
+have a different feature vector x superscript 5.
+我们有一个不同的特征向量x5
+
+89
+00:02:56,150 --> 00:02:57,460
+Also, consistent with our
+并且与
+
+90
+00:02:57,680 --> 00:02:59,090
+early notation that we were
+我们之前用的符号一致
+
+91
+00:02:59,300 --> 00:03:00,220
+using, we're going to set N
+我们将用n
+
+92
+00:03:00,490 --> 00:03:02,130
+to be the number of features, not
+来表示特征数量,
+
+93
+00:03:02,360 --> 00:03:03,530
+counting this X zero
+不包括x0这个特征
+
+94
+00:03:03,810 --> 00:03:05,320
+intercept term so n is
+所以n
+
+95
+00:03:05,420 --> 00:03:06,600
+equal to two because we have
+等于2,因为我们有
+
+96
+00:03:06,790 --> 00:03:08,180
+two features x1 and x2
+特征x1和x2
+
+97
+00:03:08,890 --> 00:03:10,140
+capturing the degree of romance
+来表示每部电影里的爱情程度
+
+98
+00:03:10,640 --> 00:03:11,980
+and the degree of action in each
+和动作程度
+
+99
+00:03:12,630 --> 00:03:14,270
+movie.
+Now in order
+现在为了
+
+100
+00:03:14,560 --> 00:03:17,930
+to make predictions, here is one thing we could do,
+作出预测,我们可以这么做
+
+101
+00:03:19,230 --> 00:03:20,980
+which is that we could treat predicting
+我们把预测
+
+102
+00:03:21,160 --> 00:03:22,340
+the ratings of each user
+每个用户的评价看做
+
+103
+00:03:23,250 --> 00:03:26,210
+as a separate linear regression problem. So
+一个线性回归问题
+
+104
+00:03:26,440 --> 00:03:27,660
+specifically lets say that for each
+特别是对于每一个
+
+105
+00:03:27,920 --> 00:03:29,170
+user j we are going
+用户j我们
+
+106
+00:03:29,270 --> 00:03:30,860
+to learn a parameter vector theta
+用一个参数向量theta j,
+
+107
+00:03:31,340 --> 00:03:33,030
+J which would be in R3 in this case,
+这里是三维的
+
+108
+00:03:33,540 --> 00:03:35,730
+more generally theta j would
+通常theta j
+
+109
+00:03:35,950 --> 00:03:37,960
+be in r n+1, where
+在n+1维中
+
+110
+00:03:38,340 --> 00:03:39,460
+n is the number of features,
+n是特征数量
+
+111
+00:03:39,700 --> 00:03:42,170
+not counting the intercept term, and we're going
+不算特征x0, 我们
+
+112
+00:03:42,440 --> 00:03:43,880
+to predict user J as
+预测用户j
+
+113
+00:03:44,050 --> 00:03:45,780
+rating movie I, with just
+评价电影1,
+
+114
+00:03:46,000 --> 00:03:47,390
+the inner product between the parameters
+就是参数向量
+
+115
+00:03:47,860 --> 00:03:50,590
+vector theta and the features "XI".
+theta与特征x(i)的内积
+
+116
+00:03:51,830 --> 00:03:53,680
+So, let's take a specific example.
+举一个特殊的例子
+
+117
+00:03:55,130 --> 00:03:56,700
+Let's take user one.
+对于用户1
+
+118
+00:03:59,600 --> 00:04:01,120
+So that would be Alice and
+就是Alice
+
+119
+00:04:01,380 --> 00:04:02,700
+associated with Alice would
+和Alice有关的
+
+120
+00:04:02,830 --> 00:04:03,990
+be some parameter vector,
+是一些参数向量
+
+121
+00:04:04,810 --> 00:04:06,210
+theta 1 and our
+theta 1 并且
+
+122
+00:04:06,520 --> 00:04:07,610
+second user Bob will be
+第二个用户Bob
+
+123
+00:04:07,720 --> 00:04:08,600
+associated, with a different
+和一个不同的
+
+124
+00:04:08,970 --> 00:04:10,290
+parameter vector theta 2.
+参数向量theta相关
+
+125
+00:04:10,800 --> 00:04:12,190
+Carol will be associated with a
+Carol将会
+
+126
+00:04:12,300 --> 00:04:13,360
+different parameter vector theta
+和theta 3参数向量相关
+
+127
+00:04:13,660 --> 00:04:14,790
+3 and Dave a different
+Dave 将会
+
+128
+00:04:15,750 --> 00:04:17,670
+parameter vector, theta 4. So
+和theta 4相关
+
+129
+00:04:18,090 --> 00:04:18,990
+lets say we want to make a
+我们想要做一个
+
+130
+00:04:19,320 --> 00:04:21,040
+prediction for what Alice will
+关于Alice将会
+
+131
+00:04:21,240 --> 00:04:22,450
+think of the movie, Cute
+如何评价电影3的预测
+
+132
+00:04:22,690 --> 00:04:24,640
+puppies of love. Well that
+那么
+
+133
+00:04:24,810 --> 00:04:25,670
+movie is going to have some
+电影3有一些
+
+134
+00:04:26,810 --> 00:04:29,180
+feature vector X3, where
+特征向量x(3)
+
+135
+00:04:29,410 --> 00:04:30,400
+we have that X3 is going
+我们让它
+
+136
+00:04:30,430 --> 00:04:32,460
+to be equal to 1
+等于1
+
+137
+00:04:32,650 --> 00:04:34,580
+which is my intercept term, and
+也就是特征x0
+
+138
+00:04:34,800 --> 00:04:37,220
+then 0.99, and then 0.
+然后是0.99和0
+
+139
+00:04:38,560 --> 00:04:39,680
+And let's say for this
+对于这个例子
+
+140
+00:04:39,810 --> 00:04:41,040
+example, let's say that you
+我们假设
+
+141
+00:04:41,190 --> 00:04:42,890
+know we have somehow already gotten
+我们已经得到Alice的
+
+142
+00:04:43,290 --> 00:04:44,600
+a parameter vector theta 1
+参数向量theta1
+
+143
+00:04:44,830 --> 00:04:45,700
+for Alice--we will
+我们
+
+144
+00:04:45,850 --> 00:04:47,560
+say later exactly how
+过会会详细说明
+
+145
+00:04:47,800 --> 00:04:48,520
+we come up with this parameter
+我们是怎么得到这个参数向量的
+
+146
+00:04:48,600 --> 00:04:50,530
+vector--but let's
+但是
+
+147
+00:04:50,710 --> 00:04:52,000
+just say for now that you
+我们说目前
+
+148
+00:04:52,150 --> 00:04:53,560
+know some unspecified learning algorithm
+你知道一些未指明的学习算法
+
+149
+00:04:54,040 --> 00:04:55,040
+has learned the parameter vector
+已经学习了参数向量
+
+150
+00:04:55,180 --> 00:04:56,970
+theta 1 and it is
+theta1并且它
+
+151
+00:04:57,120 --> 00:04:59,260
+equal to 0 5 0. And so
+等于[0 5 0]
+
+152
+00:05:00,150 --> 00:05:02,010
+our prediction for this
+我们对于它的预测
+
+153
+00:05:02,270 --> 00:05:04,130
+entry is going to
+将会
+
+154
+00:05:04,260 --> 00:05:06,930
+be equal to theta 1,
+等于theta1
+
+155
+00:05:07,440 --> 00:05:08,760
+that is Alice's parameter vector,
+这就是Alice的参数向量
+
+156
+00:05:09,620 --> 00:05:11,450
+transpose X3, that
+的转置,乘以x(3),
+
+157
+00:05:11,620 --> 00:05:13,730
+is the feature vector for
+这就是对
+
+158
+00:05:14,170 --> 00:05:16,050
+the Cute Puppies of Love movie number 3.
+电影3的特征向量
+
+159
+00:05:16,250 --> 00:05:17,200
+And so the inner
+并且
+
+160
+00:05:17,470 --> 00:05:18,470
+product between these two vectors
+这两个向量的内积
+
+161
+00:05:19,910 --> 00:05:21,780
+is going to be 5 x 0.99.
+是5*0.99
+
+162
+00:05:23,980 --> 00:05:26,340
+Which is equal to 4.95.
+其值为4.95
+
+163
+00:05:27,360 --> 00:05:28,940
+And so my prediction for value this
+那么这里我们的预测值
+
+164
+00:05:29,130 --> 00:05:30,930
+over here is going to be 4.95.
+就是4.95
+
+165
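Editor's aside (not in the original transcript): the prediction just computed, theta(1) transpose times x(3) = 4.95, is a single inner product. A tiny sketch using the lecture's illustrative numbers:

```python
import numpy as np

# Feature vector for movie 3 (Cute Puppies of Love): [intercept, romance, action]
x3 = np.array([1.0, 0.99, 0.0])
# Alice's parameter vector theta(1), assumed already learned in the lecture's example
theta1 = np.array([0.0, 5.0, 0.0])

predicted_rating = theta1 @ x3   # inner product theta(1)' * x(3)
print(predicted_rating)          # 4.95
```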
+00:05:31,970 --> 00:05:33,110
+And maybe that seems like a
+或许它看起来像
+
+166
+00:05:33,230 --> 00:05:34,660
+reasonable value, if indeed
+一个合理值,如果
+
+167
+00:05:36,130 --> 00:05:37,830
+this is my parameter vector theta 1.
+这是我们的参数向量theta1
+
+168
+00:05:38,950 --> 00:05:40,290
+So all we doing here is
+这里我们所做的是
+
+169
+00:05:40,520 --> 00:05:42,710
+we are applying a different copy of
+我们使用一个不同的
+
+170
+00:05:42,930 --> 00:05:44,480
+essentially linear regression for each
+线性回归的副本对于每一个用户而言
+
+171
+00:05:44,760 --> 00:05:46,020
+user and we are saying
+并且我们说
+
+172
+00:05:46,230 --> 00:05:47,610
+that what Alice does, is
+Alice所做的
+
+173
+00:05:47,820 --> 00:05:48,880
+Alice has some parameter vector
+是她将参数向量
+
+174
+00:05:49,160 --> 00:05:50,400
+theta 1 that she uses,
+theta1
+
+175
+00:05:51,410 --> 00:05:52,380
+that we use to predict
+我们用来预测
+
+176
+00:05:53,310 --> 00:05:54,770
+her ratings as a
+她的评价,看做是
+
+177
+00:05:54,950 --> 00:05:56,190
+function of how romantic and how
+电影中爱情成分和动作成分怎样组成
+
+178
+00:05:56,470 --> 00:05:57,540
+action packed the movie is
+的一个功能
+
+179
+00:05:58,210 --> 00:05:59,600
+and Bob, and Carol, and
+并且bob,Carol,
+
+180
+00:05:59,740 --> 00:06:01,010
+Dave each of them have a
+Dave 他们每一个都有
+
+181
+00:06:01,220 --> 00:06:03,170
+different linear function of the
+一个不同的线性方程
+
+182
+00:06:03,330 --> 00:06:04,700
+romantic-ness and action-ness or the degree
+来标示爱情成分和动作成分
+
+183
+00:06:05,220 --> 00:06:06,510
+of romance and the degree of action
+的程度
+
+184
+00:06:07,580 --> 00:06:08,030
+in a movie,
+在电影中
+
+185
+00:06:08,820 --> 00:06:11,300
+and that is how we're going to predict their star ratings.
+这就是我们怎样预测出他们的评价的
+
+186
+00:06:14,820 --> 00:06:16,330
+More formally here is
+这里更正式的是
+
+187
+00:06:16,610 --> 00:06:17,920
+how we can write down the problem.
+我们怎样能写下问题
+
+188
+00:06:19,260 --> 00:06:20,320
+Our notation is that RIJ
+我们将Rij记为1
+
+189
+00:06:20,690 --> 00:06:21,600
+is equal to one, if
+
+190
+00:06:21,680 --> 00:06:22,910
+user J has rated movie I,
+如果用户j评价了电影i
+
+191
+00:06:23,380 --> 00:06:24,630
+and YIJ is the rating
+Yij是对那个电影的评价
+
+192
+00:06:25,850 --> 00:06:28,010
+of that movie if that rating exists.
+如果评价存在的话
+
+193
+00:06:29,540 --> 00:06:30,520
+That is if that user has actually
+如果用户真的
+
+194
+00:06:31,030 --> 00:06:32,830
+rated that movie. And
+对那个电影进行评价,并且
+
+195
+00:06:33,330 --> 00:06:34,360
+on the previous slide we also
+在之前的ppt我们也
+
+196
+00:06:34,650 --> 00:06:36,540
+defined theta J which
+定义了theta j
+
+197
+00:06:36,740 --> 00:06:38,790
+is a parameter for each user XI
+是每个用户Xi的一个参数
+
+198
+00:06:39,150 --> 00:06:40,830
+which is a feature vector for specific
+Xi是对于特定电影的特征向量
+
+199
+00:06:41,220 --> 00:06:42,370
+movie and for each user
+对每一个用户
+
+200
+00:06:42,850 --> 00:06:43,780
+and each movie you would predict
+和每一个电影,你会像下面这样预测
+
+201
+00:06:44,300 --> 00:06:45,620
+that rating, as follows.
+
+202
+00:06:47,230 --> 00:06:49,560
+So let me introduce,
+下面我将引入
+
+203
+00:06:49,650 --> 00:06:51,600
+just temporarily, introduce one extra
+仅仅是暂时的,引入一个额外的
+
+204
+00:06:51,860 --> 00:06:53,530
+bit of notation mj, we
+标记Mj,我们
+
+205
+00:06:53,760 --> 00:06:54,980
+are gonna use mj to denote the
+用Mj来表示
+
+206
+00:06:55,070 --> 00:06:56,140
+number of movies rated by user
+用户j评价过的电影数量
+
+207
+00:06:56,400 --> 00:06:57,350
+j, we're gonna need this
+我们需要这个
+
+208
+00:06:57,580 --> 00:06:59,890
+notation only for this slide. Now, in order to learn
+标记,仅仅是为了这个ppt,现在,为了
+
+209
+00:07:00,160 --> 00:07:01,700
+the parameter vector for
+thetaj的参数向量
+
+210
+00:07:01,760 --> 00:07:03,720
+theta j, well, how can we do so?
+我们该怎么做?
+
+211
+00:07:04,410 --> 00:07:06,380
+This is basically a linear regression problem.
+这是一个基本的线性回归问题
+
+212
+00:07:06,930 --> 00:07:07,980
+So what we can do, is
+我们可以
+
+213
+00:07:08,290 --> 00:07:09,810
+just choose a parameter vector, theta j,
+选择一个参数向量,theta j
+
+214
+00:07:10,520 --> 00:07:12,100
+so the predicted value
+这里的预测值
+
+215
+00:07:12,570 --> 00:07:13,620
+here are as close
+尽可能接近
+
+216
+00:07:13,980 --> 00:07:15,280
+as possible to the values
+那些
+
+217
+00:07:15,800 --> 00:07:18,760
+that we observed in our training set, the values we observed in our data.
+我们在训练集中观察的值
+
+218
+00:07:19,900 --> 00:07:21,390
+So, let's write that down.
+让我们将这些写下
+
+219
+00:07:22,290 --> 00:07:24,320
+In order to learn the
+为了学习
+
+220
+00:07:24,380 --> 00:07:26,960
+parameter vector theta j, let's minimize over
+参数向量theta j,让我们最小化
+
+221
+00:07:27,170 --> 00:07:28,510
+my parameter vector theta j,
+参数向量thetaj,
+
+222
+00:07:29,400 --> 00:07:30,360
+of sum--
+
+223
+00:07:31,920 --> 00:07:32,860
+and I want to sum
+我想将
+
+224
+00:07:33,290 --> 00:07:34,900
+over all movies that user
+所有用户评价的电影求和
+
+225
+00:07:35,240 --> 00:07:36,930
+j has rated--so we write this as sum
+所以我们这样写
+
+226
+00:07:37,270 --> 00:07:38,290
+over all values of i
+所有的i值求和
+
+227
+00:07:39,100 --> 00:07:42,000
+that is, i colon r(i, j) equals 1.
+也就是 i : r(i,j)=1。
+
+228
+00:07:43,870 --> 00:07:45,970
+So the way to read this summation index is
+我们可以这样来看索引
+
+229
+00:07:46,370 --> 00:07:48,280
+this is summation over all
+这是对所有i值的求和
+
+230
+00:07:48,470 --> 00:07:49,550
+the values of i, so that
+这样来,
+
+231
+00:07:49,780 --> 00:07:51,180
+r i j is equal to 1.
+Rij等于1
+
+232
+00:07:51,210 --> 00:07:52,470
+So this is going to be summing over all the
+所以这样就是对
+
+233
+00:07:52,560 --> 00:07:54,670
+movies that user j has rated.
+所有用户j评价的电影求和
+
+234
+00:07:56,230 --> 00:07:57,000
+And then I am going to
+然后我将要
+
+235
+00:07:58,150 --> 00:07:59,910
+compute theta j
+计算theta j
+
+236
+00:08:01,810 --> 00:08:04,450
+transpose xi so
+的转置乘以x(i)
+
+237
+00:08:04,610 --> 00:08:06,740
+that's the prediction of user
+这就是
+
+238
+00:08:07,030 --> 00:08:08,390
+j's rating on movie i,
+用户j对电影i的评价
+
+239
+00:08:09,230 --> 00:08:10,960
+minus y i j,
+减去Yij
+
+240
+00:08:11,700 --> 00:08:13,700
+so that's the actual observed rating squared,
+即实际观测到的评分,再对差取平方,
+
+241
+00:08:15,190 --> 00:08:16,790
+and then, let me just divide
+然后我们
+
+242
+00:08:17,260 --> 00:08:18,650
+by the number of movies
+除以
+
+243
+00:08:19,040 --> 00:08:20,990
+that user J, has
+用户j
+
+244
+00:08:21,380 --> 00:08:23,910
+actually rated, so just divide by 1 over 2MJ.
+实际评价过的电影数量,即乘以1/(2mj)。
+
+245
+00:08:24,000 --> 00:08:25,460
+And so this is
+这就
+
+246
+00:08:25,690 --> 00:08:27,620
+just like the least squares regression,
+像最小平方回归
+
+247
+00:08:28,210 --> 00:08:29,550
+it's just like linear regression
+就像线性回归
+
+248
+00:08:30,170 --> 00:08:31,170
+where we want to choose
+我们选择
+
+249
+00:08:31,320 --> 00:08:34,480
+the parameter vector theta J, to minimize this type of squared error term.
+参数向量theta j,来最小化这个平方误差项
+
+250
+00:08:34,510 --> 00:08:35,090
+And if you want to, you can
+并且如果你想,你可以
+
+251
+00:08:36,330 --> 00:08:39,580
+also add in a regularization term
+加入一个正则项
+
+252
+00:08:39,980 --> 00:08:41,870
+so plus lambda over 2m, and
+即加上 lambda/(2m),
+
+253
+00:08:43,780 --> 00:08:44,930
+this is really 2MJ because,
+这就是2Mj
+
+254
+00:08:45,420 --> 00:08:47,760
+this as if we have MJ examples right?
+因为这就好像我们有Mj样本
+
+255
+00:08:47,920 --> 00:08:49,330
+Because if user J has
+因为如果用户j
+
+256
+00:08:49,650 --> 00:08:50,910
+rated that many movies, it's
+评价了许多电影
+
+257
+00:08:51,050 --> 00:08:53,340
+sort of like we have that many data points with which to fit
+这就好像我们有许多数据点来
+
+258
+00:08:53,680 --> 00:08:55,790
+the parameters theta J. And then
+对应参数thetaj,并且
+
+259
+00:08:56,650 --> 00:08:57,390
+let me add in my usual
+我加入
+
+260
+00:08:58,340 --> 00:09:00,260
+regularization term here of
+正则项
+
+261
+00:09:00,460 --> 00:09:02,530
+theta J K squared.
+theta J K的平方
+
+262
+00:09:03,110 --> 00:09:04,270
+As usual this sum is from
+通常这个和是k
+
+263
+00:09:04,840 --> 00:09:05,980
+K equals 1 through N
+从1到n
+
+264
+00:09:06,330 --> 00:09:08,670
+so here theta J is
+这里theta j
+
+265
+00:09:08,880 --> 00:09:10,050
+going to be an N plus
+是一个n+1
+
+266
+00:09:10,520 --> 00:09:12,400
+1 dimensional vector where,
+维的向量
+
+267
+00:09:12,620 --> 00:09:14,630
+in our early example, n was equal to two,
+在我们以前的例子里,n等于2
+
+268
+00:09:15,320 --> 00:09:17,090
+but more generally, n is
+但是通常情况下,n是
+
+269
+00:09:17,260 --> 00:09:20,980
+the number of features we have per movie.
+每个电影所拥有的特征数量
+
+270
+00:09:21,730 --> 00:09:22,270
+And so as usual we don't regularize over theta 0.
+并且像往常一样,我们不对theta 0进行正则化
+
+271
+00:09:22,390 --> 00:09:23,710
+We don't regularize over the
+我们不会调整
+
+272
+00:09:23,910 --> 00:09:24,750
+bias term because the sum is
+偏置项,因为求和是
+
+273
+00:09:24,930 --> 00:09:28,590
+from K1 through N. If
+从k=1到n。如果
+
+274
+00:09:28,760 --> 00:09:30,430
+you minimize this as
+如果你将
+
+275
+00:09:30,570 --> 00:09:31,780
+a function of theta J you get a
+theta j这个公式最小化,你会得到
+
+276
+00:09:31,900 --> 00:09:33,010
+good solution, you get a
+一个好的解决方案,你会得到
+
+277
+00:09:33,180 --> 00:09:35,330
+pretty good estimate of a parameter vector theta j
+一个很好的对参数向量theta j的估计
+
+278
+00:09:36,490 --> 00:09:37,200
+with which to make the predictions
+用来对
+
+279
+00:09:37,940 --> 00:09:39,460
+for user J's movie ratings.
+用户j的电影评价做预测
+
+280
+00:09:40,820 --> 00:09:42,250
+For recommender systems, we're going
+对于推荐系统,我们将
+
+281
+00:09:42,520 --> 00:09:44,140
+to change this notation a little
+这个符号稍微改变一下
+
+282
+00:09:44,500 --> 00:09:46,130
+bit. So to simplify the subsequent math,
+为了简化后来的数学公式
+
+283
+00:09:46,690 --> 00:09:48,440
+I'm actually going to get rid of this term MJ.
+我将去掉Mj
+
+284
+00:09:49,570 --> 00:09:50,720
+So that's just a constant right
+这就是一个常数
+
+285
+00:09:50,970 --> 00:09:52,140
+so I can delete it without changing
+所以我能删除它而不改变
+
+286
+00:09:53,000 --> 00:09:54,310
+the value of theta J that
+thetaj的值
+
+287
+00:09:54,430 --> 00:09:55,840
+I get out of this optimization,
+我从最优化中得到的
+
+288
+00:09:56,010 --> 00:09:57,030
+so if you imagine taking this
+所以如果你想象用这个
+
+289
+00:09:57,220 --> 00:09:58,850
+whole equation, taking this
+公式,用这个
+
+290
+00:09:59,010 --> 00:10:00,290
+whole expression and multiplying it by
+表达并且用
+
+291
+00:10:00,870 --> 00:10:02,540
+MJ and get rid of that constant, and when
+Mj乘以它,并去掉那个常数,
+
+292
+00:10:02,950 --> 00:10:04,110
+I minimize this I should still get
+当我最小化这个,我应该得到
+
+293
+00:10:04,200 --> 00:10:06,590
+the same value of theta J as before.
+和之前一样的数值
+
+294
+00:10:06,710 --> 00:10:07,780
+So, just to repeat what
+为了重复
+
+295
+00:10:08,440 --> 00:10:10,060
+we wrote on the previous slide, here
+在之前ppt写下的东西
+
+296
+00:10:10,340 --> 00:10:12,250
+is our optimization objective: In order
+这里我们最优目标:
+
+297
+00:10:12,580 --> 00:10:13,620
+to learn theta J, which is
+为了学习thetaj
+
+298
+00:10:13,990 --> 00:10:15,080
+a parameter for user J,
+用户j的一个参数
+
+299
+00:10:15,790 --> 00:10:17,570
+we're going to minimize over theta j
+我们将最小化thetaj
+
+300
+00:10:17,770 --> 00:10:19,820
+this optimization objectives. So
+这个最优目标
+
+301
+00:10:20,100 --> 00:10:21,360
+this is our usual squared
+这是我们常见的平方误差项
+
+302
+00:10:21,720 --> 00:10:24,830
+error term and then this is our regularization term.
+这是我们的正则项
+
+303
+00:10:26,050 --> 00:10:27,410
+Now of course in building
+当然在构建
+
+304
+00:10:27,690 --> 00:10:28,790
+a recommender system we don't
+推荐系统过程中我们不仅是
+
+305
+00:10:29,030 --> 00:10:29,800
+just want to learn parameters
+想要学习
+
+306
+00:10:30,420 --> 00:10:31,500
+for a single user, we want
+单一用户的参数,我们想
+
+307
+00:10:31,650 --> 00:10:33,140
+to learn parameters for all of
+学习所有
+
+308
+00:10:33,490 --> 00:10:35,640
+our users, I have n subscript u
+用户的参数,我用nu
+
+309
+00:10:35,760 --> 00:10:36,730
+users, so I want to
+用户,我想
+
+310
+00:10:36,950 --> 00:10:38,920
+learn all of these parameters and
+学习所有这些参数
+
+311
+00:10:39,060 --> 00:10:39,830
+so what I'm going to do
+并且我将要
+
+312
+00:10:40,140 --> 00:10:42,320
+is take this minimization, take
+用这个最小化,
+
+313
+00:10:42,500 --> 00:10:45,480
+this optimization objective and just add an extra summation there.
+用这个最优目标并且加入一个额外的和
+
+314
+00:10:45,800 --> 00:10:47,610
+So, you know, this expression here
+这个表达式
+
+315
+00:10:48,410 --> 00:10:49,200
+with the one half on top again, so
+和上面一样带有1/2系数,所以
+
+316
+00:10:49,240 --> 00:10:50,510
+it's exactly the same
+这完全和
+
+317
+00:10:50,780 --> 00:10:52,520
+as what we have on top except
+上面得到的是一样的,除了
+
+318
+00:10:52,950 --> 00:10:53,980
+that now, instead of just
+与其
+
+319
+00:10:54,090 --> 00:10:55,670
+doing this for a specific user theta
+仅仅为了特定用户theta做这个
+
+320
+00:10:55,960 --> 00:10:57,270
+J, I'm going to sum
+还不如我对
+
+321
+00:10:57,680 --> 00:10:59,340
+my objective over all of
+所有用户的目标求和
+
+322
+00:10:59,490 --> 00:11:00,940
+my users and then minimize
+并且最小化
+
+323
+00:11:01,260 --> 00:11:03,700
+this overall optimization objective.
+这个最优目标
+
+324
+00:11:04,320 --> 00:11:05,570
+Minimize this overall cost function.
+最小化这个总的代价函数
+
+325
+00:11:06,730 --> 00:11:09,200
+And when I minimize this
+当我最小化这个theta1-theta nu
+
+326
+00:11:09,380 --> 00:11:10,560
+as a function of theta 1,
+的方程
+
+327
+00:11:11,360 --> 00:11:12,400
+theta 2, up to
+
+328
+00:11:12,600 --> 00:11:14,130
+theta NU, I will
+我将
+
+329
+00:11:14,270 --> 00:11:15,750
+get a separate parameter
+得到
+
+330
+00:11:16,030 --> 00:11:17,340
+vector each user and
+每个用户各自的参数向量
+
+331
+00:11:17,450 --> 00:11:18,720
+I can then use that
+然后我就能用这个
+
+332
+00:11:19,090 --> 00:11:20,460
+to make predictions for all of
+来对所有
+
+333
+00:11:20,530 --> 00:11:21,610
+my users for all of
+nu个用户进行预测
+
+334
+00:11:21,720 --> 00:11:23,150
+my N subscript u users.
+
+335
+00:11:24,520 --> 00:11:26,560
+So putting everything together, this
+总体来讲
+
+336
+00:11:27,180 --> 00:11:28,730
+was our optimization objective on
+这就是我们的最优化目标
+
+337
+00:11:28,880 --> 00:11:29,940
+top and to give
+为了
+
+338
+00:11:30,170 --> 00:11:31,070
+this thing a name, I'll just call this
+给他起个名字,我就叫他
+
+339
+00:11:31,930 --> 00:11:33,480
+J of theta 1,
+theta1,theta2...theta nu 的j
+
+340
+00:11:33,630 --> 00:11:35,520
+dot, dot, dot theta NU.
+
+341
+00:11:36,050 --> 00:11:37,280
+So J as usual is my
+j是我
+
+342
+00:11:37,590 --> 00:11:39,830
+optimization objective which I'm trying to minimize.
+试图最小化的最优化目标
+
+343
+00:11:41,330 --> 00:11:42,500
+Next, in order to actually
+下面,为了
+
+344
+00:11:42,880 --> 00:11:44,310
+do the minimization, if you
+实现最小化,如果你
+
+345
+00:11:44,500 --> 00:11:45,840
+were to derive the gradient
+推导梯度下降法的更新
+
+346
+00:11:46,150 --> 00:11:47,410
+descent updates, these are
+
+347
+00:11:47,530 --> 00:11:48,720
+the equations you would get,
+你会得到这些方程
+
+348
+00:11:49,900 --> 00:11:51,300
+so you would take theta
+用theta jk
+
+349
+00:11:51,750 --> 00:11:53,310
+JK and subtract from
+减去alpha
+
+350
+00:11:53,430 --> 00:11:56,190
+it alpha, which is the learning rate, times these terms here on the right.
+alpha是一个学习速率,在右边乘以这些项
+
+351
+00:11:56,280 --> 00:11:57,540
+So we have slightly different cases
+我们得到不同的例子
+
+352
+00:11:58,160 --> 00:11:59,660
+so when K equals 0 and when K is not
+当k等于0时,k不等于0时
+
+353
+00:11:59,840 --> 00:12:01,460
+equal to 0, because our regularization
+因为我们的正则项
+
+354
+00:12:01,960 --> 00:12:04,380
+term here regularizes only the
+仅仅对
+
+355
+00:12:04,910 --> 00:12:06,430
+values of theta JK for
+theta jk的值进行正则化
+
+356
+00:12:06,610 --> 00:12:07,690
+K not equal to zero. So
+这里k不等于0
+
+357
+00:12:07,830 --> 00:12:09,470
+we don't regularize theta 0
+我们不对theta 0进行正则化
+
+358
+00:12:10,090 --> 00:12:11,610
+so the slightly different updates
+所以更新式略有不同
+
+359
+00:12:12,270 --> 00:12:13,580
+for k equals zero, and k not equal to 0.
+分别对应k等于0和k不等于0的情况。
+
+360
+00:12:14,680 --> 00:12:16,080
+And this term, over
+这一项
+
+361
+00:12:16,250 --> 00:12:18,090
+here, for example is just a partial
+仅仅是
+
+362
+00:12:18,520 --> 00:12:20,790
+derivative with respect to your parameter,
+对于你的参数的偏导数
+
+363
+00:12:21,090 --> 00:12:24,300
+that of your
+是你的
+
+364
+00:12:25,350 --> 00:12:28,270
+optimization objective, right?
+最优目标,对不?
+
+365
+00:12:28,790 --> 00:12:30,280
+And so, this is just
+这是
+
+366
+00:12:30,680 --> 00:12:33,000
+gradient descent and I've
+梯度下降
+
+367
+00:12:33,230 --> 00:12:35,440
+already computed the derivatives and plugged them into here.
+我们已经计算出导数并将其放到这
+
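Editor's aside (not in the transcript): the objective and gradient just described — squared error over the rated entries plus regularization on every theta_jk except k = 0 — can be written compactly. This is a hedged NumPy sketch under the lecture's notation (X holds movie features with an intercept column, Theta holds one parameter vector per user); the names are illustrative, not from the course materials.

```python
import numpy as np

def cost_and_grad(Theta, X, Y, R, lam):
    """Content-based objective J(theta(1), ..., theta(n_u)) and its gradient.

    X:     (n_m, n+1) movie feature vectors, first column is the intercept term
    Theta: (n_u, n+1) one parameter vector per user
    Y:     (n_m, n_u) ratings y(i, j); R: (n_m, n_u) indicator r(i, j)
    """
    # Errors theta(j)' x(i) - y(i, j), counted only where r(i, j) = 1.
    E = np.where(R == 1, X @ Theta.T - Y, 0.0)
    J = 0.5 * np.sum(E ** 2) + (lam / 2.0) * np.sum(Theta[:, 1:] ** 2)
    grad = E.T @ X                     # sum over rated movies of error * x_k
    grad[:, 1:] += lam * Theta[:, 1:]  # regularize every k except the intercept (k = 0)
    return J, grad
```

With the gradient in hand, the update shown on the slide is simply `Theta -= alpha * grad`, repeated until J stops decreasing; alternatively the pair (J, grad) can be handed to a more advanced optimizer, as the video suggests.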
+368
+00:12:36,560 --> 00:12:39,580
+If these gradient
+如果这些
+
+369
+00:12:40,570 --> 00:12:41,810
+descent updates look a
+梯度下降变化
+
+370
+00:12:41,980 --> 00:12:42,870
+lot like what we had for
+很像我们
+
+371
+00:12:43,050 --> 00:12:44,700
+linear regression, that's because these
+在线性回归中得到的,这是因为
+
+372
+00:12:44,880 --> 00:12:47,250
+are essentially the same as linear regression.
+这些本质上和线性回归是一样的
+
+373
+00:12:48,190 --> 00:12:49,510
+The only minor difference is that
+唯一不同之处在于
+
+374
+00:12:49,780 --> 00:12:51,120
+for linear regression we have
+对于线性回归,我们
+
+375
+00:12:51,580 --> 00:12:52,600
+these 1 over M terms
+有这些1/m项
+
+376
+00:12:52,990 --> 00:12:54,710
+- it's really 1
+它实际上是
+
+377
+00:12:54,810 --> 00:12:56,770
+over MJ - but
+1/mj -
+
+378
+00:12:57,550 --> 00:12:59,230
+because earlier when we were
+但是因为以前当我们
+
+379
+00:12:59,370 --> 00:13:00,780
+deriving the optimization objective
+求最优目标时
+
+380
+00:13:01,270 --> 00:13:03,540
+we got rid of this, that's why we don't have this 1 over M term.
+我们去掉了这个,这就是为什么我们没有这个1/m项
+
+381
+00:13:04,440 --> 00:13:05,880
+But otherwise it's really sum over
+但是它是
+
+382
+00:13:06,080 --> 00:13:08,350
+my training examples of, you
+误差训练集合的和
+
+383
+00:13:08,530 --> 00:13:09,890
+know, the error times
+乘以
+
+384
+00:13:10,230 --> 00:13:13,390
+XK plus that regularization
+Xk加上正则项
+
+385
+00:13:14,900 --> 00:13:16,550
+term contributes to the derivative.
+得到偏导
+
+386
+00:13:18,120 --> 00:13:19,040
+So if you are using
+所以如果你用
+
+387
+00:13:19,200 --> 00:13:20,360
+gradient descent, here is how
+梯度下降法
+
+388
+00:13:20,680 --> 00:13:22,140
+you can minimize the cost
+这就是怎样最小化代价函数J
+
+389
+00:13:22,440 --> 00:13:23,880
+function j, to learn all
+为了学习所有
+
+390
+00:13:24,110 --> 00:13:25,490
+the parameters, and using these
+的参数,我们用
+
+391
+00:13:25,640 --> 00:13:26,980
+formulas for the derivatives, if
+这些求偏导的公式
+
+392
+00:13:27,090 --> 00:13:28,240
+you want, you can also plug them
+如果你想,你可以同样
+
+393
+00:13:28,440 --> 00:13:29,710
+into a more advanced optimization
+用一个更先进的最优化方法
+
+394
+00:13:30,290 --> 00:13:31,710
+algorithm like conjugate gradient or
+比如共轭梯度或者
+
+395
+00:13:31,810 --> 00:13:33,730
+LBFGS or what have you, and use
+LBFGS或诸如此类的方法,并且用
+
+396
+00:13:33,940 --> 00:13:35,930
+that to try to minimize the cost function J as well.
+这些方法来最小化代价函数J
+
+397
+00:13:37,360 --> 00:13:38,450
+So hopefully you now know
+现在你知道
+
+398
+00:13:38,750 --> 00:13:40,510
+how you can apply essentially a
+怎样使用一个
+
+399
+00:13:41,000 --> 00:13:42,820
+variation on linear regression in
+线性回归的变化来
+
+400
+00:13:42,950 --> 00:13:45,460
+order to predict different movie ratings by different users.
+预测不同用户对不同电影的评价
+
+401
+00:13:46,350 --> 00:13:47,510
+This particular algorithm is called
+这个特殊的算法叫做
+
+402
+00:13:48,030 --> 00:13:49,930
+a content based recommendations, or
+基于内容的推荐算法
+
+403
+00:13:50,040 --> 00:13:51,980
+content based approach because we
+因为我们
+
+404
+00:13:52,130 --> 00:13:53,200
+assume that we have available
+假设我们有
+
+405
+00:13:53,650 --> 00:13:55,430
+to us, features for the different movies.
+可以使用的不同电影的特征
+
+406
+00:13:56,150 --> 00:13:57,330
+So we have features that
+我们用这些特征
+
+407
+00:13:57,490 --> 00:13:58,610
+capture what is the
+来表示
+
+408
+00:13:58,700 --> 00:14:00,260
+content of these movies. How romantic is this movie?
+这些电影的内容,这个电影有多浪漫
+
+409
+00:14:01,280 --> 00:14:03,050
+How much action is in this movie?
+这个电影里的动作成分有多少
+
+410
+00:14:03,430 --> 00:14:04,690
+And we are really using features of the
+并且我们使用这些特征
+
+411
+00:14:04,780 --> 00:14:06,910
+content of the movies to make our predictions.
+来进行预测
+
+412
+00:14:08,350 --> 00:14:09,770
+But for many movies we
+但是对于许多电影都是
+
+413
+00:14:09,920 --> 00:14:11,300
+don't actually have such features,
+没有这样特征的
+
+414
+00:14:11,820 --> 00:14:13,630
+or it may be very difficult to get
+或者很难获取
+
+415
+00:14:13,850 --> 00:14:14,970
+such features for all of
+所有电影的此类特征的
+
+416
+00:14:15,050 --> 00:14:16,160
+our movies, for all
+对于
+
+417
+00:14:16,460 --> 00:14:17,800
+of whatever items we are trying to sell.
+我们试图出售的任何商品。
+
+418
+00:14:18,880 --> 00:14:20,430
+So in the next video, we'll
+所以在下一段视频中
+
+419
+00:14:20,590 --> 00:14:21,530
+start to talk about an approach
+我们将开始讲解一种
+
+420
+00:14:22,010 --> 00:14:23,290
+to recommender systems that isn't
+不是基于内容的推荐系统方法
+
+421
+00:14:23,570 --> 00:14:24,710
+content based and does not
+并且不假设
+
+422
+00:14:24,980 --> 00:14:26,090
+assume that we have
+我们拥有
+
+423
+00:14:26,670 --> 00:14:28,420
+someone else giving us all of these features,
+别人给我们的所有这些特征,
+
+424
+00:14:28,880 --> 00:14:30,300
+for all of the movies in our data set.
+在我们的数据集所有的电影。
+
diff --git a/srt/16 - 3 - Collaborative Filtering (10 min).srt b/srt/16 - 3 - Collaborative Filtering (10 min).srt
new file mode 100644
index 00000000..0a4501d1
--- /dev/null
+++ b/srt/16 - 3 - Collaborative Filtering (10 min).srt
@@ -0,0 +1,1466 @@
+1
+00:00:01,060 --> 00:00:02,420
+In this video we'll talk about
+在这段视频中 我们要讲
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,620 --> 00:00:03,900
+an approach to building a
+一种构建推荐系统的方法
+
+3
+00:00:03,970 --> 00:00:06,390
+recommender system that's called collaborative filtering.
+叫做协同过滤(collaborative filtering)
+
+4
+00:00:07,540 --> 00:00:08,880
+The algorithm that we're talking
+我们所讲的算法
+
+5
+00:00:09,180 --> 00:00:10,400
+about has a very interesting
+有一个值得一提的
+
+6
+00:00:10,680 --> 00:00:11,830
+property that it does
+特点 那就是它能实现
+
+7
+00:00:12,120 --> 00:00:13,290
+what is called feature learning and
+对特征的学习
+
+8
+00:00:13,790 --> 00:00:14,800
+by that I mean that this
+我的意思是
+
+9
+00:00:14,960 --> 00:00:16,270
+will be an algorithm that can
+这种算法能够
+
+10
+00:00:16,450 --> 00:00:19,010
+start to learn for itself what features to use.
+自行学习所要使用的特征
+
+11
+00:00:21,130 --> 00:00:22,100
+Here was the data set that
+我们建一个数据集
+
+12
+00:00:22,220 --> 00:00:23,440
+we had and we had
+假定是为每一部电影准备的
+
+13
+00:00:23,720 --> 00:00:25,030
+assumed that for each movie,
+对每一部电影
+
+14
+00:00:25,690 --> 00:00:27,000
+someone had come and told
+我们找一些人来
+
+15
+00:00:27,320 --> 00:00:28,640
+us how romantic that
+告诉我们这部电影
+
+16
+00:00:28,840 --> 00:00:30,550
+movie was and how much action there was in that movie.
+浪漫指数是多少 动作指数是多少
+
+17
+00:00:31,680 --> 00:00:32,880
+But as you can imagine it
+但想一下就知道
+
+18
+00:00:33,020 --> 00:00:34,320
+can be very difficult and time
+这样做难度很大
+
+19
+00:00:34,500 --> 00:00:36,390
+consuming and expensive to actually try
+也很花费时间
+
+20
+00:00:36,490 --> 00:00:37,860
+to get someone to, you know,
+你想想 要让每个人
+
+21
+00:00:38,050 --> 00:00:39,440
+watch each movie and tell
+看完每一部电影
+
+22
+00:00:39,700 --> 00:00:40,880
+you how romantic each movie and
+告诉你你每一部电影有多浪漫 多动作
+
+23
+00:00:41,410 --> 00:00:42,570
+how action packed is each
+这是一件不容易的事情
+
+24
+00:00:42,660 --> 00:00:44,270
+movie, and often you'll
+而且通常
+
+25
+00:00:44,390 --> 00:00:46,760
+want even more features than just these two.
+你还会希望得到除这两个特征之外的其他指数
+
+26
+00:00:46,980 --> 00:00:48,130
+So where do you get these features from?
+那么你怎样才能得到这些特征呢?
+
+27
+00:00:49,890 --> 00:00:50,920
+So let's change the problem
+所以 让我们转移一下问题
+
+28
+00:00:51,500 --> 00:00:53,220
+a bit and suppose that
+假如我们
+
+29
+00:00:53,980 --> 00:00:55,160
+we have a data set where
+有某一个数据集
+
+30
+00:00:55,410 --> 00:00:57,980
+we do not know the values of these features.
+我们并不知道特征的值是多少
+
+31
+00:00:58,380 --> 00:00:59,280
+So we're given the data set
+所以比如我们得到一些
+
+32
+00:00:59,640 --> 00:01:01,140
+of movies and of
+关于电影的数据
+
+33
+00:01:01,270 --> 00:01:03,550
+how the users rated them, but we
+不同用户对电影的评分
+
+34
+00:01:03,760 --> 00:01:05,190
+have no idea how romantic each
+我们并不知道每部电影
+
+35
+00:01:05,370 --> 00:01:06,140
+movie is and we have no
+到底有多少浪漫的成分
+
+36
+00:01:06,310 --> 00:01:07,660
+idea how action packed each
+也不知道到底每部电影里面动作成分是多少
+
+37
+00:01:07,820 --> 00:01:09,940
+movie is so I've replaced all of these things with question marks.
+于是我把所有的问题都打上问号
+
+38
+00:01:11,310 --> 00:01:12,330
+But now let's make a slightly different assumption.
+现在我们稍稍改变一下这个假设
+
+39
+00:01:13,870 --> 00:01:15,570
+Let's say we've gone to each of our users, and each of our users has told us
+假设我们采访了每一位用户 而且每一位用户都告诉我们
+
+40
+00:01:15,980 --> 00:01:18,510
+how much they like the
+他们是否喜欢
+
+41
+00:01:18,820 --> 00:01:20,040
+romantic movies and how much
+爱情电影 以及
+
+42
+00:01:20,220 --> 00:01:22,320
+they like action packed movies.
+他们是否喜欢动作电影
+
+43
+00:01:22,830 --> 00:01:26,090
+So Alice has an associated parameter vector, theta 1.
+这样 Alice 就有了对应的参数 θ(1)
+
+44
+00:01:26,820 --> 00:01:27,470
+Bob theta 2.
+Bob 的是 θ(2)
+
+45
+00:01:27,910 --> 00:01:28,440
+Carol theta 3.
+Carol 的是 θ(3)
+
+46
+00:01:28,970 --> 00:01:30,330
+Dave theta 4.
+Dave 的是 θ(4)
+
+47
+00:01:30,500 --> 00:01:31,530
+And let's say we also use this
+我们还有这样的假设
+
+48
+00:01:31,780 --> 00:01:33,040
+and that Alice tells us
+假如 Alice 告诉我们
+
+49
+00:01:33,380 --> 00:01:35,340
+that she really
+她十分喜欢
+
+50
+00:01:35,610 --> 00:01:36,960
+likes romantic movies and so
+爱情电影
+
+51
+00:01:37,140 --> 00:01:38,780
+there's a five there which
+于是 Alice 的特征 x1
+
+52
+00:01:39,280 --> 00:01:41,210
+is the multiplier associated with X1 and lets
+对应的值就是5
+
+53
+00:01:41,570 --> 00:01:42,680
+say that Alice tells us she
+假设 Alice 告诉我们
+
+54
+00:01:42,840 --> 00:01:45,030
+really doesn't like action movies and so there's a 0 there.
+她非常不喜欢动作电影 于是这一个特征就是0
+
+55
+00:01:46,060 --> 00:01:47,190
+And Bob tells us something similar
+Bob 也有相似的喜好
+
+56
+00:01:48,660 --> 00:01:49,770
+so we have theta 2 over here.
+所以也就有了 θ(2) 的数据
+
+57
+00:01:50,630 --> 00:01:52,460
+Whereas Carol tells us that
+但 Carol 说
+
+58
+00:01:53,570 --> 00:01:54,720
+she really likes action movies
+她非常喜欢动作电影
+
+59
+00:01:55,240 --> 00:01:56,450
+which is why there's a 5 there,
+于是这个特征就被记录为5
+
+60
+00:01:56,900 --> 00:01:58,600
+that's the multiplier associated with X2,
+也就是 x2 的值
+
+61
+00:01:58,980 --> 00:02:00,160
+and remember there's also
+别忘了
+
+62
+00:02:01,210 --> 00:02:03,490
+X0 equals 1 and let's
+我们仍然有等于1的 x0
+
+63
+00:02:03,770 --> 00:02:05,390
+say that Carol tells us
+假设 Carol 告诉我们
+
+64
+00:02:05,610 --> 00:02:07,000
+she doesn't like romantic
+她不喜欢爱情电影之类的
+
+65
+00:02:07,390 --> 00:02:09,640
+movies and so on, similarly for Dave.
+而且戴夫也是这样
+
+66
+00:02:09,840 --> 00:02:11,030
+So let's assume that somehow
+于是 我们假定 某种程度上
+
+67
+00:02:11,440 --> 00:02:12,830
+we can go to users and
+我们就可以着眼于用户
+
+68
+00:02:13,290 --> 00:02:14,600
+each user J just tells
+看看任意的用户 j
+
+69
+00:02:15,020 --> 00:02:16,160
+us what is the value
+对应的 θ(j) 是怎样的
+
+70
+00:02:17,090 --> 00:02:18,870
+of theta J for them.
+这样就明确地告诉了我们
+
+71
+00:02:19,450 --> 00:02:22,190
+And so basically specifies to us of how much they like different types of movies.
+他们对不同题材电影的喜欢程度
+
+72
+00:02:24,060 --> 00:02:25,280
+If we can get these parameters
+如果我们能够从用户那里
+
+73
+00:02:25,990 --> 00:02:27,890
+theta from our users then it
+得到这些 θ 参考值
+
+74
+00:02:28,050 --> 00:02:29,820
+turns out that it becomes possible to
+那么我们理论上就能
+
+75
+00:02:29,960 --> 00:02:31,210
+try to infer what are the
+推测出每部电影的
+
+76
+00:02:31,310 --> 00:02:33,710
+values of x1 and x2 for each movie.
+x1 以及 x2 的值
+
+77
+00:02:34,800 --> 00:02:35,140
+Let's look at an example.
+举例来说
+
+78
+00:02:35,730 --> 00:02:36,560
+Let's look at movie 1.
+假如我们看电影1
+
+79
+00:02:38,690 --> 00:02:39,790
+So that movie 1 has associated
+于是电影1就对应于
+
+80
+00:02:40,580 --> 00:02:42,050
+with it a feature vector x1.
+表示特征的向量 x1 联系在一起了
+
+81
+00:02:42,890 --> 00:02:45,420
+And you know this movie is called Love at last but let's ignore that.
+这部电影的名字叫《爱到最后》 但这不重要
+
+82
+00:02:45,770 --> 00:02:46,750
+Let's pretend we don't know what
+假设我们不知道
+
+83
+00:02:46,870 --> 00:02:49,060
+this movie is about, so let's ignore the title of this movie.
+这部电影的主要内容 所以也不要在意电影的名字
+
+84
+00:02:50,180 --> 00:02:52,270
+All we know is that Alice loved this movie.
+我们知道的就是 Alice 喜欢这部电影
+
+85
+00:02:52,450 --> 00:02:53,110
+Bob loved this movie.
+Bob 喜欢这部电影
+
+86
+00:02:53,750 --> 00:02:55,370
+Carol and Dave hated this movie.
+Carol 和 Dave 不喜欢它
+
+87
+00:02:56,450 --> 00:02:57,450
+So what can we infer?
+那么我们能推断出什么呢?
+
+88
+00:02:57,830 --> 00:02:58,900
+Well, we know from the
+好的 我们从
+
+89
+00:02:58,990 --> 00:03:00,510
+feature vectors that Alice
+特征向量知道了
+
+90
+00:03:00,780 --> 00:03:03,190
+and Bob love romantic movies
+Alice 和 Bob 喜欢爱情电影
+
+91
+00:03:03,700 --> 00:03:05,660
+because they told us that there's a 5 here.
+因为他们都在这里评了5分
+
+92
+00:03:06,290 --> 00:03:07,480
+Whereas Carol and Dave,
+然而 Carol 和 Dave
+
+93
+00:03:08,380 --> 00:03:10,150
+we know that they hate
+我们知道他们不喜欢
+
+94
+00:03:10,510 --> 00:03:11,920
+romantic movies and that
+爱情电影
+
+95
+00:03:12,300 --> 00:03:13,990
+they love action movies. So
+但喜欢动作电影
+
+96
+00:03:14,730 --> 00:03:16,050
+because those are the parameter
+由于你知道这些
+
+97
+00:03:16,340 --> 00:03:18,830
+vectors that you know, users 3 and 4, Carol and Dave, gave us.
+是可以从 第3和第4个参数看出来的
+
+98
+00:03:20,110 --> 00:03:20,950
+And so based on the fact
+同时 由于我们知道
+
+99
+00:03:21,390 --> 00:03:22,340
+that movie 1 is loved
+Alice 和 Bob
+
+100
+00:03:22,880 --> 00:03:24,120
+by Alice and Bob and
+喜欢电影1
+
+101
+00:03:24,340 --> 00:03:26,460
+hated by Carol and Dave, we might
+而 Carol 和 Dave 不喜欢它
+
+102
+00:03:26,910 --> 00:03:30,810
+reasonably conclude that this is probably a romantic movie,
+我们可以推断 这可能是一部爱情片
+
+103
+00:03:31,180 --> 00:03:34,240
+it is probably not much of an action movie.
+而不太可能是动作片
+
+104
+00:03:35,290 --> 00:03:36,360
+This example is a little
+这个例子在数学上
+
+105
+00:03:36,520 --> 00:03:38,090
+bit mathematically simplified but what
+可能某种程度上简化了
+
+106
+00:03:38,260 --> 00:03:40,330
+we're really asking is what
+但我们真正需要的是
+
+107
+00:03:40,590 --> 00:03:42,010
+feature vector should X1 be
+特征向量 x(1) 应该是什么
+
+108
+00:03:42,840 --> 00:03:45,370
+so that theta 1 transpose
+才能让 θ(1) 的转置
+
+109
+00:03:46,030 --> 00:03:48,940
+x1 is approximately equal to 5,
+乘以x(1) 约等于5
+
+110
+00:03:49,660 --> 00:03:51,700
+that's Alice's rating, and
+也就是 Alice 的评分值
+
+111
+00:03:51,920 --> 00:03:55,360
+theta 2 transpose x1 is
+然后 θ(2) 的转置乘以 x(1)
+
+112
+00:03:55,510 --> 00:03:56,660
+also approximately equal to 5,
+也近似于5
+
+113
+00:03:57,670 --> 00:03:59,100
+and theta 3 transpose x1 is
+而 θ(3) 的转置 乘以 x(1)
+
+114
+00:03:59,310 --> 00:04:02,180
+approximately equal to 0,
+约等于0
+
+115
+00:04:03,020 --> 00:04:05,250
+so this would be Carol's rating, and
+这是 Carol 的评分
+
+116
+00:04:06,970 --> 00:04:09,780
+theta 4 transpose X1
+而 θ(4) 的转置乘以 x(1)
+
+117
+00:04:10,740 --> 00:04:11,630
+is approximately equal to 0.
+也约等于0
+
+118
+00:04:12,590 --> 00:04:13,520
+And from this it looks
+由此可知
+
+119
+00:04:13,770 --> 00:04:16,000
+like, you know, X1 equals
+x(1) 应该用
+
+120
+00:04:16,870 --> 00:04:18,770
+one that's the intercept term, and
+[1 1.0 0.0] 这个向量表示
+
+121
+00:04:19,080 --> 00:04:20,960
+then 1.0, 0.0, that makes sense
+第一个1 是截距项
+
+122
+00:04:21,310 --> 00:04:22,390
+given what we know of Alice,
+这样才能得出
+
+123
+00:04:22,790 --> 00:04:24,110
+Bob, Carol, and Dave's preferences
+Alice Bob Carol 和 Dave 四个人
+
+124
+00:04:24,770 --> 00:04:25,940
+for movies and the way they rated this movie.
+对电影评分的结果
+
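A quick numeric check of the reasoning above, using illustrative parameter vectors of the kind just described (Alice and Bob weight romance by 5, Carol and Dave weight action by 5; the exact numbers are only for illustration): with x(1) = [1, 1.0, 0.0], each theta(j)' x(1) reproduces the observed pattern of ratings.

```python
import numpy as np

# x(1) = [intercept, romance, action] inferred for the first movie
x1 = np.array([1.0, 1.0, 0.0])

# Illustrative per-user parameters theta(j): [bias, romance weight, action weight]
Theta = np.array([[0.0, 5.0, 0.0],   # Alice
                  [0.0, 5.0, 0.0],   # Bob
                  [0.0, 0.0, 5.0],   # Carol
                  [0.0, 0.0, 5.0]])  # Dave

print(Theta @ x1)   # [5. 5. 0. 0.] -> Alice/Bob loved it, Carol/Dave hated it
```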
+125
+00:04:27,700 --> 00:04:29,080
+And so more generally, we can
+由此及之 我们可以
+
+126
+00:04:29,220 --> 00:04:30,210
+go down this list and try
+继续列举 试着
+
+127
+00:04:30,430 --> 00:04:31,520
+to figure out what might
+弄明白
+
+128
+00:04:31,700 --> 00:04:35,260
+be reasonable features for these other movies as well.
+其他电影的合理特征
+
+129
+00:04:39,160 --> 00:04:41,890
+Let's formalize this problem of learning the features XI.
+让我们把学习特征 x(i) 这一问题形式化
+
+130
+00:04:42,410 --> 00:04:44,220
+Let's say that our
+假设我们的用户
+
+131
+00:04:44,340 --> 00:04:45,860
+users have given us their preferences.
+告诉了我们的偏好
+
+132
+00:04:46,580 --> 00:04:47,950
+So let's say that our users have
+就是说用户们
+
+133
+00:04:48,130 --> 00:04:49,100
+come and, you know, told us
+已经给我们提供了
+
+134
+00:04:49,330 --> 00:04:50,800
+these values for theta 1
+θ(1) 到 θ(nu) 的值
+
+135
+00:04:50,890 --> 00:04:52,990
+through theta of NU
+θ(1) 到 θ(nu) 的值
+
+136
+00:04:53,280 --> 00:04:54,430
+and we want to learn the
+而我们想知道
+
+137
+00:04:54,790 --> 00:04:56,130
+feature vector XI for movie
+电影 i 的
+
+138
+00:04:56,540 --> 00:04:58,020
+number I. What we can
+特征向量 x(i) 我们能做的
+
+139
+00:04:58,200 --> 00:05:00,830
+do is therefore pose the following optimization problem.
+是列出以下的最优化的问题
+
+140
+00:05:01,220 --> 00:05:02,210
+So we want to sum over
+所以 我们想要把
+
+141
+00:05:02,840 --> 00:05:04,600
+all the indices J for
+所有指数 j 相加
+
+142
+00:05:04,930 --> 00:05:06,280
+which we have a rating
+得到对电影 i 的评分
+
+143
+00:05:06,950 --> 00:05:08,340
+for movie I because
+因为我们
+
+144
+00:05:08,750 --> 00:05:10,040
+we're trying to learn the features
+想要求得电影 i 的特征
+
+145
+00:05:10,950 --> 00:05:13,560
+for movie I that is this feature vector XI.
+也就是向量 x(i)
+
+146
+00:05:14,650 --> 00:05:15,660
+So and then what we
+所以现在我们
+
+147
+00:05:15,780 --> 00:05:18,450
+want to do is minimize this squared
+要做的是最小化这个平方误差
+
+148
+00:05:19,020 --> 00:05:20,160
+error, so we want to choose
+我们要选择
+
+149
+00:05:20,420 --> 00:05:22,430
+features XI, so that,
+特征 x(i)
+
+150
+00:05:22,900 --> 00:05:25,000
+you know, the predictive value of
+使得 我们预测的用户 j
+
+151
+00:05:25,200 --> 00:05:26,820
+how user J rates movie
+对该电影 i 评分的预测值
+
+152
+00:05:27,110 --> 00:05:28,170
+I will be similar,
+跟我们从用户 j 处
+
+153
+00:05:28,900 --> 00:05:30,130
+will be not too far in the
+实际得到的评分值
+
+154
+00:05:30,440 --> 00:05:31,910
+squared error sense of the actual
+不会相差太远
+
+155
+00:05:32,530 --> 00:05:35,330
+value YIJ that we actually observe in
+也就是这个差值
+
+156
+00:05:35,530 --> 00:05:37,130
+the rating of user j
+不要太大
+
+157
+00:05:38,310 --> 00:05:40,790
+on movie I. So, just to
+所以 总结一下
+
+158
+00:05:41,040 --> 00:05:42,320
+summarize what this term does
+这一阶段要做的
+
+159
+00:05:42,840 --> 00:05:44,060
+is it tries to choose features
+就是为所有
+
+160
+00:05:45,040 --> 00:05:46,590
+XI so that for
+为电影评分的
+
+161
+00:05:46,960 --> 00:05:48,210
+all the users J that
+用户 j
+
+162
+00:05:48,360 --> 00:05:50,190
+have rated that movie, the
+选择特征 x(i)
+
+163
+00:05:50,860 --> 00:05:52,830
+algorithm also predicts a
+这一算法同样也预测出一个值
+
+164
+00:05:52,900 --> 00:05:55,490
+value for how that user would have rated that movie
+表示该用户将会如何评价某部电影
+
+165
+00:05:56,170 --> 00:05:57,720
+that is not too far, in
+而这个预测值
+
+166
+00:05:57,810 --> 00:05:59,730
+the squared error sense, from the
+在平方误差的形式中
+
+167
+00:06:00,000 --> 00:06:02,310
+actual value that the user had rated that movie.
+与用户对该电影评分的实际值尽量接近
+
+168
+00:06:03,380 --> 00:06:04,560
+So that's the squared error term.
+这就是那个平方误差项了
+
+169
+00:06:05,420 --> 00:06:07,200
+As usual, we can
+和之前一样
+
+170
+00:06:07,310 --> 00:06:08,430
+also add this sort of
+我们可以加上一个正则化项
+
+171
+00:06:08,520 --> 00:06:09,850
+regularization term to prevent
+来防止特征的数值
+
+172
+00:06:10,300 --> 00:06:11,870
+the features from becoming too big.
+变得过大
+
+173
+00:06:13,720 --> 00:06:15,610
+So this is how we
+这就是我们
+
+174
+00:06:15,760 --> 00:06:16,910
+would learn the features
+如何从一部特定的电影中
+
+175
+00:06:17,420 --> 00:06:19,140
+for one specific movie but
+学习到特征的方法
+
+176
+00:06:19,690 --> 00:06:20,480
+what we want to do is
+但我们要做的是
+
+177
+00:06:20,740 --> 00:06:22,060
+learn all the features for all
+学习出所有电影的
+
+178
+00:06:22,230 --> 00:06:23,820
+the movies and so what
+所有特征
+
+179
+00:06:24,080 --> 00:06:25,050
+I'm going to do is add this
+所以我现在要做的是
+
+180
+00:06:25,240 --> 00:06:26,620
+extra summation here so
+在此加上另外的一个求和
+
+181
+00:06:26,780 --> 00:06:28,840
+I'm going to sum over all Nm
+我要对所有的电影 nm 求和
+
+182
+00:06:29,260 --> 00:06:33,140
+movies, N subscript m movies, and minimize
+n 下标 m 个电影
+
+183
+00:06:33,830 --> 00:06:34,670
+this objective on top
+然后最小化整个这个目标函数
+
+184
+00:06:35,010 --> 00:06:37,080
+that sums of all movies.
+针对所有的电影
+
+185
+00:06:37,410 --> 00:06:39,930
+And if you do that, you end up with the following optimization problem.
+这样你就会得到如下的最优化的问题
+
+186
+00:06:40,950 --> 00:06:42,320
+And if you minimize
+如果你将这个最小化
+
+187
+00:06:42,890 --> 00:06:44,520
+this, you have hopefully a
+就应该能得到所有电影的
+
+188
+00:06:44,680 --> 00:06:47,440
+reasonable set of features for all of your movies.
+一系列合理的特征
+
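The optimization objective just described — choose the features x(i) so that theta(j)' x(i) stays close, in the squared-error sense, to each observed rating y(i,j) with r(i,j) = 1, plus a regularization term, summed over all the movies — might look roughly like this in NumPy. The function and variable names here are mine, not from the course materials:

```python
import numpy as np

def feature_learning_cost(X, Theta, Y, R, lam):
    """Cost for learning movie features X, holding the user parameters Theta fixed.

    X:     (num_movies, n)   one feature vector x(i) per row
    Theta: (num_users, n)    one user parameter vector theta(j) per row
    Y:     (num_movies, num_users)  observed ratings y(i, j)
    R:     (num_movies, num_users)  1 where user j rated movie i, else 0
    lam:   regularization strength lambda
    """
    errors = (X @ Theta.T - Y) * R                 # keep only the rated (i, j) pairs
    squared_error = 0.5 * np.sum(errors ** 2)      # sum over all movies and their raters
    regularization = 0.5 * lam * np.sum(X ** 2)    # keeps the features from growing too big
    return squared_error + regularization
```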
+189
+00:06:48,650 --> 00:06:50,080
+So putting everything together, what
+好的 把我们
+
+190
+00:06:50,210 --> 00:06:51,050
+we, the algorithm we talked
+前一个视频讨论的算法
+
+191
+00:06:51,330 --> 00:06:52,730
+about in the previous video and
+以及我们刚刚
+
+192
+00:06:53,180 --> 00:06:54,810
+the algorithm that we just talked about in this video.
+在这个视频中讲过的算法合在一起
+
+193
+00:06:55,730 --> 00:06:57,070
+In the previous video, what we
+上一个视频中
+
+194
+00:06:57,180 --> 00:06:58,710
+showed was that you know,
+我们讲的是
+
+195
+00:06:58,820 --> 00:06:59,700
+if you have a set of
+如果你有一系列
+
+196
+00:06:59,790 --> 00:07:00,640
+movie ratings, so if you
+对电影的评分 那么如果你
+
+197
+00:07:00,640 --> 00:07:03,960
+have the data the rij's and
+有r(i,j) 和 y(i,j)
+
+198
+00:07:04,090 --> 00:07:06,100
+then you have the yij's that will be the movie ratings.
+也就是对电影的评分
+
+199
+00:07:08,500 --> 00:07:09,650
+Then given features for your
+于是 根据不同电影的特征
+
+200
+00:07:09,800 --> 00:07:11,800
+different movies we can learn these parameters theta.
+我们可以得到参数 θ
+
+201
+00:07:12,340 --> 00:07:13,110
+So if you knew the features,
+这样 如果你知道了特征
+
+202
+00:07:13,830 --> 00:07:15,000
+you can learn the parameters
+你就能学习出不同用户的
+
+203
+00:07:15,650 --> 00:07:16,850
+theta for your different users.
+参数 θ 值
+
+204
+00:07:18,250 --> 00:07:19,770
+And what we showed earlier in
+我们之前
+
+205
+00:07:19,930 --> 00:07:21,400
+this video is that if
+这个视频中讲的是
+
+206
+00:07:21,790 --> 00:07:22,860
+your users are willing to
+如果用户愿意
+
+207
+00:07:23,000 --> 00:07:25,450
+give you parameters, then you
+为你提供参数 那么你就
+
+208
+00:07:25,560 --> 00:07:28,060
+can estimate features for the different movies.
+可以为不同的电影估计特征
+
+209
+00:07:29,270 --> 00:07:31,490
+So this is kind of a chicken and egg problem.
+这有点像鸡和蛋的问题
+
+210
+00:07:31,770 --> 00:07:32,290
+Which comes first?
+到底先有鸡还是先有蛋?
+
+211
+00:07:32,900 --> 00:07:35,570
+You know, if we can get the thetas, we can learn the Xs.
+就是说 如果我们能知道 θ 就能学习到 x
+
+212
+00:07:36,060 --> 00:07:38,160
+If we have the Xs, we can learn the thetas.
+如果我们知道 x 也会学出 θ 来
+
+213
+00:07:39,500 --> 00:07:40,500
+And what you can
+而这样一来 你能做的
+
+214
+00:07:40,680 --> 00:07:41,790
+do is, and then
+就是
+
+215
+00:07:41,910 --> 00:07:43,000
+this actually works, what you
+如果这真的可行的话
+
+216
+00:07:43,110 --> 00:07:44,530
+can do is in fact randomly
+实际上你能做的就是
+
+217
+00:07:45,170 --> 00:07:47,160
+guess some value of the thetas.
+随机猜测 θ 的值
+
+218
+00:07:48,210 --> 00:07:49,200
+Now based on your initial random
+基于你一开始随机
+
+219
+00:07:49,530 --> 00:07:50,630
+guess for the thetas, you can
+猜测出的 θ 的值
+
+220
+00:07:50,940 --> 00:07:52,530
+then go ahead and use
+然后你可以继续下去
+
+221
+00:07:53,160 --> 00:07:54,210
+the procedure that we just talked
+运用我们刚刚讲到的
+
+222
+00:07:54,460 --> 00:07:55,810
+about in order to
+步骤 我们可以学习出
+
+223
+00:07:56,060 --> 00:07:57,740
+learn features for your different movies.
+不同电影的特征
+
+224
+00:07:58,800 --> 00:07:59,990
+Now given some initial set
+给出已有的一些电影的
+
+225
+00:08:00,130 --> 00:08:01,160
+of features for your movies you
+原始特征
+
+226
+00:08:01,240 --> 00:08:02,730
+can then use this first
+你可以运用
+
+227
+00:08:03,050 --> 00:08:04,060
+method that we talked about
+我们在上一个视频中讨论过的
+
+228
+00:08:04,130 --> 00:08:06,180
+in the previous video to try to get
+第一种方法 可以得到
+
+229
+00:08:06,360 --> 00:08:08,590
+an even better estimate for your parameters theta.
+对参数 θ 的更好估计
+
+230
+00:08:09,560 --> 00:08:12,420
+Now that you have a better setting of the parameters theta for your users,
+这样就会为用户提供更好的参数 θ 集
+
+231
+00:08:12,860 --> 00:08:13,850
+we can use that to maybe
+我们就可以用这些
+
+232
+00:08:14,070 --> 00:08:15,140
+even get a better set of
+得到更好的
+
+233
+00:08:15,240 --> 00:08:17,110
+features and so on.
+特征集或者其他数据
+
+234
+00:08:17,380 --> 00:08:18,400
+We can sort of keep
+然后我们可以继续
+
+235
+00:08:18,600 --> 00:08:19,440
+iterating, going back and forth
+迭代 不停重复
+
+236
+00:08:19,790 --> 00:08:21,270
+and optimizing theta, x theta,
+优化θ x θ
+
+237
+00:08:21,560 --> 00:08:24,000
+x theta, and this
+x θ 这非常有效
+
+238
+00:08:24,270 --> 00:08:25,290
+actually works and if you
+如果你
+
+239
+00:08:25,410 --> 00:08:26,340
+do this, this will actually
+这样做的话
+
+240
+00:08:26,800 --> 00:08:28,360
+cause your algorithm to converge
+你的算法将会收敛到
+
+241
+00:08:28,930 --> 00:08:30,430
+to a reasonable set of
+一组合理的电影的特征
+
+242
+00:08:31,340 --> 00:08:32,650
+features for your movies and a
+以及一组合理的
+
+243
+00:08:32,790 --> 00:08:34,880
+reasonable set of parameters for your different users.
+对不同用户参数的估计
+
+244
+00:08:36,080 --> 00:08:38,870
+So this is a basic collaborative filtering algorithm.
+这就是基本的协同过滤算法
+
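The back-and-forth procedure described here (randomly guess theta, learn features x, re-learn theta, and so on) could be sketched as follows. Each half-step is done with a closed-form regularized least-squares solve per row, which is one convenient way to carry out the two minimizations; the lecture describes them as generic optimization problems, so treat this as an illustrative sketch rather than the course's implementation:

```python
import numpy as np

def alternating_filtering(Y, R, n=2, lam=0.1, iters=50, seed=0):
    """Alternate between solving for user parameters Theta and movie features X."""
    rng = np.random.default_rng(seed)
    num_movies, num_users = Y.shape
    X = rng.normal(scale=0.1, size=(num_movies, n))      # initial random guess
    Theta = rng.normal(scale=0.1, size=(num_users, n))

    for _ in range(iters):
        # Given the current X, fit each user's theta(j) to the movies they rated
        for j in range(num_users):
            rated = R[:, j] == 1
            A = X[rated]
            Theta[j] = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ Y[rated, j])
        # Given the current Theta, fit each movie's x(i) to the users who rated it
        for i in range(num_movies):
            raters = R[i, :] == 1
            B = Theta[raters]
            X[i] = np.linalg.solve(B.T @ B + lam * np.eye(n), B.T @ Y[i, raters])
    return X, Theta
```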
+245
+00:08:39,770 --> 00:08:40,850
+This isn't actually the final
+这实际并不是最后
+
+246
+00:08:41,020 --> 00:08:42,890
+algorithm that we're going to use. In the next
+我们将要使用的算法
+
+247
+00:08:43,120 --> 00:08:44,100
+video we are going to be able to improve
+下一个视频中
+
+248
+00:08:44,790 --> 00:08:45,610
+on this algorithm and make
+我们将改进这个算法
+
+249
+00:08:45,920 --> 00:08:47,430
+it quite a bit more computationally efficient.
+让其在计算时更为高效
+
+250
+00:08:48,390 --> 00:08:49,510
+But, hopefully this gives you
+但是这节课希望能让你
+
+251
+00:08:49,640 --> 00:08:50,600
+a sense of how you
+基本了解如何
+
+252
+00:08:50,680 --> 00:08:51,980
+can formulate a
+构建一个问题
+
+253
+00:08:52,040 --> 00:08:52,990
+problem where you can simultaneously
+在这个问题中
+
+254
+00:08:53,930 --> 00:08:57,200
+learn the parameters and simultaneously learn the features from the different movies.
+从不同的电影处学到参数以及特征
+
+255
+00:08:58,440 --> 00:08:59,660
+And for this problem, for the
+对于这个问题
+
+256
+00:08:59,740 --> 00:09:01,100
+recommender system problem, this is
+对于推荐系统
+
+257
+00:09:01,390 --> 00:09:02,950
+possible only because each user
+可能就根据每个用户
+
+258
+00:09:03,490 --> 00:09:04,840
+rates multiple movies and hopefully
+对多部电影的评分
+
+259
+00:09:05,100 --> 00:09:06,410
+each movie is rated
+以及每部电影由
+
+260
+00:09:06,790 --> 00:09:08,710
+by multiple users. And so
+不同的用户评分
+
+261
+00:09:09,280 --> 00:09:10,150
+you can do this back and
+这样你就可以反复进行这样的过程
+
+262
+00:09:10,270 --> 00:09:11,150
+forth process to estimate theta
+来估计出 θ 和 x
+
+263
+00:09:11,200 --> 00:09:14,400
+and x. So to
+总结一下
+
+264
+00:09:14,830 --> 00:09:15,910
+summarize, in this video we've
+在这个视频中
+
+265
+00:09:16,140 --> 00:09:18,710
+seen an initial collaborative filtering algorithm.
+我们了解了最基本的协同过滤算法
+
+266
+00:09:19,680 --> 00:09:21,550
+The term collaborative filtering refers
+协同过滤算法指的是
+
+267
+00:09:22,020 --> 00:09:23,620
+to the observation that when
+当你执行这个算法时
+
+268
+00:09:23,760 --> 00:09:25,020
+you run this algorithm with a
+你通过一大堆用户
+
+269
+00:09:25,210 --> 00:09:26,790
+large set of users, what all
+得到的数据
+
+270
+00:09:26,960 --> 00:09:28,410
+of these users are effectively doing are sort of
+这些用户实际上都在
+
+271
+00:09:29,070 --> 00:09:31,300
+collaboratively--or collaborating to
+进行了协同合作
+
+272
+00:09:31,490 --> 00:09:32,770
+get better movie ratings for
+来得到每个人
+
+273
+00:09:33,010 --> 00:09:34,610
+everyone because with every
+对电影的评分值
+
+274
+00:09:34,840 --> 00:09:36,540
+user rating some subset with the movies,
+只要用户对某几部电影进行评分
+
+275
+00:09:37,350 --> 00:09:39,040
+every user is helping the
+每个用户就都在帮助算法
+
+276
+00:09:39,300 --> 00:09:41,490
+algorithm a little bit to learn better features,
+更好的学习出特征
+
+277
+00:09:42,900 --> 00:09:44,390
+and then by helping--
+这样 通过自己
+
+278
+00:09:44,490 --> 00:09:46,690
+by rating a few movies myself, I will be helping
+对几部电影评分之后
+
+279
+00:09:47,810 --> 00:09:49,550
+the system learn better features and
+我就能帮助系统更好的学习到特征
+
+280
+00:09:49,680 --> 00:09:50,750
+then these features can be used
+这些特征可以
+
+281
+00:09:50,930 --> 00:09:52,610
+by the system to make better
+被系统运用 为其他人
+
+282
+00:09:52,890 --> 00:09:54,380
+movie predictions for everyone else.
+做出更准确的电影预测
+
+283
+00:09:54,640 --> 00:09:55,450
+And so there is a sense of
+协同的另一层意思
+
+284
+00:09:55,530 --> 00:09:56,980
+collaboration where every user is
+是说每位用户
+
+285
+00:09:57,370 --> 00:09:58,980
+helping the system learn better features
+都在为了大家的利益
+
+286
+00:09:59,360 --> 00:10:00,740
+for the common good. This
+学习出更好的特征
+
+287
+00:10:00,810 --> 00:10:03,450
+is this collaborative filtering.
+这就是协同过滤
+
+288
+00:10:04,070 --> 00:10:04,990
+And, in the next video what we
+在下一个视频中
+
+289
+00:10:05,140 --> 00:10:07,490
+going to do is take the ideas that
+我们要把这些想法
+
+290
+00:10:07,740 --> 00:10:08,850
+have worked out, and try to
+付诸实施
+
+291
+00:10:09,090 --> 00:10:09,910
+develop an even
+尝试开发一种
+
+292
+00:10:10,170 --> 00:10:11,920
+better algorithm, a slightly better
+更完美的算法
+
+293
+00:10:12,180 --> 00:10:13,640
+technique for collaborative filtering.
+为协同过滤算法做出一点改进
+
diff --git a/srt/16 - 4 - Collaborative Filtering Algorithm (9 min).srt b/srt/16 - 4 - Collaborative Filtering Algorithm (9 min).srt
new file mode 100644
index 00000000..1413cf9e
--- /dev/null
+++ b/srt/16 - 4 - Collaborative Filtering Algorithm (9 min).srt
@@ -0,0 +1,1196 @@
+1
+00:00:00,240 --> 00:00:01,690
+In the last couple videos, we
+在前面几个视频里
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,820 --> 00:00:02,990
+talked about the ideas of
+我们谈到几个概念
+
+3
+00:00:03,140 --> 00:00:04,570
+how, first, if you're
+首先
+
+4
+00:00:04,780 --> 00:00:06,210
+given features for movies, you
+如果给你几个特征表示电影
+
+5
+00:00:06,920 --> 00:00:08,610
+can use that to learn the parameters theta for users.
+我们可以利用这些特征去学习用户的参数 θ
+
+6
+00:00:09,490 --> 00:00:11,400
+And second, if you're given parameters for the users,
+第二 如果给你用户的参数数据
+
+7
+00:00:11,920 --> 00:00:13,570
+you can use that to learn features for the movies.
+你可以使用这些资料去获得电影的特征
+
+8
+00:00:14,480 --> 00:00:15,550
+In this video we're going
+本节视频中
+
+9
+00:00:15,650 --> 00:00:16,670
+to take those ideas and put
+我们将会使用这些概念
+
+10
+00:00:16,850 --> 00:00:18,130
+them together to come up
+并且将它们合并成
+
+11
+00:00:18,280 --> 00:00:20,130
+with a collaborative filtering algorithm.
+协同过滤算法 (Collaborative Filtering Algorithm)
+
+12
+00:00:21,250 --> 00:00:22,450
+So one of the things we worked
+我们之前做过的事情
+
+13
+00:00:22,520 --> 00:00:23,640
+out earlier is that if
+其中之一是
+
+14
+00:00:23,680 --> 00:00:24,510
+you have features for the
+假如你有了电影的特征
+
+15
+00:00:24,600 --> 00:00:25,740
+movies then you can solve
+你就可以解出
+
+16
+00:00:26,070 --> 00:00:27,590
+this minimization problem to find
+这个最小化问题
+
+17
+00:00:27,950 --> 00:00:30,010
+the parameters theta for your users.
+为你的用户找到参数 θ
+
+18
+00:00:30,730 --> 00:00:32,260
+And then we also
+然后我们也
+
+19
+00:00:32,640 --> 00:00:33,960
+worked that out, if you
+知道了
+
+20
+00:00:34,360 --> 00:00:37,440
+are given the parameters theta,
+如果你拥有参数 θ
+
+21
+00:00:38,080 --> 00:00:38,990
+you can also use that to
+你也可以用该参数
+
+22
+00:00:39,170 --> 00:00:40,800
+estimate the features x, and
+通过解一个最小化问题
+
+23
+00:00:40,870 --> 00:00:42,980
+you can do that by solving this minimization problem.
+去计算出特征 x
+
+24
+00:00:44,310 --> 00:00:45,720
+So one thing you
+所以你可以做的事
+
+25
+00:00:45,880 --> 00:00:47,360
+could do is actually go back and forth.
+是不停地重复这些计算
+
+26
+00:00:47,870 --> 00:00:50,230
+Maybe randomly initialize the parameters
+或许是随机地初始化这些参数
+
+27
+00:00:50,510 --> 00:00:51,350
+and then solve for theta,
+然后解出 θ
+
+28
+00:00:51,780 --> 00:00:52,690
+solve for x, solve for theta,
+解出 x 解出 θ
+
+29
+00:00:52,870 --> 00:00:54,330
+solve for x. But, it
+解出 x
+
+30
+00:00:54,420 --> 00:00:55,220
+turns out that there is a
+但实际上呢
+
+31
+00:00:55,400 --> 00:00:56,760
+more efficient algorithm that doesn't
+存在一个更有效率的算法
+
+32
+00:00:56,980 --> 00:00:57,910
+need to go back and forth
+让我们不再需要再这样不停地
+
+33
+00:00:58,110 --> 00:00:59,700
+between the x's and the
+计算 x 和 θ
+
+34
+00:00:59,730 --> 00:01:00,670
+thetas, but that can solve
+而是能够将
+
+35
+00:01:01,300 --> 00:01:04,250
+for theta and x simultaneously.
+x 和 θ 同时计算出来
+
+36
+00:01:05,160 --> 00:01:06,310
+And here it is. What we are going to do, is basically take
+下面就是这种算法 我们所要做的
+
+37
+00:01:06,600 --> 00:01:08,990
+both of these optimization objectives, and
+是将这两个优化目标函数
+
+38
+00:01:09,130 --> 00:01:10,640
+put them into the same objective.
+给合为一个
+
+39
+00:01:11,550 --> 00:01:12,590
+So I'm going to define the
+所以我要来定义
+
+40
+00:01:12,730 --> 00:01:15,010
+new optimization objective j, which
+这个新的优化目标函数 J
+
+41
+00:01:15,250 --> 00:01:16,540
+is a cost function, that
+它依然是一个代价函数
+
+42
+00:01:16,640 --> 00:01:17,630
+is a function of my features
+是我特征 x
+
+43
+00:01:18,050 --> 00:01:19,150
+x and a function
+和参数 θ
+
+44
+00:01:19,790 --> 00:01:20,750
+of my parameters theta.
+的函数
+
+45
+00:01:21,660 --> 00:01:23,050
+And, it's basically the two optimization objectives
+它其实就是上面那两个优化目标函数
+
+46
+00:01:23,520 --> 00:01:24,920
+I had on top, but I put together.
+但我将它们给合在一起
+
+47
+00:01:26,270 --> 00:01:27,760
+So, in order to
+为了把这个解释清楚
+
+48
+00:01:28,060 --> 00:01:31,140
+explain this, first, I want to point out that this
+首先 我想指出
+
+49
+00:01:31,400 --> 00:01:33,420
+term over here, this squared
+这里的这个表达式
+
+50
+00:01:33,820 --> 00:01:35,490
+error term, is the same
+这个平方误差项
+
+51
+00:01:35,920 --> 00:01:39,250
+as this squared error term and the
+和下面的这个项是相同的
+
+52
+00:01:39,760 --> 00:01:40,880
+summations look a little bit
+可能两个求和看起来有点不同
+
+53
+00:01:41,050 --> 00:01:42,940
+different, but let's see what the summations are really doing.
+但让我们来看看它们到底到底在做什么
+
+54
+00:01:43,800 --> 00:01:45,090
+The first summation is sum
+第一个求和运算
+
+55
+00:01:45,480 --> 00:01:48,280
+over all users J and
+是所有用户 J 的总和
+
+56
+00:01:48,380 --> 00:01:50,590
+then sum over all movies rated by that user.
+和所有被用户评分过的电影总和
+
+57
+00:01:51,890 --> 00:01:53,240
+So, this is really summing over all
+所以这其实是正在将
+
+58
+00:01:53,470 --> 00:01:55,950
+pairs IJ, that correspond
+所有关于 (i,j) 对的项全加起来
+
+59
+00:01:56,510 --> 00:01:57,830
+to a movie that was rated by a user.
+表示被用户评分过的电影
+
+60
+00:01:58,550 --> 00:01:59,960
+Sum over J says, for every
+关于 j 的求和
+
+61
+00:02:00,150 --> 00:02:01,520
+user, the sum of
+意思是 对每个用户
+
+62
+00:02:01,740 --> 00:02:03,110
+all the movies rated by that user.
+关于该用户评分的电影的求和
+
+63
+00:02:04,250 --> 00:02:07,340
+This summation down here, just does things in the opposite order.
+而下面的求和运算只是用相反的顺序去进行计算
+
+64
+00:02:07,630 --> 00:02:08,710
+This says for every movie
+这写着关于每部电影 i
+
+65
+00:02:09,050 --> 00:02:11,140
+I, sum over all
+求和 关于的是
+
+66
+00:02:11,340 --> 00:02:12,480
+the users J that have
+所有曾经对它评分过的
+
+67
+00:02:12,690 --> 00:02:14,580
+rated that movie and so, you
+用户 j
+
+68
+00:02:14,690 --> 00:02:16,100
+know these summations, both of these
+所以这些求和运算
+
+69
+00:02:16,220 --> 00:02:18,150
+are just summations over all pairs
+这两种都是对所有 (i,j) 对的求和
+
+70
+00:02:18,930 --> 00:02:21,150
+ij for which
+其中
+
+71
+00:02:21,440 --> 00:02:24,620
+r of i J is equal to 1.
+r(i,j) 是等于1的
+
+72
+00:02:24,660 --> 00:02:26,580
+It's just summing over all the
+这只是所有你有评分的用户
+
+73
+00:02:27,180 --> 00:02:29,810
+user movie pairs for which you have a rating.
+和电影对而已
+
+74
+00:02:30,840 --> 00:02:32,230
+and so those two terms
+因此 这两个式子
+
+75
+00:02:32,600 --> 00:02:34,740
+up there is just
+其实就是
+
+76
+00:02:34,930 --> 00:02:36,460
+exactly this first term, and
+这里的第一个式子
+
+77
+00:02:36,500 --> 00:02:38,310
+I've just written the summation here explicitly,
+我已经给出了这个求和式子
+
+78
+00:02:39,310 --> 00:02:40,290
+where I'm just saying the sum
+这里我写着
+
+79
+00:02:40,580 --> 00:02:42,290
+of all pairs IJ, such that
+其为所有 r(i,j) 值为1的
+
+80
+00:02:42,540 --> 00:02:45,060
+RIJ is equal to 1.
+(i,j) 对求和
+
+81
+00:02:45,310 --> 00:02:46,800
+So what we're going
+所以我们要做的
+
+82
+00:02:46,940 --> 00:02:48,790
+to do is define a
+是去定义
+
+83
+00:02:49,130 --> 00:02:51,410
+combined optimization objective that
+一个我们想将其最小化的
+
+84
+00:02:51,670 --> 00:02:53,290
+we want to minimize in order
+合并后的优化目标函数
+
+85
+00:02:53,550 --> 00:02:55,700
+to solve simultaneously for x and theta.
+让我们能同时解出 x 和 θ
+
+86
+00:02:56,970 --> 00:02:58,040
+And then the other terms in
+然后在这些优化目标函数里的
+
+87
+00:02:58,070 --> 00:03:00,250
+the optimization objective are this,
+另一个式子是这个
+
+88
+00:03:00,570 --> 00:03:02,870
+which is a regularization in terms of theta.
+其为 θ 所进行的正则化
+
+89
+00:03:03,770 --> 00:03:05,830
+So that came down here and
+它被放到这里
+
+90
+00:03:06,290 --> 00:03:08,190
+the final piece is this
+最后一部分
+
+91
+00:03:08,900 --> 00:03:10,690
+term which is
+是这项式
+
+92
+00:03:10,850 --> 00:03:12,970
+my optimization objective for
+是我 x 的优化目标函数
+
+93
+00:03:13,170 --> 00:03:16,180
+the x's and that became this.
+然后它变成这个
+
+94
+00:03:16,500 --> 00:03:18,020
+And this optimization objective
+这个优化目标函数 J
+
+95
+00:03:18,720 --> 00:03:19,730
+j actually has an interesting property
+它有一个很有趣的特性
+
+96
+00:03:20,240 --> 00:03:20,950
+that if you were to hold
+如果你假设
+
+97
+00:03:21,410 --> 00:03:23,070
+the x's constant and just
+x 为常数
+
+98
+00:03:23,260 --> 00:03:25,490
+minimize with respect to the thetas then
+并关于 θ 优化的话
+
+99
+00:03:25,670 --> 00:03:27,040
+you'd be solving exactly this problem,
+你其实就是在计算这个式子
+
+100
+00:03:27,840 --> 00:03:28,450
+whereas if you were to do
+反过来也一样
+
+101
+00:03:28,620 --> 00:03:29,590
+the opposite, if you were to
+如果你把 θ 作为常量
+
+102
+00:03:29,690 --> 00:03:31,310
+hold the thetas constant, and minimize
+然后关于 x
+
+103
+00:03:31,670 --> 00:03:32,650
+j only with respect to
+求 J 的最小值的话
+
+104
+00:03:32,750 --> 00:03:34,920
+the x's, then it becomes equivalent to this.
+那就与第二个式子相等
+
+105
+00:03:35,230 --> 00:03:36,780
+Because either this term
+因为不管是这个部分
+
+106
+00:03:37,060 --> 00:03:38,860
+or this term is constant if
+还是这个部分 将会变成常数
+
+107
+00:03:38,970 --> 00:03:40,510
+you're minimizing only the respective x's or only respective thetas.
+如果你将它化简成只以 x 或 θ 表达的话
+
+108
+00:03:40,920 --> 00:03:43,680
+So here's an optimization
+所以这里是
+
+109
+00:03:44,640 --> 00:03:46,840
+objective that puts together my
+一个将我的 x 和 θ
+
+110
+00:03:47,440 --> 00:03:50,230
+cost functions in terms of x and in terms of theta.
+合并起来的代价函数
+
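A sketch of the combined objective J described on this slide: one cost over both the features and the parameters, with the squared error summed over every (i, j) pair where r(i,j) = 1 and regularization applied to all of the x's and all of the thetas (there is no x0 = 1 intercept feature in this formulation, as the video explains shortly). The names below are mine, not from the course:

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Combined collaborative filtering cost J(x(1..nm), theta(1..nu)).

    X and Theta each have n columns; there is no intercept column x0 = 1.
    """
    errors = (X @ Theta.T - Y) * R                            # only rated pairs count
    J = 0.5 * np.sum(errors ** 2)                             # squared-error term
    J += 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))    # regularize every x and theta
    return J
```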
+111
+00:03:51,620 --> 00:03:53,050
+And in order to
+然后
+
+112
+00:03:53,470 --> 00:03:54,750
+come up with just one
+为了解出
+
+113
+00:03:55,090 --> 00:03:56,130
+optimization problem, what we're going
+这个优化目标问题
+
+114
+00:03:56,280 --> 00:03:57,590
+to do, is treat this
+我们所要做的是
+
+115
+00:03:58,430 --> 00:03:59,850
+cost function, as a
+将这个代价函数视为
+
+116
+00:03:59,880 --> 00:04:00,890
+function of my features
+特征 x
+
+117
+00:04:01,410 --> 00:04:02,540
+x and of my user
+和用户参数 θ 的
+
+118
+00:04:03,180 --> 00:04:05,020
+parameters theta, and
+函数
+
+119
+00:04:05,140 --> 00:04:06,570
+just minimize this whole thing, as
+然后全部化简为
+
+120
+00:04:06,740 --> 00:04:07,830
+a function of both the
+一个既关于 x
+
+121
+00:04:08,120 --> 00:04:10,210
+Xs and a function of the thetas.
+也关于 θ 的函数
+
+122
+00:04:11,300 --> 00:04:12,400
+And really the only difference
+这和
+
+123
+00:04:12,540 --> 00:04:13,800
+between this and the older
+前面的算法之间
+
+124
+00:04:14,160 --> 00:04:15,650
+algorithm is that, instead
+唯一的不同是
+
+125
+00:04:15,980 --> 00:04:17,340
+of going back and forth, previously
+不需要反复计算
+
+126
+00:04:17,840 --> 00:04:20,110
+we talked about minimizing with respect
+就像我们之前所提到的
+
+127
+00:04:20,420 --> 00:04:22,130
+to theta then minimizing with respect to x,
+先关于 θ 最小化 然后关于 x 最小化
+
+128
+00:04:22,260 --> 00:04:23,370
+whereas minimizing with respect to theta,
+然后再关于 θ 最小化
+
+129
+00:04:23,900 --> 00:04:25,270
+minimizing with respect to x and so on.
+再关于 x 最小化...
+
+130
+00:04:26,130 --> 00:04:28,090
+In this new version instead of
+在新版本里头
+
+131
+00:04:28,560 --> 00:04:30,020
+sequentially going between the
+不需要不断地在 x 和 θ
+
+132
+00:04:30,220 --> 00:04:31,880
+2 sets of parameters x and theta,
+这两个参数之间不停折腾
+
+133
+00:04:32,180 --> 00:04:32,940
+what we are going to do
+我们所要做的是
+
+134
+00:04:33,230 --> 00:04:34,600
+is just minimize with respect
+将这两组参数
+
+135
+00:04:34,780 --> 00:04:36,410
+to both sets of parameters simultaneously.
+同时化简
+
+136
+00:04:39,750 --> 00:04:41,290
+Finally one last detail
+最后一件事是
+
+137
+00:04:42,030 --> 00:04:44,380
+is that when we're learning the features this way.
+当我们以这样的方法学习特征量时
+
+138
+00:04:45,110 --> 00:04:46,410
+Previously we have been using
+之前我们所使用的
+
+139
+00:04:46,840 --> 00:04:49,290
+this convention that
+前提是
+
+140
+00:04:49,470 --> 00:04:50,540
+we have a feature x0 equals
+我们所使用的特征 x0
+
+141
+00:04:50,740 --> 00:04:52,940
+one that corresponds to an intercept term.
+等于1 对应于一个截距
+
+142
+00:04:54,140 --> 00:04:55,530
+When we are using this
+当我们以
+
+143
+00:04:55,760 --> 00:04:57,790
+sort of formalism where we're are actually learning the features,
+这种形式真的去学习特征量时
+
+144
+00:04:58,300 --> 00:05:00,200
+we are actually going to do away with this convention.
+我们必须要去掉这个前提
+
+145
+00:05:01,400 --> 00:05:04,220
+And so the features we are going to learn x, will be in Rn.
+所以这些我们将学习的特征量 x 是 n 维实数
+
+146
+00:05:05,430 --> 00:05:06,650
+Whereas previously we had
+而先前我们所有的
+
+147
+00:05:06,810 --> 00:05:09,770
+features x and Rn + 1 including the intercept term.
+特征值x 是 n+1 维 包括截距
+
+148
+00:05:10,390 --> 00:05:13,390
+By getting rid of x0 we now just have x in Rn.
+删除掉x0 我们现在只会有 n 维的 x
+
+149
+00:05:14,880 --> 00:05:16,520
+And so similarly, because the
+同样地
+
+150
+00:05:16,590 --> 00:05:17,780
+parameters theta is in
+因为参数 θ 是
+
+151
+00:05:17,850 --> 00:05:19,260
+the same dimension, we now
+在同一个维度上
+
+152
+00:05:19,510 --> 00:05:21,010
+also have theta in RN
+所以 θ 也是 n 维的
+
+153
+00:05:21,540 --> 00:05:23,340
+because if there's no
+因为如果没有 x0
+
+154
+00:05:23,710 --> 00:05:24,580
+x0, then there's no need
+那么 θ0
+
+155
+00:05:25,370 --> 00:05:26,880
+parameter theta 0 as well.
+也不再需要
+
+156
+00:05:27,960 --> 00:05:28,880
+And the reason we do away
+我们将这个前提移除的理由是
+
+157
+00:05:29,160 --> 00:05:30,390
+with this convention is because
+因为我们现在是在
+
+158
+00:05:31,010 --> 00:05:32,610
+we're now learning all the features, right?
+学习所有的特征 对吧?
+
+159
+00:05:32,820 --> 00:05:34,280
+So there is no need
+所以我们没有必要
+
+160
+00:05:34,420 --> 00:05:36,650
+to hard code the feature that is always equal to one.
+去将这个等于一的特征值固定死
+
+161
+00:05:37,170 --> 00:05:38,310
+Because if the algorithm really wants
+因为如果算法真的需要
+
+162
+00:05:38,600 --> 00:05:39,450
+a feature that is always equal
+一个特征永远为1
+
+163
+00:05:40,060 --> 00:05:41,830
+to 1, it can choose to learn one for itself.
+它可以选择靠自己去获得1这个数值
+
+164
+00:05:42,290 --> 00:05:43,430
+So if the algorithm chooses,
+所以如果这算法想要的话
+
+165
+00:05:43,720 --> 00:05:45,330
+it can set the feature X1 equals 1.
+它可以将特征值 x1 设为1
+
+166
+00:05:45,670 --> 00:05:47,010
+So there's no need
+所以没有必要
+
+167
+00:05:47,260 --> 00:05:48,300
+to hard code the feature of
+去将1 这个特征定死
+
+168
+00:05:48,440 --> 00:05:50,060
+x0 equals 1; the algorithm now has
+这样算法有了
+
+169
+00:05:50,340 --> 00:05:55,890
+the flexibility to just learn it by itself. So, putting
+灵活性去自行学习
+
+170
+00:05:56,420 --> 00:05:58,410
+everything together, here is
+所以 把所有讲的这些合起来
+
+171
+00:05:58,780 --> 00:05:59,910
+our collaborative filtering algorithm.
+即是我们的协同过滤算法
+
+172
+00:06:01,460 --> 00:06:02,330
+first we are going to
+首先我们将会把
+
+173
+00:06:03,010 --> 00:06:05,580
+initialize x and
+x 和 θ
+
+174
+00:06:05,820 --> 00:06:07,290
+theta to small random values.
+初始为小的随机值
+
+175
+00:06:08,450 --> 00:06:09,200
+And this is a little bit
+这有点像
+
+176
+00:06:09,310 --> 00:06:11,700
+like neural network training, where there
+神经网络训练
+
+177
+00:06:11,720 --> 00:06:14,240
+we were also initializing all the parameters of a neural network to small random values.
+我们也是将所有神经网路的参数用小的随机数值来初始化
+
+178
+00:06:16,640 --> 00:06:17,730
+Next we're then going
+接下来 我们要用
+
+179
+00:06:17,950 --> 00:06:20,110
+to minimize the cost function using
+梯度下降 或者某些其他的高级优化算法
+
+180
+00:06:20,500 --> 00:06:23,360
+gradient descent or one of the advanced optimization algorithms.
+把这个代价函数最小化
+
+181
+00:06:24,610 --> 00:06:25,890
+So, if you take derivatives you
+所以如果你求导的话
+
+182
+00:06:26,020 --> 00:06:27,460
+find that the gradient descent updates look
+你会发现梯度下降法
+
+183
+00:06:27,590 --> 00:06:29,320
+like these and so this
+写出来的更新式是这样的
+
+184
+00:06:29,630 --> 00:06:31,160
+term here is the
+这个部分就是
+
+185
+00:06:31,660 --> 00:06:33,890
+partial derivative of the cost function,
+代价函数
+
+186
+00:06:35,140 --> 00:06:35,940
+I'm not going to write that out,
+这里我简写了
+
+187
+00:06:36,110 --> 00:06:37,860
+with respect to the feature
+关于特征值 x(i)k 的偏微分
+
+188
+00:06:38,070 --> 00:06:40,020
+value Xik and similarly
+然后相同地
+
+189
+00:06:41,020 --> 00:06:42,430
+this term here is also
+这部分
+
+190
+00:06:43,030 --> 00:06:44,660
+a partial derivative value of
+也是代价函数
+
+191
+00:06:44,730 --> 00:06:46,480
+the cost function with respect to the parameter
+关于我们正在最小化的参数 θ
+
+192
+00:06:46,930 --> 00:06:48,950
+theta that we're minimizing.
+所做的偏微分
+
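The partial derivatives just mentioned lead to gradient-descent updates that, in vectorized form, might be sketched like this (with no special case for k = 0, since there is no theta0 or x0). This is an illustrative sketch, not code from the course:

```python
import numpy as np

def gradient_step(X, Theta, Y, R, lam, alpha):
    """One simultaneous gradient-descent step on the movie features and user parameters."""
    errors = (X @ Theta.T - Y) * R            # prediction error, zeroed where no rating exists
    X_grad = errors @ Theta + lam * X         # partial derivatives of J w.r.t. each x(i)_k
    Theta_grad = errors.T @ X + lam * Theta   # partial derivatives of J w.r.t. each theta(j)_k
    return X - alpha * X_grad, Theta - alpha * Theta_grad
```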
+193
+00:06:50,210 --> 00:06:51,410
+And just as a reminder, in
+提醒一下
+
+194
+00:06:51,760 --> 00:06:52,920
+this formula that we no
+这公式里
+
+195
+00:06:53,130 --> 00:06:54,760
+longer have this X0 equals
+我们不再有这等于1 的 x0 项
+
+196
+00:06:54,970 --> 00:06:56,740
+1 and so we have
+所以
+
+197
+00:06:57,010 --> 00:07:00,010
+that x is in Rn and theta is a Rn.
+x 是 n 维 θ 也是n 维
+
+198
+00:07:01,480 --> 00:07:03,100
+In this new formalism, we're regularizing
+在这个新的表达式里
+
+199
+00:07:03,760 --> 00:07:05,220
+every one of our parameters theta, you know, every one of our parameters Xn.
+我们将所有的参数 θ 和 xn 做正则化
+
+200
+00:07:07,400 --> 00:07:09,060
+There's no longer the special
+不存在 θ0
+
+201
+00:07:09,480 --> 00:07:11,850
+case theta zero, which was
+这种特殊的情况 会需要不同地正则化
+
+202
+00:07:12,210 --> 00:07:13,760
+regularized differently, or which
+或者说是
+
+203
+00:07:13,860 --> 00:07:15,440
+was not regularized compared to
+跟 θ1 到 θn 的正则化
+
+204
+00:07:15,560 --> 00:07:17,650
+the parameters theta 1 down to theta n.
+不同的 θ0 的正则化
+
+205
+00:07:18,370 --> 00:07:19,710
+So there is now no longer a
+所以现在不存在 θ0
+
+206
+00:07:20,070 --> 00:07:21,150
+theta 0, which is why
+这就是为什么
+
+207
+00:07:21,400 --> 00:07:22,450
+in these updates, I did not
+在这些更新式里
+
+208
+00:07:22,700 --> 00:07:24,080
+break out a special case for k equals 0.
+我并没有分出 k 等于0的特殊情况
+
+209
+00:07:26,070 --> 00:07:27,230
+So we then use gradient descent
+所以我们使用梯度下降
+
+210
+00:07:27,740 --> 00:07:28,710
+to minimize the cost function
+来最小化这个
+
+211
+00:07:29,090 --> 00:07:30,260
+j with respect to the
+代价函数 J
+
+212
+00:07:30,390 --> 00:07:32,000
+features x and with respect to the parameters theta.
+关于特征 x 和参数 θ
+
+213
+00:07:33,160 --> 00:07:35,050
+And finally, given a
+最后 给你一个用户
+
+214
+00:07:35,140 --> 00:07:36,320
+user, if a user
+如果这个用户
+
+215
+00:07:36,570 --> 00:07:38,920
+has some parameters, theta, and
+具有一些参数 θ
+
+216
+00:07:39,410 --> 00:07:40,540
+if there's a movie with
+以及给你一部电影
+
+217
+00:07:40,690 --> 00:07:41,980
+some sort of learned features x,
+带有已知的特征 x
+
+218
+00:07:42,580 --> 00:07:43,720
+we would then predict that that
+我们可以预测
+
+219
+00:07:43,970 --> 00:07:44,940
+movie would be given a
+这部电影会被
+
+220
+00:07:45,030 --> 00:07:46,200
+star rating by that user
+θ 转置乘以 x
+
+221
+00:07:47,010 --> 00:07:48,780
+of theta transpose x. Or
+给出怎样的评分
+
+222
+00:07:48,860 --> 00:07:50,370
+just to fill those in,
+或者将这些直接填入
+
+223
+00:07:50,640 --> 00:07:52,250
+then we're saying that if user
+那我们可以说
+
+224
+00:07:52,630 --> 00:07:53,780
+J has not yet
+如果用户 j
+
+225
+00:07:54,010 --> 00:07:55,980
+rated movie I, then
+尚未对电影 i 评分
+
+226
+00:07:56,170 --> 00:07:57,300
+what we do is predict that
+那我们可以预测
+
+227
+00:07:58,150 --> 00:07:59,120
+user J is going to
+这个用户 j
+
+228
+00:07:59,710 --> 00:08:01,420
+rate movie I according to
+将会根据 θ(j) 转置乘以 x(i)
+
+229
+00:08:02,300 --> 00:08:04,230
+theta J transpose Xi.
+对电影 i 评分
+
+230
+00:08:06,650 --> 00:08:08,010
+So that's the collaborative
+所以这就是
+
+231
+00:08:08,810 --> 00:08:10,170
+filtering algorithm and if
+协同过滤算法
+
+232
+00:08:10,310 --> 00:08:12,230
+you implement this algorithm you actually get a pretty
+如果你使用这个算法
+
+233
+00:08:12,730 --> 00:08:14,080
+decent algorithm that will simultaneously
+你可以得到一个十分有用的算法
+
+234
+00:08:15,060 --> 00:08:16,770
+learn good features for hopefully
+可以同时学习
+
+235
+00:08:17,110 --> 00:08:18,460
+all the movies as well as
+几乎所有电影的特征
+
+236
+00:08:18,570 --> 00:08:19,890
+learn parameters for all the
+和所有用户参数
+
+237
+00:08:20,050 --> 00:08:21,290
+users and hopefully give
+然后有很大机会
+
+238
+00:08:21,440 --> 00:08:23,060
+pretty good predictions for how
+能对不同用户会如何对他们尚未评分的电影做出评价
+
+239
+00:08:23,290 --> 00:08:25,890
+different users will rate different movies that they have not yet rated
+给出相当准确的预测
+
diff --git a/srt/16 - 5 - Vectorization_ Low Rank Matrix Factorization (8 min).srt b/srt/16 - 5 - Vectorization_ Low Rank Matrix Factorization (8 min).srt
new file mode 100644
index 00000000..f43a274b
--- /dev/null
+++ b/srt/16 - 5 - Vectorization_ Low Rank Matrix Factorization (8 min).srt
@@ -0,0 +1,1182 @@
+1
+00:00:00,530 --> 00:00:01,650
+In the last few videos, we
+在前几个视频中,(字幕翻译:中国海洋大学,周丽雅)
+
+2
+00:00:01,730 --> 00:00:03,890
+talked about a collaborative filtering algorithm.
+我们讨论一个协同过滤算法。
+
+3
+00:00:04,830 --> 00:00:05,890
+In this video I'm going to
+在这个视频中,我将
+
+4
+00:00:05,970 --> 00:00:07,120
+say a little bit about the
+介绍一下这个算法的
+
+5
+00:00:07,490 --> 00:00:09,090
+vectorization implementation of this algorithm.
+向量化实现,
+
+6
+00:00:09,980 --> 00:00:12,670
+And also talk a little bit about other things you can do with this algorithm.
+另外再介绍一下你使用这个算法可以实现的一些功能。
+
+7
+00:00:13,340 --> 00:00:14,520
+For example, one of the
+比如说,你可以用这个算法实现:
+
+8
+00:00:14,600 --> 00:00:15,830
+things you can do is, given
+给定一个商品,
+
+9
+00:00:16,180 --> 00:00:17,390
+one product can you find
+你可以找到
+
+10
+00:00:17,770 --> 00:00:19,160
+other products that are related
+与之相关的其他商品。
+
+11
+00:00:19,270 --> 00:00:20,210
+to this so that for
+比如说:
+
+12
+00:00:20,490 --> 00:00:23,140
+example, a user has recently been looking at one product.
+一个用户最近一直在寻找一个商品
+
+13
+00:00:23,650 --> 00:00:24,990
+Are there other related products
+有没有一些相关的其他商品
+
+14
+00:00:25,520 --> 00:00:27,170
+that you could recommend to this user?
+你能推荐给这个用户?
+
+15
+00:00:27,620 --> 00:00:28,980
+So let's see what we could do about that.
+所以针对这样的问题一起看一下我们如何解决
+
+16
+00:00:30,170 --> 00:00:31,190
+What I'd like to do is work
+我希望可以
+
+17
+00:00:31,550 --> 00:00:33,520
+out an alternative way of
+找到另一种方法
+
+18
+00:00:33,740 --> 00:00:35,710
+writing out the predictions of the collaborative filtering algorithm.
+写出协同过滤算法的预测值
+
+19
+00:00:37,370 --> 00:00:38,590
+To start, here is our
+首先,这是我们的数据集,
+
+20
+00:00:38,960 --> 00:00:40,440
+data set with our
+数据集中包含
+
+21
+00:00:40,750 --> 00:00:41,880
+five movies and what I'm
+五部电影,下一步
+
+22
+00:00:42,160 --> 00:00:43,150
+going to do is take
+我要做的是
+
+23
+00:00:43,390 --> 00:00:44,520
+all the ratings by all the
+得到所有用户对所有电影的评分
+
+24
+00:00:44,850 --> 00:00:46,500
+users and group them into
+然后把它们分组写入矩阵
+
+25
+00:00:47,080 --> 00:00:48,800
+a matrix. So, here we have
+这里我们有五部电影、
+
+26
+00:00:49,200 --> 00:00:51,390
+five movies and four
+四个用户,
+
+27
+00:00:51,670 --> 00:00:53,390
+users, and so this
+因此
+
+28
+00:00:53,670 --> 00:00:54,550
+matrix y is going to be
+这个矩阵y是5行4列的矩阵。
+
+29
+00:00:54,910 --> 00:00:57,110
+a 5 by 4 matrix. It's just you know, taking all
+正如你所知,构成矩阵时,要包括所有的元素,
+
+30
+00:00:57,340 --> 00:00:58,770
+of the elements, all of this data.
+和所有的数据,
+
+31
+00:00:59,820 --> 00:01:02,390
+Including question marks, and grouping them into this matrix.
+包括问号,然后把它们按组写入矩阵。
+
+32
+00:01:03,290 --> 00:01:04,470
+And of course the elements of this
+当然,这个矩阵的
+
+33
+00:01:04,650 --> 00:01:06,400
+matrix of the (i, j) element of
+第(i, j)个元素
+
+34
+00:01:06,500 --> 00:01:07,860
+this matrix is really what
+就是
+
+35
+00:01:08,060 --> 00:01:09,710
+we were previously writing as y
+我们之前说的y(i,j)
+
+36
+00:01:10,520 --> 00:01:12,090
+superscript i, j. It's
+其中ij是上标。它是
+
+37
+00:01:12,220 --> 00:01:13,480
+the rating given to movie i
+第j个用户给第i部电影的评分。
+
+38
+00:01:14,140 --> 00:01:15,640
+by user j. Given this
+给定矩阵y
+
+39
+00:01:16,070 --> 00:01:17,290
+matrix y of all the
+y包含所有我们已知的评分,
+
+40
+00:01:17,430 --> 00:01:18,520
+ratings that we have, there's
+有
+
+41
+00:01:18,700 --> 00:01:20,500
+an alternative way of writing
+另一种方法可以写出
+
+42
+00:01:20,880 --> 00:01:23,340
+out all the predictive ratings of the algorithm.
+这个算法的所有预测评分。
+
+43
+00:01:24,320 --> 00:01:26,210
+And, in particular if you
+尤其是如果你想
+
+44
+00:01:26,430 --> 00:01:27,540
+look at what a certain
+查看某一个用户
+
+45
+00:01:27,920 --> 00:01:29,480
+user predicts on a
+对某一个电影的评分预测
+
+46
+00:01:29,690 --> 00:01:31,250
+certain movie, what user j
+用户j
+
+47
+00:01:31,950 --> 00:01:35,540
+predicts on movie i is given by this formula.
+对电影i的评分预测由这个公式给出(θ(j)的转置乘以x(i))
+
+48
+00:01:37,010 --> 00:01:38,570
+And so, if you have
+因此,如果你
+
+49
+00:01:39,440 --> 00:01:40,330
+a matrix of the predicted
+有一个预测评分的矩阵
+
+50
+00:01:40,910 --> 00:01:42,000
+ratings, what you would
+你所拥有的
+
+51
+00:01:42,180 --> 00:01:43,600
+have is the following
+就是下面这个矩阵,
+
+52
+00:01:45,030 --> 00:01:48,140
+matrix where the i, j entry.
+矩阵元素的标号为i,j
+
+53
+00:01:49,650 --> 00:01:51,440
+So this corresponds to the rating
+这对应了我们预测的
+
+54
+00:01:52,000 --> 00:01:54,020
+that we predict using j
+用户j给电影i的打分
+
+55
+00:01:54,460 --> 00:01:55,690
+will give to movie i
+
+56
+00:01:57,130 --> 00:01:58,440
+is exactly equal to that
+这与
+
+57
+00:01:58,790 --> 00:02:00,680
+theta j transpose
+theta j 转置乘以xi的值相等。
+
+58
+00:02:00,900 --> 00:02:01,940
+XI, and so, you know, this is a matrix
+因此,这个矩阵中
+
+59
+00:02:02,520 --> 00:02:04,310
+where this first element
+第一个元素
+
+60
+00:02:04,750 --> 00:02:05,930
+the one-one element is a
+即第一行第一列的元素
+
+61
+00:02:06,220 --> 00:02:07,450
+predictive rating of user one
+是用户一对
+
+62
+00:02:07,760 --> 00:02:09,360
+or movie one and this
+电影一的评分预测
+
+63
+00:02:09,560 --> 00:02:11,070
+element, this is the one-two
+这是第一行第二个元素
+
+64
+00:02:11,430 --> 00:02:12,680
+element is the predicted rating
+它是第二个用户
+
+65
+00:02:13,470 --> 00:02:14,640
+of user two on movie
+对第一部电影的评分预测,
+
+66
+00:02:14,930 --> 00:02:16,070
+one, and so on,
+以此类推。
+
+67
+00:02:16,630 --> 00:02:18,670
+and this is the
+这是
+
+68
+00:02:19,000 --> 00:02:20,130
+predicted rating of user one
+第一个用户
+
+69
+00:02:20,930 --> 00:02:23,380
+on the last movie and
+对最后一部电影的评分预测。
+
+70
+00:02:23,640 --> 00:02:25,100
+if you want, you know,
+如果你要预测,
+
+71
+00:02:25,400 --> 00:02:26,870
+this rating is what we
+则这个评分就是
+
+72
+00:02:27,020 --> 00:02:28,050
+would have predicted for this value
+对这个值的预测;
+
+73
+00:02:29,050 --> 00:02:32,470
+and this rating is
+这个评分就是
+
+74
+00:02:32,650 --> 00:02:33,570
+what we would have predicted for that
+对这个值的预测,
+
+75
+00:02:33,910 --> 00:02:35,080
+value, and so on.
+以此类推。
+
+76
+00:02:36,180 --> 00:02:37,480
+Now, given this matrix of
+现在,给定这个预测评分矩阵
+
+77
+00:02:37,560 --> 00:02:39,290
+Predictive ratings there is then
+则有一个
+
+78
+00:02:39,610 --> 00:02:42,670
+a simpler or vectorized way of writing these out.
+比较简单的或者向量化的方法来写出它们。
+
+79
+00:02:43,640 --> 00:02:44,640
+In particular if I define
+比如说,如果我定义
+
+80
+00:02:45,120 --> 00:02:46,850
+the matrix x, and this
+矩阵x,它
+
+81
+00:02:46,970 --> 00:02:48,090
+is going to be just
+可以写成
+
+82
+00:02:48,370 --> 00:02:50,980
+like the matrix we had earlier for linear regression to be
+像之前讲过的线性回归的矩阵形式
+
+83
+00:02:52,070 --> 00:02:53,820
+sort of x1 transpose x2
+第一行是x1的转置
+
+84
+00:02:55,050 --> 00:02:57,060
+transpose down to
+然后第二行是x2的转置
+
+85
+00:02:58,530 --> 00:03:01,740
+x of nm transpose.
+一直到xnm的转置
+
+86
+00:03:02,420 --> 00:03:03,320
+So I'm take all the features
+我将提取所有的电影的特征
+
+87
+00:03:04,210 --> 00:03:05,670
+for my movies and stack
+然后逐行的
+
+88
+00:03:06,140 --> 00:03:07,260
+them in rows.
+写入矩阵中。
+
+89
+00:03:07,950 --> 00:03:08,860
+So if you think of
+所以如果将
+
+90
+00:03:08,980 --> 00:03:09,810
+each movie as one example
+每部电影看作一个样本
+
+91
+00:03:10,350 --> 00:03:11,200
+and stack all of the features
+将不同电影的所有属性都
+
+92
+00:03:11,670 --> 00:03:13,460
+of the different movies in rows.
+按行写入矩阵
+
+93
+00:03:14,290 --> 00:03:16,160
+And if we also to
+如果我们
+
+94
+00:03:16,280 --> 00:03:18,550
+find a matrix capital theta,
+找到一个矩阵,用大写的theta 表示
+
+95
+00:03:19,870 --> 00:03:20,840
+and what I'm going to
+我要做的是
+
+96
+00:03:21,180 --> 00:03:22,490
+do is take each of
+取出每个
+
+97
+00:03:22,750 --> 00:03:25,780
+the per user parameter
+用户参数向量
+
+98
+00:03:26,280 --> 00:03:28,520
+vectors, and stack them in rows, like so.
+像这样按行写入
+
+99
+00:03:28,790 --> 00:03:29,690
+So that's theta 1, which
+这是theta 1
+
+100
+00:03:30,220 --> 00:03:31,880
+is the parameter vector for the first user.
+是对第一个用户的参数向量
+
+101
+00:03:33,430 --> 00:03:36,100
+And, you know, theta 2, and
+theta 2
+
+102
+00:03:37,040 --> 00:03:38,100
+so, you must stack
+你必须像这样
+
+103
+00:03:38,360 --> 00:03:39,470
+them in rows like this to
+按行把它们写入
+
+104
+00:03:39,650 --> 00:03:41,530
+define a matrix capital
+来定义矩阵大写theta
+
+105
+00:03:42,070 --> 00:03:43,830
+theta and so I have
+因此我有
+
+106
+00:03:45,870 --> 00:03:48,410
+nu parameter vectors all stacked in rows like this.
+nu 个参数向量,像这样按行写入矩阵
+
+107
+00:03:50,000 --> 00:03:51,390
+Now given this definition
+现在给定这个
+
+108
+00:03:52,080 --> 00:03:53,400
+for the matrix x and this
+对矩阵x的定义,和
+
+109
+00:03:53,590 --> 00:03:54,870
+definition for the matrix theta
+这个对矩阵theta的定义
+
+110
+00:03:55,820 --> 00:03:56,970
+in order to have a
+为了获得
+
+111
+00:03:57,290 --> 00:03:59,330
+vectorized way of computing the
+一个向量化方法来计算
+
+112
+00:03:59,420 --> 00:04:00,330
+matrix of all the predictions
+预测矩阵,
+
+113
+00:04:01,060 --> 00:04:03,570
+you can just compute x times
+你可以只是计算x乘以
+
+114
+00:04:04,710 --> 00:04:07,050
+the matrix theta transpose, and
+矩阵theta 的转置,
+
+115
+00:04:07,160 --> 00:04:08,380
+that gives you a vectorized way
+它就是一个向量化的方法
+
+116
+00:04:08,570 --> 00:04:10,530
+of computing this matrix over here.
+来计算这个矩阵,
+
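With every movie's feature vector x(i)' stacked as a row of a matrix X, and every user's parameter vector theta(j)' stacked as a row of Theta, the whole matrix of predicted ratings is X times Theta transpose, as the video says. A tiny made-up example (5 movies, 4 users, n = 2 learned features; the numbers are illustrative only):

```python
import numpy as np

X = np.array([[0.9, 0.0],      # each row is a movie's feature vector x(i)'
              [1.0, 0.1],
              [0.8, 0.0],
              [0.1, 1.0],
              [0.0, 0.9]])

Theta = np.array([[5.0, 0.0],  # each row is a user's parameter vector theta(j)'
                  [5.0, 0.0],
                  [0.0, 5.0],
                  [0.0, 5.0]])

predictions = X @ Theta.T      # (5 x 4) matrix; entry (i, j) is theta(j)' x(i)
print(predictions)
```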
+117
+00:04:11,680 --> 00:04:12,460
+To give the collaborative filtering
+这个协同过滤算法
+
+118
+00:04:12,480 --> 00:04:15,220
+algorithm that you've been using another name.
+有另一个名字,
+
+119
+00:04:16,070 --> 00:04:17,190
+The algorithm that we're using
+我们正使用的这个算法
+
+120
+00:04:17,660 --> 00:04:19,840
+is also called low rank
+也叫低秩
+
+121
+00:04:21,240 --> 00:04:22,540
+matrix factorization.
+矩阵分解
+
+122
+00:04:24,280 --> 00:04:25,410
+And so if you hear
+因此如果你听到
+
+123
+00:04:25,620 --> 00:04:26,760
+people talk about low rank matrix
+人们谈论低秩矩阵
+
+124
+00:04:27,210 --> 00:04:29,490
+factorization that's essentially exactly
+分解,基本上他们所说的就是
+
+125
+00:04:30,390 --> 00:04:32,100
+the algorithm that we have been talking about.
+我们正在讨论的这个算法。
+
+126
+00:04:32,590 --> 00:04:33,900
+And this term comes from the
+这个术语来自于:
+
+127
+00:04:33,990 --> 00:04:36,100
+property that this matrix
+这个矩阵的数学性质;
+
+128
+00:04:36,770 --> 00:04:38,880
+x times theta transpose has a
+矩阵x乘以theta 的转置
+
+129
+00:04:39,110 --> 00:04:40,780
+mathematical property in linear
+在线性代数中有一个数学性质
+
+130
+00:04:41,030 --> 00:04:42,410
+algebra, namely that this
+称为
+
+131
+00:04:42,670 --> 00:04:43,820
+is a low rank matrix and
+低秩矩阵
+
+132
+00:04:44,720 --> 00:04:45,800
+so that's what gives
+这就是为什么算法
+
+133
+00:04:46,060 --> 00:04:47,190
+rise to this name low
+起名叫
+
+134
+00:04:47,340 --> 00:04:48,570
+rank matrix factorization for these
+低秩矩阵分解的原因。
+
+135
+00:04:48,930 --> 00:04:50,240
+algorithms, because of this low
+因为
+
+136
+00:04:50,410 --> 00:04:53,580
+rank property of this matrix x theta transpose.
+矩阵x乘以theta 的转置的低秩性质。
+
+137
+00:04:54,830 --> 00:04:55,640
+In case you don't know what
+如果你不知道什么
+
+138
+00:04:55,910 --> 00:04:57,310
+low rank means or in case you don't
+是低秩,或者如果你不知道
+
+139
+00:04:57,620 --> 00:04:59,770
+know what a low rank matrix is, don't worry about it.
+一个低秩矩阵是什么,不要紧,
+
+140
+00:04:59,970 --> 00:05:02,820
+You really don't need to know that in order to use this algorithm.
+如果只是为了使用这个算法,你不需要知道那些知识。
+
+141
+00:05:03,740 --> 00:05:04,790
+But if you're an expert in
+但是如果你是一个线性代数领域的专家,或了解线性代数
+
+142
+00:05:04,890 --> 00:05:06,110
+linear algebra, that's what gives
+那是为什么这个算法
+
+143
+00:05:06,320 --> 00:05:07,580
+this algorithm, this other name
+又有另一个名字
+
+144
+00:05:07,850 --> 00:05:12,370
+of low rank matrix factorization.
+低秩矩阵分解的原因
+
+145
+00:05:12,620 --> 00:05:14,090
+Finally, having run the
+最后,在已经运行了
+
+146
+00:05:14,300 --> 00:05:16,350
+collaborative filtering algorithm here's
+协同过滤算法之后
+
+147
+00:05:17,310 --> 00:05:18,160
+something else that you can do
+再讲一个问题,
+
+148
+00:05:18,530 --> 00:05:20,060
+which is use the learned
+利用已经学习到的属性
+
+149
+00:05:20,320 --> 00:05:23,510
+features in order to find related movies.
+来找到相关的电影。
+
+150
+00:05:25,060 --> 00:05:26,810
+Specifically for each product i
+具体的说,就是对每个商品i,
+
+151
+00:05:27,050 --> 00:05:27,810
+really for each movie i, we've
+比如对每个电影i,我们已经
+
+152
+00:05:28,810 --> 00:05:30,970
+learned a feature vector xi.
+学到一个属性向量xi,
+
+153
+00:05:31,740 --> 00:05:32,880
+So, you know, when you learn
+当你学习某一组特征时,
+
+154
+00:05:32,930 --> 00:05:34,220
+certain features, you don't really know
+你之前并不知道
+
+155
+00:05:34,590 --> 00:05:35,420
+in advance what the
+该选取哪些
+
+156
+00:05:35,610 --> 00:05:37,850
+different features are going to be, but if you
+不同的特征,但是如果你
+
+157
+00:05:37,940 --> 00:05:39,550
+run the algorithm, hopefully the features
+运行这个算法,一些特征
+
+158
+00:05:39,990 --> 00:05:41,690
+will tend to capture what are
+将捕捉到
+
+159
+00:05:41,930 --> 00:05:43,490
+the important aspects of these
+
+160
+00:05:43,730 --> 00:05:45,340
+different movies or different products or what have you.
+不同电影或不同商品的重要的方面。
+
+161
+00:05:45,480 --> 00:05:47,120
+What are the important aspects that cause
+这些重要的方面将导致
+
+162
+00:05:47,610 --> 00:05:48,600
+some users to like certain
+一些用户喜欢某些
+
+163
+00:05:48,930 --> 00:05:49,830
+movies and cause some users
+电影,导致一些用户
+
+164
+00:05:50,210 --> 00:05:51,670
+to like different sets of movies.
+喜欢另外一些电影。
+
+165
+00:05:52,470 --> 00:05:53,380
+So maybe you end up
+可能你最终
+
+166
+00:05:53,540 --> 00:05:55,050
+learning a feature, you know, where x1
+学习一个特征,比如x1
+
+167
+00:05:55,260 --> 00:05:56,550
+equals romance, x2 equals
+代表浪漫爱情,x2代表
+
+168
+00:05:57,060 --> 00:05:59,180
+action similar to
+动作片
+
+169
+00:05:59,460 --> 00:06:00,590
+an earlier video and maybe you
+也许你
+
+170
+00:06:00,710 --> 00:06:02,100
+learned a different feature x3 which
+学到另一个不同的属性x3,
+
+171
+00:06:02,210 --> 00:06:04,520
+is a degree to which this is a comedy.
+描述电影的喜剧效果
+
+172
+00:06:05,330 --> 00:06:07,000
+Then some feature x4 which is, you know, some other thing.
+特征x4可能代表其他的特征。
+
+173
+00:06:07,270 --> 00:06:09,750
+And you have N
+这样你总共有N
+
+174
+00:06:09,940 --> 00:06:11,600
+features all together and after
+个特征,在你
+
+175
+00:06:12,610 --> 00:06:14,420
+you have learned features it's actually often
+学习完特征之后,实际上
+
+176
+00:06:14,750 --> 00:06:16,030
+pretty difficult to go in
+很难理解
+
+177
+00:06:16,420 --> 00:06:18,120
+to the learned features and come
+这些被学习到的特征
+
+178
+00:06:18,390 --> 00:06:19,980
+up with a human understandable
+并对这些特征给出人类可以理解的解释。
+
+179
+00:06:20,810 --> 00:06:22,850
+interpretation of what these features really are.
+
+180
+00:06:22,950 --> 00:06:24,540
+But in practice, you know, the
+但是,实际上,
+
+181
+00:06:24,620 --> 00:06:27,480
+features even though these features can be hard to visualize.
+即使这些特征难以可视化,
+
+182
+00:06:28,100 --> 00:06:29,570
+It can be hard to figure out just what these features are.
+人们难以理解这些特征的含义,
+
+183
+00:06:31,070 --> 00:06:32,160
+Usually, it will learn
+但是,通常,算法将学到一些重要
+
+184
+00:06:32,410 --> 00:06:33,400
+features that are very meaningful
+特征,这些特征非常有意义,
+
+185
+00:06:33,960 --> 00:06:35,250
+for capturing whatever are the
+它们捕捉到一部电影的
+
+186
+00:06:35,870 --> 00:06:37,120
+most important or the most salient
+最重要的特征
+
+187
+00:06:37,880 --> 00:06:39,300
+properties of a movie
+
+188
+00:06:39,710 --> 00:06:41,800
+that causes you to like or dislike it.
+这些特征导致了你喜欢或不喜欢这部电影。
+
+189
+00:06:41,860 --> 00:06:44,950
+And so now let's say we want to address the following problem.
+现在再看下一个问题:
+
+190
+00:06:45,970 --> 00:06:47,410
+Say you have some specific movie
+比如你有一部电影
+
+191
+00:06:47,790 --> 00:06:48,980
+i and you want
+i,你想要
+
+192
+00:06:49,120 --> 00:06:50,750
+to find other movies j
+找到另一部电影j
+
+193
+00:06:51,620 --> 00:06:52,680
+that are related to that movie.
+它与电影i相关。
+
+194
+00:06:53,150 --> 00:06:54,770
+And so well, why would you want to do this?
+为什么你要这样做呢?
+
+195
+00:06:54,920 --> 00:06:56,120
+Right, maybe you have a
+好的,假设你有一个用户
+
+196
+00:06:56,320 --> 00:06:57,840
+user that's browsing movies, and they're
+正在浏览电影,他们
+
+197
+00:06:58,360 --> 00:07:00,210
+currently watching movie j, then
+当前正在看电影j,
+
+198
+00:07:00,550 --> 00:07:01,820
+what's a reasonable movie to recommend
+那么在他们看完电影j后,
+
+199
+00:07:02,350 --> 00:07:04,110
+to them to watch after they're done with movie j?
+推荐给他们哪一部电影比较合理呢?
+
+200
+00:07:04,530 --> 00:07:06,040
+Or if someone's recently purchased movie
+或者如果有人最近买了电影j的碟片,
+
+201
+00:07:06,330 --> 00:07:07,470
+j, well, what's a different
+
+202
+00:07:07,730 --> 00:07:11,000
+movie that would be reasonable to recommend to them for them to consider purchasing.
+那么向他们再推荐哪部电影更合理呢?
+
+203
+00:07:12,190 --> 00:07:13,000
+So, now that you have
+现在,既然你已经
+
+204
+00:07:13,080 --> 00:07:14,540
+learned these feature vectors, this gives
+学习到了这些特征向量,这给了
+
+205
+00:07:14,640 --> 00:07:16,080
+us a very convenient way to
+我们一种方便的方法去
+
+206
+00:07:16,250 --> 00:07:17,930
+measure how similar two movies are.
+衡量两个电影的相似度。
+
+207
+00:07:18,670 --> 00:07:20,530
+In particular, movie i
+举例说,电影i
+
+208
+00:07:21,460 --> 00:07:22,340
+has a feature vector xi.
+有一个特征向量xi,
+
+209
+00:07:23,290 --> 00:07:24,200
+and so if you can find
+如果你找到
+
+210
+00:07:24,640 --> 00:07:27,500
+a different movie, j, so
+另一部电影j,
+
+211
+00:07:27,710 --> 00:07:29,300
+that the distance between
+xi和xj的距离很小,
+
+212
+00:07:29,780 --> 00:07:30,800
+xi and xj is small,
+
+213
+00:07:33,080 --> 00:07:34,010
+then this is a pretty
+那么,这就表明
+
+214
+00:07:34,430 --> 00:07:36,980
+strong indication that, you know, movies
+
+215
+00:07:37,830 --> 00:07:41,360
+j and i are somehow similar.
+电影j和i相似,
+
+216
+00:07:42,320 --> 00:07:44,080
+At least in the sense that someone who
+至少从这个意义上说,
+
+217
+00:07:44,200 --> 00:07:46,950
+likes movie i may be more likely to like movie j as well.
+一些喜欢电影i的人也可能喜欢电影j.
+
+218
+00:07:47,760 --> 00:07:49,940
+So, just to recap, if
+现在,简要的回顾一下,如果
+
+219
+00:07:50,590 --> 00:07:52,130
+your user is looking
+你的用户正在看
+
+220
+00:07:52,510 --> 00:07:53,710
+at some movie i and if
+某个电影i,如果
+
+221
+00:07:54,150 --> 00:07:55,060
+you want to find the 5
+你想要找到5个
+
+222
+00:07:55,310 --> 00:07:56,640
+most similar movies to that
+与电影i最相似的电影
+
+223
+00:07:56,900 --> 00:07:57,860
+movie in order to recommend
+目的是推荐
+
+224
+00:07:58,230 --> 00:07:59,590
+5 new movies to them, what
+五部新电影给用户,你要做的是
+
+225
+00:07:59,690 --> 00:08:00,650
+you do is find the five
+找到5部电影j,
+
+226
+00:08:00,970 --> 00:08:02,260
+movies j, with the
+
+227
+00:08:02,340 --> 00:08:03,880
+smallest distance between the
+这些电影的特征向量与电影i的特征向量
+
+228
+00:08:04,190 --> 00:08:05,680
+features between these different movies.
+有最小的距离。
+
+229
+00:08:06,550 --> 00:08:09,220
+And this could give you a few different movies to recommend to your user.
+这样你就能够向你的用户推荐几部不同的电影了。
+
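+The idea just described, recommending the movies j whose learned feature vectors are closest to x(i), is easy to sketch. Below is a minimal NumPy illustration; the matrix X (one row of learned features per movie) and all of its numbers are made-up placeholders, not values from the course.
+
+import numpy as np
+
+def most_similar_movies(X, i, k=5):
+    """Indices of the k movies whose learned feature vectors are
+    closest in Euclidean distance to movie i."""
+    dists = np.linalg.norm(X - X[i], axis=1)   # ||x(i) - x(j)|| for every j
+    dists[i] = np.inf                          # exclude the movie itself
+    return np.argsort(dists)[:k]               # k smallest distances
+
+# 5 movies, 3 learned features each (illustrative numbers only)
+X = np.array([[0.9, 0.1, 0.2],
+              [0.8, 0.2, 0.3],
+              [0.1, 0.9, 0.7],
+              [0.2, 0.8, 0.6],
+              [0.5, 0.5, 0.5]])
+print(most_similar_movies(X, i=0, k=2))        # -> [1 4]
+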
+230
+00:08:10,010 --> 00:08:11,500
+So with that, hopefully, you
+希望通过以上的学习,你
+
+231
+00:08:11,680 --> 00:08:13,350
+now know how to use
+现在知道如何使用
+
+232
+00:08:13,700 --> 00:08:15,930
+a vectorized implementation to compute
+一个向量化的实现来计算
+
+233
+00:08:16,560 --> 00:08:18,130
+all the predicted ratings of
+所有用户对所有电影的
+
+234
+00:08:18,210 --> 00:08:20,280
+all the users and all the
+评分预测值。
+
+235
+00:08:20,390 --> 00:08:21,720
+movies, and also how to do
+也可以实现
+
+236
+00:08:21,920 --> 00:08:23,300
+things like use learned features
+利用已经学习到的特征
+
+237
+00:08:23,930 --> 00:08:25,360
+to find what might be movies
+来找到彼此
+
+238
+00:08:25,480 --> 00:08:27,490
+and what might be products that are related to each other.
+相关的电影或商品。
+
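+As a rough companion to this recap, here is a small NumPy sketch of the vectorized prediction: stacking the movie features as rows of X and the user parameters as rows of Theta turns all the predicted ratings into one matrix product, and that product is what has the low-rank property. The shapes and random values are assumptions chosen only for illustration.
+
+import numpy as np
+
+n_movies, n_users, n = 5, 4, 2                 # n = number of learned features
+rng = np.random.default_rng(0)
+X = rng.normal(size=(n_movies, n))             # row i holds x(i)
+Theta = rng.normal(size=(n_users, n))          # row j holds theta(j)
+
+# Predicted rating of user j on movie i is theta(j)' * x(i);
+# all predictions at once are the matrix product X * Theta'.
+predictions = X @ Theta.T                      # shape (n_movies, n_users)
+
+# Each factor has only n columns, so the product has rank at most n,
+# which is the "low rank" behind the name low rank matrix factorization.
+print(predictions.shape, np.linalg.matrix_rank(predictions))   # (5, 4) 2
+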
diff --git a/srt/16 - 6 - Implementational Detail_ Mean Normalization (9 min).srt b/srt/16 - 6 - Implementational Detail_ Mean Normalization (9 min).srt
new file mode 100644
index 00000000..48ab81ca
--- /dev/null
+++ b/srt/16 - 6 - Implementational Detail_ Mean Normalization (9 min).srt
@@ -0,0 +1,1225 @@
+1
+00:00:00,400 --> 00:00:01,510
+By now you've seen all
+到目前为止 你已经了解到了
+
+2
+00:00:01,800 --> 00:00:03,600
+of the main pieces of the
+推荐系统算法或者
+
+3
+00:00:04,030 --> 00:00:06,760
+recommender system algorithm or the collaborative filtering algorithm.
+协同过滤算法的所有要点
+
+4
+00:00:07,770 --> 00:00:08,770
+In this video I want
+在这节视频中
+
+5
+00:00:08,940 --> 00:00:10,620
+to just share one last implementational detail,
+我想分享最后一点实现过程中的细节
+
+6
+00:00:12,000 --> 00:00:14,140
+namely mean normalization, which
+这一点就是均值归一化
+
+7
+00:00:14,350 --> 00:00:15,520
+can sometimes just make the
+有时它可以让算法
+
+8
+00:00:15,570 --> 00:00:17,090
+algorithm work a little bit better.
+运行得更好
+
+9
+00:00:18,290 --> 00:00:20,820
+To motivate the idea of mean normalization, let's
+为了了解均值归一化这个想法的动机
+
+10
+00:00:22,130 --> 00:00:24,390
+consider an example of where
+我们考虑这样一个例子
+
+11
+00:00:24,710 --> 00:00:26,530
+there's a user that has not rated any movies.
+有一个用户没有给任何电影评分
+
+12
+00:00:28,050 --> 00:00:29,290
+So, in addition to our
+加上之前我们有四个用户
+
+13
+00:00:29,540 --> 00:00:30,790
+four users, Alice, Bob, Carol,
+Alice Bob Carol 和 Dave
+
+14
+00:00:31,060 --> 00:00:32,710
+and Dave, I've added a
+我现在加上了第五个用户 Eve
+
+15
+00:00:32,870 --> 00:00:35,110
+fifth user, Eve, who hasn't rated any movies.
+她没有给任何电影评分
+
+16
+00:00:36,470 --> 00:00:37,920
+Let's see what our collaborative filtering
+我们来看看协同过滤算法
+
+17
+00:00:38,350 --> 00:00:39,570
+algorithm will do on this user.
+会对这个用户做什么
+
+18
+00:00:41,020 --> 00:00:43,140
+Let's say that n is equal to 2 and so
+假如说 n 等于2
+
+19
+00:00:43,390 --> 00:00:44,420
+we're going to learn two features
+所以我们要学习两个特征变量
+
+20
+00:00:45,410 --> 00:00:46,470
+and we are going to have
+我们要学习出
+
+21
+00:00:46,630 --> 00:00:47,890
+to learn a parameter vector theta
+一个参数向量θ(5)
+
+22
+00:00:48,140 --> 00:00:50,420
+5, which is going to be
+这是一个二维向量
+
+23
+00:00:51,130 --> 00:00:52,560
+in R2, remember this
+提醒一下
+
+24
+00:00:52,750 --> 00:00:55,920
+is now vectors in Rn not Rn+1,
+这个向量是 n 维的 而不是 n+1 维的
+
+25
+00:00:57,070 --> 00:00:58,210
+we'll learn the parameter vector theta 5 for our user number 5, Eve.
+我们要学习5号用户 Eve 的参数向量 θ(5)
+
+26
+00:00:59,780 --> 00:01:00,800
+So if we look in
+如果我们看
+
+27
+00:01:00,960 --> 00:01:02,020
+the first term in this
+这个优化目标的
+
+28
+00:01:02,200 --> 00:01:04,020
+optimization objective, well the
+第一项
+
+29
+00:01:04,220 --> 00:01:05,490
+user Eve hasn't rated any
+用户 Eve 没给任何电影打过分
+
+30
+00:01:05,730 --> 00:01:07,860
+movies, so there are
+所以对用户 Eve 来说
+
+31
+00:01:08,120 --> 00:01:10,750
+no movies for
+没有电影
+
+32
+00:01:11,050 --> 00:01:12,810
+which Rij is equal to
+满足 r(i,j)=1
+
+33
+00:01:13,130 --> 00:01:14,590
+one for the user Eve and
+这个条件
+
+34
+00:01:14,700 --> 00:01:15,840
+so this first term plays no
+所以这第一项
+
+35
+00:01:16,060 --> 00:01:17,400
+role at all in determining theta 5
+完全不影响 θ(5) 的值
+
+36
+00:01:18,610 --> 00:01:19,790
+because there are no movies that Eve has rated.
+因为没有电影被 Eve 评过分
+
+37
+00:01:20,960 --> 00:01:22,120
+And so the only term that
+所以影响 θ(5) 值的唯一一项
+
+38
+00:01:22,260 --> 00:01:24,300
+effects theta 5 is this term.
+是这一项
+
+39
+00:01:24,880 --> 00:01:25,830
+And so we're saying that we
+这就是说
+
+40
+00:01:25,910 --> 00:01:28,840
+want to choose vector theta 5 so
+我们想选一个向量 θ(5)
+
+41
+00:01:28,950 --> 00:01:33,820
+that the last regularization term is
+使得最后的正则化项
+
+42
+00:01:34,540 --> 00:01:35,500
+as small as possible.
+尽可能地小
+
+43
+00:01:35,920 --> 00:01:38,470
+In other words we want to minimize this
+换句话说 我们想要最小化这个式子
+
+44
+00:01:39,040 --> 00:01:39,610
+lambda over 2 theta 5 subscript 1 squared
+λ/2[(θ(5)_1)^2+(θ(5)_2)^2]
+
+45
+00:01:40,880 --> 00:01:43,150
+plus theta 5
+λ/2[(θ(5)_1)^2+(θ(5)_2)^2]
+
+46
+00:01:43,820 --> 00:01:45,840
+subscript 2 squared so
+λ/2[(θ(5)_1)^2+(θ(5)_2)^2]
+
+47
+00:01:46,040 --> 00:01:47,170
+that's the component of the
+它们是和用户5有关的
+
+48
+00:01:47,270 --> 00:01:49,460
+regularization term that corresponds to
+正则化项的要素
+
+49
+00:01:49,740 --> 00:01:51,610
+user 5, and of course
+当然
+
+50
+00:01:51,850 --> 00:01:53,280
+if your goal is to
+如果你的目标是
+
+51
+00:01:53,550 --> 00:01:55,540
+minimize this term, then
+最小化这一项
+
+52
+00:01:55,900 --> 00:01:56,790
+what you're going to end up
+那么你最终得到的
+
+53
+00:01:56,980 --> 00:01:58,520
+with is just theta 5 equals 0 0.
+就会是 θ(5)=[0;0]
+
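+A one-line check of the argument above: with no rated movies the data term vanishes, so minimizing only the regularization term (lambda/2)*||theta||^2 drives theta to the zero vector, and every prediction theta' * x(i) is then 0. A tiny NumPy sketch with illustrative values:
+
+import numpy as np
+
+theta = np.zeros(2)          # the minimizer of (lam / 2) * np.sum(theta ** 2)
+X = np.array([[0.9, 0.0],    # learned features x(i) for a few movies
+              [1.0, 0.01],
+              [0.1, 1.0]])
+print(X @ theta)             # [0. 0. 0.] -- every movie predicted 0 stars
+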
+54
+00:01:59,670 --> 00:02:01,550
+Because a regularization term
+因为正则化项
+
+55
+00:02:01,850 --> 00:02:03,270
+is encouraging us to set
+会让你的参数
+
+56
+00:02:03,510 --> 00:02:05,120
+parameters close to 0
+接近0
+
+57
+00:02:05,620 --> 00:02:07,580
+and if there is
+如果没有数据
+
+58
+00:02:07,730 --> 00:02:08,820
+no data to try to
+能够使得参数
+
+59
+00:02:08,990 --> 00:02:10,210
+pull the parameters away from
+远离0
+
+60
+00:02:10,410 --> 00:02:11,460
+0, because this first term
+因为这第一项
+
+61
+00:02:12,710 --> 00:02:13,800
+doesn't effect theta 5,
+不影响 θ(5) 值
+
+62
+00:02:13,880 --> 00:02:15,410
+we just end up with theta 5
+我们就会得到
+
+63
+00:02:15,690 --> 00:02:18,450
+equals the vector of all zeros. And
+θ(5) 等于零向量的结果
+
+64
+00:02:18,590 --> 00:02:19,610
+so when we go to
+所以当我们要预测
+
+65
+00:02:19,730 --> 00:02:20,920
+predict how user 5 would
+用户5会如何
+
+66
+00:02:21,280 --> 00:02:22,570
+rate any movie, we have
+给电影打分
+
+67
+00:02:22,890 --> 00:02:25,850
+that theta 5 transpose xi,
+我们有 θ(5) 转置乘以 x(i)
+
+68
+00:02:26,900 --> 00:02:28,380
+for any i, that's just going
+对任意i
+
+69
+00:02:29,950 --> 00:02:31,060
+to be equal to zero.
+结果都会等于0
+
+70
+00:02:31,570 --> 00:02:33,320
+Because theta 5 is 0 for any value of
+因为对任意x值 θ(5) 都是0
+
+71
+00:02:33,750 --> 00:02:35,780
+x, this inner product is going to be equal to 0. And what we're
+这个内积就会等于0
+
+72
+00:02:35,900 --> 00:02:37,160
+going to have therefore, is that
+因此我们得到的结果是
+
+73
+00:02:37,310 --> 00:02:38,780
+we're going to predict that Eve
+我们会预测
+
+74
+00:02:39,480 --> 00:02:40,870
+is going to rate every single
+Eve 给所有电影的评分
+
+75
+00:02:41,170 --> 00:02:42,690
+movie with zero stars.
+都是零星
+
+76
+00:02:44,050 --> 00:02:45,970
+But this doesn't seem very useful does it?
+但是这个结果看起来没什么用吧?
+
+77
+00:02:46,110 --> 00:02:47,310
+I mean if you look at the different movies,
+我的意思是 如果你看不同的电影
+
+78
+00:02:47,770 --> 00:02:49,710
+Love at Last, this first movie,
+《爱到最后》 这第一个电影
+
+79
+00:02:50,130 --> 00:02:52,300
+a couple people rated it 5 stars.
+两个人给它评了五星
+
+80
+00:02:54,940 --> 00:02:56,870
+And for even the Swords vs. Karate, someone rated it 5 stars.
+甚至对《剑与空手道》 也有人评了五星
+
+81
+00:02:57,410 --> 00:02:58,780
+So some people do like some movies.
+所以某些人确实喜欢某些电影
+
+82
+00:02:59,270 --> 00:03:01,030
+It seems not useful to
+看起来只是预测
+
+83
+00:03:01,160 --> 00:03:03,750
+just predict that Eve is going to rate everything 0 stars.
+Eve 会给他们全部评零星是没用的
+
+84
+00:03:04,570 --> 00:03:05,850
+And in fact if we're predicting
+而且实际上如果我们预测
+
+85
+00:03:06,410 --> 00:03:08,340
+that eve is going to rate everything 0 stars,
+Eve 会给所有电影零星的话
+
+86
+00:03:09,050 --> 00:03:10,100
+we also don't have any
+我们还是没有任何好方法
+
+87
+00:03:10,280 --> 00:03:11,660
+good way of recommending any movies
+来把电影推荐给她
+
+88
+00:03:11,810 --> 00:03:12,930
+to her, because you know
+因为你知道
+
+89
+00:03:13,130 --> 00:03:15,320
+all of these movies are getting exactly the
+预测结果是所有这些电影
+
+90
+00:03:15,410 --> 00:03:16,810
+same predicted rating for Eve
+都会被 Eve 给出一样的评分
+
+91
+00:03:17,010 --> 00:03:18,500
+so there's no one movie with
+所以没有一部电影
+
+92
+00:03:18,660 --> 00:03:20,010
+a higher predicted rating that
+拥有高一点儿的预测评分
+
+93
+00:03:20,210 --> 00:03:22,880
+we could recommend to her, so, that's not very good.
+让我们能推荐给她 所以这不太好
+
+94
+00:03:24,520 --> 00:03:27,340
+The idea of mean normalization will let us fix this problem.
+均值归一化的想法可以让我们解决这个问题
+
+95
+00:03:28,160 --> 00:03:29,410
+So here's how it works.
+下面介绍它是如何工作的
+
+96
+00:03:30,760 --> 00:03:31,720
+As before let me group all of my
+和以前一样
+
+97
+00:03:32,370 --> 00:03:33,750
+movie ratings into this matrix
+我们把所有评分放到矩阵Y里
+
+98
+00:03:34,280 --> 00:03:35,400
+Y, so just take all of
+就是把所有这些评分
+
+99
+00:03:35,460 --> 00:03:36,700
+these ratings and group them
+全部整合到矩阵Y中
+
+100
+00:03:37,240 --> 00:03:38,400
+into matrix Y. And this
+这边这列
+
+101
+00:03:38,560 --> 00:03:39,740
+column over here of all
+全部是问号的这列
+
+102
+00:03:39,910 --> 00:03:41,220
+question marks corresponds to
+对应的是
+
+103
+00:03:41,670 --> 00:03:43,300
+Eve's not having rated any movies.
+Eve 没有给任何电影评分
+
+104
+00:03:44,830 --> 00:03:46,890
+Now to perform mean normalization what I'm going to
+现在要实现均值归一化
+
+105
+00:03:47,140 --> 00:03:48,350
+do is compute the average
+我要做的就是计算
+
+106
+00:03:48,720 --> 00:03:50,610
+rating that each movie obtained.
+每个电影所得评分的均值
+
+107
+00:03:51,120 --> 00:03:51,760
+And I'm going to store that
+我要把它们存在一个向量中
+
+108
+00:03:52,040 --> 00:03:54,780
+in a vector that we'll call mu.
+我们称这个向量为 μ
+
+109
+00:03:55,210 --> 00:03:57,250
+So the first movie got two 5-star and two 0-star ratings,
+所以第一个电影得到了两个5星和两个0星的评价
+
+110
+00:03:57,760 --> 00:03:58,960
+so the average of that is a 2.5-star rating.
+均值就是2.5星评价
+
+111
+00:03:59,040 --> 00:04:01,470
+The second movie had
+第二个电影的平均评价
+
+112
+00:04:01,620 --> 00:04:04,300
+an average of 2.5-stars and so on.
+是2.5星 等等
+
+113
+00:04:04,470 --> 00:04:06,300
+And the final movie that has 0, 0, 5, 0.
+最后一个电影的评分是 0 0 5 0
+
+114
+00:04:06,330 --> 00:04:07,440
+And the average of 0, 0,
+0 0 5 0 的平均值
+
+115
+00:04:07,520 --> 00:04:09,190
+5, 0, that averages out to
+0 0 5 0 的平均值
+
+116
+00:04:09,620 --> 00:04:11,500
+an average of 1.25
+就是1.25星评价
+
+117
+00:04:12,240 --> 00:04:14,910
+rating. And what I'm going to
+我要做的事
+
+118
+00:04:15,000 --> 00:04:15,900
+do is look at all
+要把所有的电影评分
+
+119
+00:04:16,020 --> 00:04:17,610
+the movie ratings and I'm going
+要把所有的电影评分
+
+120
+00:04:18,010 --> 00:04:19,550
+to subtract off the mean rating.
+减去平均评分
+
+121
+00:04:20,110 --> 00:04:22,990
+So this first element 5 I'm going to subtract off 2.5 and that gives me 2.5.
+所以这第一个元素5 我要减去2.5 等于2.5
+
+122
+00:04:26,900 --> 00:04:29,380
+And the second element 5 subtract off of 2.5,
+第二个元素5 减去2.5
+
+123
+00:04:29,590 --> 00:04:30,000
+get a 2.5.
+得到2.5
+
+124
+00:04:30,410 --> 00:04:31,760
+And then the 0,
+然后是0 0
+
+125
+00:04:32,040 --> 00:04:34,560
+0, subtract off 2.5 and you get -2.5, -2.5.
+减去2.5 得到-2.5 -2.5
+
+126
+00:04:35,450 --> 00:04:36,530
+In other words, what
+换句话说
+
+127
+00:04:36,620 --> 00:04:38,010
+I'm going to do is take
+我要做的就是
+
+128
+00:04:38,310 --> 00:04:39,440
+my matrix of movie ratings,
+把我的电影评分矩阵
+
+129
+00:04:39,960 --> 00:04:42,070
+take this wide matrix, and
+也就是这个Y矩阵
+
+130
+00:04:42,730 --> 00:04:45,580
+subtract from each row the average rating for that movie.
+把它的每一行都减去那个电影的平均评分
+
+131
+00:04:46,580 --> 00:04:47,580
+So, what I'm doing is
+所以我做的就是
+
+132
+00:04:48,010 --> 00:04:49,600
+just normalizing each movie to
+把每个电影都归一化为
+
+133
+00:04:49,740 --> 00:04:51,610
+have an average rating of zero.
+平均评分为零
+
+134
+00:04:52,800 --> 00:04:53,580
+And so just one last example.
+最后一个例子
+
+135
+00:04:54,000 --> 00:04:56,010
+If you look at this last row, 0 0 5 0.
+如果你看最后一行 0 0 5 0
+
+136
+00:04:56,270 --> 00:04:56,940
+We're going to subtract 1.25, and
+我们要减去1.25
+
+137
+00:04:57,000 --> 00:04:58,590
+so I end up with
+最后我得到
+
+138
+00:05:00,950 --> 00:05:02,300
+these values over here.
+那边这些值
+
+139
+00:05:02,510 --> 00:05:03,730
+So now and of course
+那么现在
+
+140
+00:05:03,940 --> 00:05:05,380
+the question marks stay a question
+当然这些问号没变
+
+141
+00:05:06,960 --> 00:05:06,960
+mark.
+还是问号
+
+142
+00:05:07,880 --> 00:05:09,630
+So each movie in
+所以每个电影
+
+143
+00:05:09,810 --> 00:05:11,040
+this new matrix Y has
+在新矩阵Y中的
+
+144
+00:05:11,210 --> 00:05:12,780
+an average rating of 0.
+平均评分都是0
+
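+To make the mean-normalization step concrete, here is a small NumPy sketch. The ratings matrix Y is a stand-in with np.nan marking missing ratings (the all-missing last column plays the role of Eve); the numbers are illustrative, chosen so the first and last rows average to 2.5 and 1.25 as in the example.
+
+import numpy as np
+
+Y = np.array([[5, 5, 0, 0, np.nan],            # rows: movies, columns: users
+              [5, np.nan, np.nan, 0, np.nan],
+              [np.nan, 4, 0, np.nan, np.nan],
+              [0, 0, 5, 4, np.nan],
+              [0, 0, 5, 0, np.nan]])
+
+# Average each movie over the ratings that actually exist (r(i,j) = 1).
+mu = np.nanmean(Y, axis=1, keepdims=True)
+
+# Subtract that mean from every rated entry; missing entries stay missing.
+Y_norm = Y - mu
+
+print(mu.ravel())                   # [2.5, 2.5, 2.0, 2.25, 1.25]
+print(np.nanmean(Y_norm, axis=1))   # each movie now averages to 0
+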
+145
+00:05:13,940 --> 00:05:15,180
+What I'm going to do then, is
+接下来我要做的就是
+
+146
+00:05:15,440 --> 00:05:16,850
+take this set of ratings
+对这个评分数据集
+
+147
+00:05:17,590 --> 00:05:20,170
+and use it with my collaborative filtering algorithm.
+使用协同过滤算法
+
+148
+00:05:20,480 --> 00:05:22,130
+So I'm going to pretend that this
+所以我要假设
+
+149
+00:05:22,430 --> 00:05:24,200
+was the data that I had
+这就是我从用户那儿
+
+150
+00:05:24,420 --> 00:05:25,570
+gotten from my users, or pretend that
+得到的数据
+
+151
+00:05:25,810 --> 00:05:27,400
+these are the actual ratings I
+或者假设它们就是
+
+152
+00:05:27,530 --> 00:05:28,940
+had gotten from the users, and I'm
+我从用户那儿得到的实际评分
+
+153
+00:05:29,250 --> 00:05:30,130
+going to use this as my
+我要把这个当做我的数据集
+
+154
+00:05:30,270 --> 00:05:31,730
+data set with which to
+用它来学习
+
+155
+00:05:32,000 --> 00:05:33,920
+learn my parameters theta
+我的参数 θ(j)
+
+156
+00:05:34,560 --> 00:05:36,540
+J and my features XI
+和特征变量 x(i)
+
+157
+00:05:36,860 --> 00:05:39,320
+- from these mean normalized movie ratings.
+就是用这些均值归一化后的电影评分来学习
+
+158
+00:05:41,280 --> 00:05:42,040
+When I want to make predictions
+当我想要做
+
+159
+00:05:42,660 --> 00:05:43,910
+of movie ratings, what I'm
+电影评分预测时
+
+160
+00:05:44,070 --> 00:05:44,980
+going to do is the
+我要做的步骤如下
+
+161
+00:05:45,250 --> 00:05:46,830
+following: for user J on movie
+对用户j对电影i的评分
+
+162
+00:05:47,130 --> 00:05:49,250
+I, I'm gonna predict theta
+我要预测它为
+
+163
+00:05:49,600 --> 00:05:54,730
+J transpose XI, where
+θ(j) 转置乘以 x(i)
+
+164
+00:05:55,070 --> 00:05:55,990
+X and theta are the parameters
+其中 x 和 θ 都是
+
+165
+00:05:56,590 --> 00:05:58,230
+that I've learned from this mean normalized data set.
+均值归一化的数据集中学习出的参数
+
+166
+00:05:59,180 --> 00:06:00,680
+But, because on the data
+但是因为我已经对数据集
+
+167
+00:06:00,950 --> 00:06:02,260
+set, I had subtracted off the
+减去了均值
+
+168
+00:06:02,330 --> 00:06:04,000
+means in order to make
+所以为了
+
+169
+00:06:04,040 --> 00:06:05,210
+a prediction on movie i,
+给电影i预测评分
+
+170
+00:06:05,510 --> 00:06:07,220
+I'm going to need to add back in the mean,
+我要把这个均值加回来
+
+171
+00:06:08,070 --> 00:06:08,730
+and so i'm going to add
+所以我要再加回 μi
+
+172
+00:06:08,840 --> 00:06:10,690
+back in mu i. And
+所以我要再加回 μi
+
+173
+00:06:10,830 --> 00:06:11,780
+so that's going to be
+所以这就是
+
+174
+00:06:11,830 --> 00:06:13,350
+my prediction where in my training
+我得到的预测值
+
+175
+00:06:13,660 --> 00:06:14,860
+data subtracted off all the
+因为训练数据减去了所有的均值
+
+176
+00:06:14,930 --> 00:06:16,290
+means and so when we
+所以当我做预测时
+
+177
+00:06:16,440 --> 00:06:20,770
+make predictions and we need
+我们需要
+
+178
+00:06:21,770 --> 00:06:23,030
+to add back in these
+给电影 i
+
+179
+00:06:23,410 --> 00:06:23,880
+means mu i for movie i. And
+加回这个均值 μi
+
+180
+00:06:24,100 --> 00:06:25,320
+so specifically if you look at user
+具体来说
+
+181
+00:06:25,330 --> 00:06:26,840
+5 which is Eve, the same argument as
+如果用户5 Eve
+
+182
+00:06:27,010 --> 00:06:28,250
+the previous slide still applies in
+之前幻灯片里的的描述仍然成立
+
+183
+00:06:28,440 --> 00:06:29,870
+the sense that Eve had
+Eve 从来没有
+
+184
+00:06:30,080 --> 00:06:31,600
+not rated any movies and
+给任何电影打分
+
+185
+00:06:31,760 --> 00:06:32,930
+so the learned parameter for
+所以学习到的用户5的参数
+
+186
+00:06:33,710 --> 00:06:35,030
+user 5 is still going to
+仍然还是
+
+187
+00:06:35,970 --> 00:06:37,990
+be equal to 0, 0.
+会等于 0 0
+
+188
+00:06:38,270 --> 00:06:39,910
+And so what we're
+所以我们会得到的是
+
+189
+00:06:40,130 --> 00:06:41,320
+going to get then is that
+所以我们会得到的是
+
+190
+00:06:41,690 --> 00:06:42,980
+on a particular movie i we're
+对特定的电影 i
+
+191
+00:06:43,130 --> 00:06:44,900
+going to predict for Eve theta
+我们预测 Eve 的评分是
+
+192
+00:06:45,480 --> 00:06:49,930
+5, transpose xi plus
+θ(5) 转置乘以 x(i)
+
+193
+00:06:51,260 --> 00:06:52,890
+add back in mu i and
+然后再加上 μi
+
+194
+00:06:53,010 --> 00:06:54,360
+so this first component is
+所以如果 θ(5) 等于0的话
+
+195
+00:06:54,460 --> 00:06:57,520
+going to be equal to zero, if theta five is equal to zero.
+这第一部分就会等于0
+
+196
+00:06:58,290 --> 00:06:59,190
+And so on movie i, we
+所以对电影 i 的评分
+
+197
+00:06:59,260 --> 00:07:00,660
+are going to end up predicting mu
+我们最终会预测为 μi
+
+198
+00:07:01,090 --> 00:07:03,190
+i. And, this actually makes sense.
+这实际上是说得通的
+
+199
+00:07:03,380 --> 00:07:03,690
+It means that
+它的意思是
+
+200
+00:07:03,900 --> 00:07:05,270
+on movie 1 we're
+对于电影 1
+
+201
+00:07:05,390 --> 00:07:06,990
+going to predict Eve rates it 2.5.
+我们会预测 Eve 对它的评分是2.5
+
+202
+00:07:07,270 --> 00:07:10,260
+On movie 2 we're gonna predict Eve rates it 2.5.
+对于电影2 我们会预测 Eve 给它2.5星
+
+203
+00:07:10,420 --> 00:07:11,640
+On movie 3 we're
+对于电影3
+
+204
+00:07:11,880 --> 00:07:13,000
+gonna predict Eve rates it at 2
+我们会预测 Eve 给它2星
+
+205
+00:07:13,200 --> 00:07:14,510
+and so on.
+依次类推
+
+206
+00:07:14,780 --> 00:07:15,960
+This actually makes sense, because it says
+这其实说得通
+
+207
+00:07:16,320 --> 00:07:17,730
+that if Eve hasn't rated
+因为它的意思是
+
+208
+00:07:18,020 --> 00:07:18,870
+any movies and we just
+如果 Eve 没给任何电影评分
+
+209
+00:07:19,100 --> 00:07:20,180
+don't know anything about this new
+我们就对这个新用户 Eve 一无所知
+
+210
+00:07:20,410 --> 00:07:21,630
+user Eve, what we're going
+我们要做的就是预测
+
+211
+00:07:21,810 --> 00:07:23,770
+to do is just predict for
+她对每个电影的评分
+
+212
+00:07:23,940 --> 00:07:25,140
+each of the movies, what are
+就是这些电影所得的平均评分
+
+213
+00:07:25,230 --> 00:07:27,520
+the average rating that those movies got.
+就是这些电影所得的平均评分
+
+214
+00:07:30,060 --> 00:07:31,480
+Finally, as an aside, in
+最后再补充一下
+
+215
+00:07:31,810 --> 00:07:33,290
+this video we talked about mean
+在这个视频中
+
+216
+00:07:33,540 --> 00:07:35,220
+normalization, where we normalized
+我们谈到了均值归一化
+
+217
+00:07:35,320 --> 00:07:36,450
+each row of the matrix y,
+我们归一化矩阵Y
+
+218
+00:07:37,510 --> 00:07:38,100
+to have mean 0.
+使得每行的均值都是0
+
+219
+00:07:39,020 --> 00:07:40,730
+In case you have some movies
+如果有些电影是没有评分的
+
+220
+00:07:41,020 --> 00:07:42,330
+with no ratings, so it is
+这个情形类似于
+
+221
+00:07:42,590 --> 00:07:44,320
+analogous to a user who hasn't rated anything,
+有的用户没有给任何电影评分的情况
+
+222
+00:07:44,590 --> 00:07:45,550
+but in case you have some
+但是如果你有些电影
+
+223
+00:07:46,250 --> 00:07:47,530
+movies with no ratings, you
+是没有评分的
+
+224
+00:07:47,590 --> 00:07:48,700
+can also play with versions
+你可以尝试这个算法的其他版本
+
+225
+00:07:49,320 --> 00:07:50,700
+of the algorithm, where you
+你可以对不同的列
+
+226
+00:07:50,900 --> 00:07:52,190
+normalize the different columns
+进行归一化
+
+227
+00:07:52,790 --> 00:07:53,990
+to have means zero, instead of
+使得它们的均值为0
+
+228
+00:07:54,280 --> 00:07:55,180
+normalizing the rows to have mean
+而不是把行均值归一化为0
+
+229
+00:07:55,500 --> 00:07:56,990
+zero, although that's maybe
+虽说这个可能不太重要
+
+230
+00:07:57,240 --> 00:07:58,770
+less important, because if you
+因为如果你
+
+231
+00:07:58,870 --> 00:07:59,810
+really have a movie with no
+真的有个电影没有评分
+
+232
+00:08:00,040 --> 00:08:01,390
+rating, maybe you just
+可能不管怎么说
+
+233
+00:08:01,590 --> 00:08:03,920
+shouldn't recommend that movie to anyone, anyway.
+你就不该把这个电影推荐给任何人
+
+234
+00:08:04,700 --> 00:08:08,010
+And so, taking
+所以说
+
+235
+00:08:08,540 --> 00:08:09,980
+care of the case of a user who hasn't
+解决用户没评价过
+
+236
+00:08:10,490 --> 00:08:11,780
+rated anything might be more
+任何电影的状况
+
+237
+00:08:12,010 --> 00:08:13,170
+important than taking care of
+可能比解决
+
+238
+00:08:13,310 --> 00:08:14,550
+the case of a movie
+电影没被评价过
+
+239
+00:08:14,860 --> 00:08:16,090
+that hasn't gotten a single rating.
+的状况更重要
+
+240
+00:08:18,930 --> 00:08:20,080
+So to summarize, that's how
+最后总结一下
+
+241
+00:08:20,360 --> 00:08:21,830
+you can do mean normalization as
+这就是可以说是作为协同过滤算法的预处理步骤
+
+242
+00:08:22,110 --> 00:08:25,110
+a sort of pre-processing step for collaborative filtering.
+均值归一化的实现
+
+243
+00:08:25,740 --> 00:08:26,670
+Depending on your data set,
+根据你的数据集的不同
+
+244
+00:08:26,960 --> 00:08:28,140
+this might sometimes make your implementation
+它可能有时会让实现的算法
+
+245
+00:08:28,540 --> 00:08:30,040
+work just a little bit better.
+表现得好一点儿 (字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
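+As a closing illustration of the prediction step described above (learn on the mean-normalized ratings, then add the per-movie mean back when predicting), here is a small NumPy sketch. The mean vector mu, the learned features X, and the zero parameter vector for the new user are illustrative assumptions, not values computed by the course code.
+
+import numpy as np
+
+mu = np.array([2.5, 2.5, 2.0, 2.25, 1.25])          # per-movie mean ratings
+X = np.random.default_rng(1).normal(size=(5, 2))    # learned movie features x(i)
+theta_new_user = np.zeros(2)    # a user with no ratings ends up with theta = 0
+
+# Prediction for user j on movie i: theta(j)' * x(i) + mu(i).
+pred = X @ theta_new_user + mu
+
+# With theta = 0 the first term vanishes, so the new user is predicted
+# to give each movie its average rating instead of 0 stars everywhere.
+print(pred)                      # [2.5  2.5  2.   2.25 1.25]
+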
diff --git a/srt/17 - 1 - Learning With Large Datasets (6 min).srt b/srt/17 - 1 - Learning With Large Datasets (6 min).srt
new file mode 100644
index 00000000..ca38e44e
--- /dev/null
+++ b/srt/17 - 1 - Learning With Large Datasets (6 min).srt
@@ -0,0 +1,345 @@
+1
+00:00:00,332 --> 00:00:04,284
+In the next few videos, we'll talk about large scale machine learning.
+在接下来的几个视频,我们将谈论大规模机器学习。(字幕翻译:中国海洋大学 黄海广 haiguang2000@qq.com)
+
+2
+00:00:04,284 --> 00:00:08,316
+That is, algorithms for dealing with big data sets.
+也就是大数据集的算法。
+
+3
+00:00:08,316 --> 00:00:12,839
+If you look back at a recent 5 or 10-year history of machine learning.
+如果你看看最近5年或10年机器学习的历史。
+
+4
+00:00:12,839 --> 00:00:17,853
+One of the reasons that learning algorithms work so much better now than even say, 5-years ago,
+学习算法现在比5年前好用得多的原因之一,
+
+5
+00:00:17,853 --> 00:00:22,657
+is just the sheer amount of data that we have now and that we can train our algorithms on.
+就在于我们现在拥有海量的数据,可以用来训练我们的算法。
+
+6
+00:00:22,657 --> 00:00:29,741
+In these next few videos, we'll talk about algorithms for dealing when we have such massive data sets.
+在接下来的几个视频中,我们将讨论处理如此庞大数据集的算法。
+
+7
+00:00:32,926 --> 00:00:35,527
+So why do we want to use such large data sets?
+我们为什么要用这么大的数据集?
+
+8
+00:00:35,527 --> 00:00:40,564
+We've already seen that one of the best ways to get a high performance machine learning system,
+我们已经看到,获得高性能机器学习系统的最佳途径之一,
+
+9
+00:00:40,564 --> 00:00:46,168
+is if you take a low-bias learning algorithm, and train that on a lot of data.
+就是采用低偏差的学习算法,并用大量数据进行训练。
+
+10
+00:00:46,168 --> 00:00:53,561
+And so, one early example we have already seen was this example of classifying between confusable words.
+我们早先已经看到过一个例子,就是区分易混淆单词的分类问题。
+
+11
+00:00:53,561 --> 00:01:00,726
+So, for breakfast, I ate two (TWO) eggs and we saw in this example, these sorts of results,
+比如"早餐我吃了两个(TWO)鸡蛋"这个例子,我们看到了这样的结果,
+
+12
+00:01:00,726 --> 00:01:06,436
+where, you know, so long as you feed the algorithm a lot of data, it seems to do very well.
+如你所知,只要你给算法提供大量数据,它的效果似乎就很好。
+
+13
+00:01:06,436 --> 00:01:10,419
+And so it's results like these that has led to the saying in machine learning that
+所以这样的结果表明,在机器学习中,
+
+14
+00:01:10,419 --> 00:01:15,151
+often it's not who has the best algorithm that wins. It's who has the most data.
+获胜的往往不是拥有最好算法的人,而是拥有最多数据的人。
+
+15
+00:01:15,151 --> 00:01:19,568
+So you want to learn from large data sets, at least when we can get such large data sets.
+所以我们希望用大型数据集来学习,至少在能够获得这样的数据集的时候。
+
+16
+00:01:19,568 --> 00:01:27,027
+But learning with large data sets comes with its own unique problems, specifically, computational problems.
+但大型数据集的学习都有自己独特的问题,特别是计算问题。
+
+17
+00:01:27,027 --> 00:01:33,870
+Let's say your training set size is M equals 100,000,000.
+我们说你的训练集的大小为M = 100000000。
+
+18
+00:01:33,870 --> 00:01:37,934
+And this is actually pretty realistic for many modern data sets.
+对许多现代数据集来说,这个规模其实非常现实。
+
+19
+00:01:37,934 --> 00:01:40,518
+If you look at the US Census data set, if there are, you know,
+如果你看看美国人口普查的数据集,你知道,
+
+20
+00:01:40,518 --> 00:01:44,663
+300 million people in the US, you can usually get hundreds of millions of records.
+美国有3亿人口,你通常可以得到上亿条记录。
+
+21
+00:01:44,663 --> 00:01:47,856
+If you look at the amount of traffic that popular websites get,
+如果你看流行的网站的流量数据,
+
+22
+00:01:47,856 --> 00:01:52,509
+you easily get training sets that are much larger than hundreds of millions of examples.
+你很容易得到比数亿样本还要大得多的训练集。
+
+23
+00:01:52,509 --> 00:01:57,407
+And let's say you want to train a linear regression model, or maybe a logistic regression model,
+让我们说你想训练一个线性回归模型,或者一个Logistic回归模型,
+
+24
+00:01:57,407 --> 00:02:01,692
+in which case this is the gradient descent rule.
+在这种情况下,这是梯度下降规则。
+
+25
+00:02:01,692 --> 00:02:05,372
+And if you look at what you need to do to compute the gradient,
+如果你看看你需要做什么来计算梯度,
+
+26
+00:02:05,372 --> 00:02:09,992
+which is this term over here, then when M is a hundred million,
+也就是这里的这一项,那么当 m 等于一亿时,
+
+27
+00:02:09,992 --> 00:02:13,976
+you need to carry out a summation over a hundred million terms,
+你就需要对一亿项进行求和,
+
+28
+00:02:13,976 --> 00:02:18,977
+in order to compute these derivatives terms and to perform a single step of decent.
+才能计算出这些导数项,并完成一步梯度下降。
+
+29
+00:02:18,977 --> 00:02:25,627
+Because of the computational expense of summing over a hundred million entries
+由于对一亿个条目求和的计算代价太大,
+
+30
+00:02:25,627 --> 00:02:28,628
+in order to compute just one step of gradient descent,
+而这仅仅是为了完成一步梯度下降,
+
+31
+00:02:28,628 --> 00:02:31,530
+in the next few videos we'll talk about techniques
+所以在接下来的几个视频中,我们将讨论一些技术
+
+32
+00:02:31,530 --> 00:02:38,413
+for either replacing this with something else or to find more efficient ways to compute this derivative.
+要么用其他东西替代这一项,要么寻找计算这个导数的更高效的方法。
+
+33
+00:02:38,413 --> 00:02:41,709
+By the end of this sequence of videos on large scale machine learning,
+看完这一系列关于大规模机器学习的视频之后,
+
+34
+00:02:41,709 --> 00:02:47,045
+you know how to fit models, linear regression, logistic regression, neural networks and so on
+你将知道如何拟合线性回归、logistic回归、神经网络等模型,
+
+35
+00:02:47,045 --> 00:02:50,990
+even today's data sets with, say, a hundred million examples.
+即使面对如今拥有上亿样本的数据集也能做到。
+
+36
+00:02:50,990 --> 00:02:56,035
+Of course, before we put in the effort into training a model with a hundred million examples,
+当然,在我们投入精力用一亿个样本去训练模型之前,
+
+37
+00:02:56,035 --> 00:03:01,276
+We should also ask ourselves, well, why not use just a thousand examples.
+我们也应该问问自己,为什么不使用一千个样本?
+
+38
+00:03:01,276 --> 00:03:04,923
+Maybe we can randomly pick the subsets of a thousand examples
+也许我们可以随机选择一千个样本的子集,
+
+39
+00:03:04,923 --> 00:03:10,254
+out of a hundred million examples and train our algorithm on just a thousand examples.
+也就是从一亿个样本中选出来,然后只用这一千个样本来训练我们的算法。
+
+40
+00:03:10,254 --> 00:03:16,076
+So before investing the effort into actually developing and the software needed to train these massive models
+所以在投入精力去开发训练这些大规模模型所需的软件之前,
+
+41
+00:03:16,076 --> 00:03:22,461
+is often a good sanity check, if training on just a thousand examples might do just as well.
+先确认只用一千个样本训练是否也能达到同样的效果,这往往是一个很好的合理性检验。
+
+42
+00:03:22,461 --> 00:03:29,731
+The way to sanity check of using a much smaller training set might do just as well,
+要检验使用一个小得多的训练集是否也能做得一样好,
+
+43
+00:03:29,731 --> 00:03:33,958
+that is if using a much smaller n equals 1000 size training set,
+如果使用一个非常小的训练集,n等于1000,
+
+44
+00:03:33,958 --> 00:03:37,797
+that might do just as well, it is the usual method of plotting the learning curves,
+也能做得一样好,通常的方法就是绘制学习曲线,
+
+45
+00:03:37,797 --> 00:03:46,872
+so if you were to plot the learning curves and if your training objective were to look like this,
+所以如果你画出学习曲线,而你的训练目标函数看起来像这样,
+
+46
+00:03:46,872 --> 00:03:49,553
+that's J train theta.
+这是Jtrain(θ),
+
+47
+00:03:49,553 --> 00:03:56,422
+And if your cross-validation set objective, Jcv of theta would look like this,
+而你的交叉验证集的目标函数 Jcv(θ) 看起来像这样,
+
+48
+00:03:56,422 --> 00:04:00,310
+then this looks like a high-variance learning algorithm,
+这看起来像一个高方差学习算法,
+
+49
+00:04:00,310 --> 00:04:05,913
+and we will be more confident that adding extra training examples would improve performance.
+我们就更有信心:增加额外的训练样本会提高性能。
+
+50
+00:04:05,913 --> 00:04:10,462
+Whereas in contrast if you were to plot the learning curves,
+相反,如果你画出的学习曲线是这样:
+
+51
+00:04:10,462 --> 00:04:20,339
+if your training objective were to look like this, and if your cross-validation objective were to look like that,
+你的训练目标函数是这个样子,而交叉验证的目标函数是那个样子,
+
+52
+00:04:20,339 --> 00:04:24,292
+then this looks like the classical high-bias learning algorithm.
+那么这看起来就像典型的高偏差学习算法。
+
+53
+00:04:24,292 --> 00:04:28,084
+And in the latter case, you know, if you were to plot this up to,
+在后一种情况下,你知道,如果你这样画,
+
+54
+00:04:28,084 --> 00:04:33,437
+say, m equals 1000 and so that is m equals 500 up to m equals 1000,
+M等于1000,所以这是m等于500到m等于1000,
+
+55
+00:04:33,437 --> 00:04:39,400
+then it seems unlikely that increasing m to a hundred million will do much better
+那么把 m 增加到一亿似乎也不太可能带来更好的效果,
+
+56
+00:04:39,400 --> 00:04:42,736
+and then you'd be just fine sticking to n equals 1000,
+这时你不如就继续用 n 等于 1000 的训练集,
+
+57
+00:04:42,736 --> 00:04:47,000
+rather than investing a lot of effort to figure out how the scale of the algorithm.
+而不必投入大量精力去研究如何扩展算法的规模。
+
+58
+00:04:47,000 --> 00:04:51,029
+Of course, if you were in the situation shown by the figure on the right,
+当然,如果你在右图这种情况,
+
+59
+00:04:51,029 --> 00:04:53,885
+then one natural thing to do would be to add extra features,
+自然而然将会添加额外的特征,
+
+60
+00:04:53,885 --> 00:04:58,484
+or add extra hidden units to your neural network and so on,
+或在你的神经网络添加额外的隐藏单元,等等,
+
+61
+00:04:58,484 --> 00:05:04,627
+so that you end up with a situation closer to that on the left, where maybe this is up to n equals 1000,
+你最终接近左侧的情况,而这也许是到n等于1000,
+
+62
+00:05:04,627 --> 00:05:09,553
+and this then gives you more confidence that trying to add infrastructure to change the algorithm
+这也会让你更有信心,投入精力去改造算法的基础设施,
+
+63
+00:05:09,553 --> 00:05:14,735
+to use much more than a thousand examples that might actually be a good use of your time.
+让它使用远多于一千个的样本,这样做才可能真正值得你投入时间。
+
+64
+00:05:14,735 --> 00:05:19,642
+So in large-scale machine learning, we like to come up with computationally reasonable ways,
+所以在大规模机器学习中,我们希望找到计算上合理的方法,
+
+65
+00:05:19,642 --> 00:05:24,026
+or computationally efficient ways, to deal with very big data sets.
+也就是计算上高效的方法,来处理非常大的数据集。
+
+66
+00:05:24,026 --> 00:05:26,826
+In the next few videos, we'll see two main ideas.
+在接下来的几个视频,我们会看到两个主要观点。
+
+67
+00:05:26,826 --> 00:05:33,464
+The first is called stochastic gradient descent and the second is called Map Reduce, for dealing with very big data sets.
+第一种称为随机梯度下降,第二种称为 Map Reduce(映射化简),都用于处理非常大的数据集。
+
+68
+00:05:33,464 --> 00:05:39,986
+And after you've learned about these methods, hopefully that will allow you to scale up your learning algorithms to big data
+当你学会了这些方法,希望你就能把学习算法扩展到大数据上,
+
+69
+00:05:39,986 --> 00:05:43,986
+and allow you to get much better performance on many different applications.
+让你在许多不同的应用得到更好的性能。
+
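+The sanity check described in this video, fitting on a small subset first and looking at the learning curves before scaling up, can be sketched in a few lines of NumPy. Everything below (the synthetic data, the least-squares fit used as a stand-in learner, the subset sizes) is an illustrative assumption rather than part of the course materials.
+
+import numpy as np
+
+def fit_linear_regression(X, y):
+    """Least-squares fit via the normal equations, with an intercept column."""
+    Xb = np.c_[np.ones(len(X)), X]
+    return np.linalg.lstsq(Xb, y, rcond=None)[0]
+
+def cost(theta, X, y):
+    Xb = np.c_[np.ones(len(X)), X]
+    return 0.5 * np.mean((Xb @ theta - y) ** 2)
+
+rng = np.random.default_rng(0)
+X_all = rng.normal(size=(100_000, 1))                    # pretend "big" data
+y_all = 3.0 + 2.0 * X_all[:, 0] + rng.normal(scale=0.5, size=100_000)
+X_cv, y_cv = X_all[:2000], y_all[:2000]                  # held-out set
+X_tr, y_tr = X_all[2000:], y_all[2000:]
+
+for m in (100, 300, 1000):                               # learning curve points
+    theta = fit_linear_regression(X_tr[:m], y_tr[:m])
+    print(m, cost(theta, X_tr[:m], y_tr[:m]), cost(theta, X_cv, y_cv))
+
+# If J_train and J_cv have already met at similar values by m = 1000 (high
+# bias), a hundred million examples is unlikely to help much; a persistent
+# gap (high variance) suggests the extra data is worth the engineering effort.
+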
diff --git a/srt/17 - 2 - Stochastic Gradient Descent (13 min).srt b/srt/17 - 2 - Stochastic Gradient Descent (13 min).srt
new file mode 100644
index 00000000..1e61eb48
--- /dev/null
+++ b/srt/17 - 2 - Stochastic Gradient Descent (13 min).srt
@@ -0,0 +1,1177 @@
+1
+00:00:00,251 --> 00:00:05,622
+对于很多机器学习算法 包括线性回归、逻辑回归、神经网络等等
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:00,251 --> 00:00:05,622
+For many learning algorithms, among them linear regression, logistic regression and neural networks,
+
+3
+00:00:05,622 --> 00:00:11,955
+算法的实现都是通过得出某个代价函数 或者某个最优化的目标来实现的
+
+4
+00:00:05,622 --> 00:00:11,955
+the way we derive the algorithm was by coming up with a cost function or coming up with an optimization objective.
+
+5
+00:00:11,955 --> 00:00:16,476
+然后使用梯度下降这样的方法来求得代价函数的最小值
+
+6
+00:00:11,955 --> 00:00:16,476
+And then using an algorithm like gradient descent to minimize that cost function.
+
+7
+00:00:16,476 --> 00:00:22,461
+当我们的训练集较大时 梯度下降算法则显得计算量非常大
+
+8
+00:00:16,476 --> 00:00:22,461
+When we have a very large training set, gradient descent becomes a computationally very expensive procedure.
+
+9
+00:00:22,461 --> 00:00:29,300
+在这段视频中 我想介绍一种跟普通梯度下降不同的方法 随机梯度下降(stochastic gradient descent)
+
+10
+00:00:22,461 --> 00:00:29,300
+In this video, we'll talk about a modification to the basic gradient descent algorithm called Stochastic gradient descent,
+
+11
+00:00:29,300 --> 00:00:37,841
+用这种方法我们可以将算法运用到较大训练集的情况中
+
+12
+00:00:29,300 --> 00:00:37,841
+which will allow us to scale these algorithms to much bigger training sets.
+
+13
+00:00:37,841 --> 00:00:41,928
+假如你要使用梯度下降法来训练某个线性回归模型
+
+14
+00:00:37,841 --> 00:00:41,928
+Suppose you are training a linear regression model using gradient descent.
+
+15
+00:00:41,928 --> 00:00:48,055
+简单复习一下 我们的假设函数是这样的
+
+16
+00:00:41,928 --> 00:00:48,055
+As a quick recap, the hypothesis will look like this, and the cost function will look like this,
+
+17
+00:00:48,055 --> 00:00:54,459
+代价函数是你的假设在训练集样本上预测的平均平方误差的二分之一倍的求和
+
+18
+00:00:48,055 --> 00:00:54,459
+which is the sum of one half of the average square error of your hypothesis on your m training examples,
+
+19
+00:00:54,459 --> 00:00:59,705
+通常我们看到的代价函数都是像这样的弓形函数
+
+20
+00:00:54,459 --> 00:00:59,705
+and the cost function we've already seen looks like this sort of bow-shaped function.
+
+21
+00:00:59,705 --> 00:01:06,659
+因此 画出以θ0和θ1为参数的代价函数J 就是这样的弓形函数
+
+22
+00:00:59,705 --> 00:01:06,659
+So, plotted as function of the parameters theta 0 and theta 1, the cost function J is a sort of a bow-shaped function.
+
+23
+00:01:06,659 --> 00:01:10,999
+这就是梯度下降算法 在内层循环中
+
+24
+00:01:06,659 --> 00:01:10,999
+And gradient descent looks like this, where in the inner loop of gradient descent
+
+25
+00:01:10,999 --> 00:01:15,594
+你需要用这个式子反复更新参数θ的值
+
+26
+00:01:10,999 --> 00:01:15,594
+you repeatedly update the parameters theta using that expression.
+
+27
+00:01:15,594 --> 00:01:22,574
+在这段视频剩下的时间里 我将依然以线性回归为例
+
+28
+00:01:15,594 --> 00:01:22,574
+Now in the rest of this video, I'm going to keep using linear regression as the running example.
+
+29
+00:01:22,574 --> 00:01:29,371
+但随机梯度下降的思想也可以应用于其他的学习算法
+
+30
+00:01:22,574 --> 00:01:29,371
+But the ideas here, the ideas of Stochastic gradient descent is fully general and also applies to other learning algorithms
+
+31
+00:01:29,371 --> 00:01:38,011
+比如逻辑回归、神经网络或者其他依靠梯度下降来进行训练的算法中
+
+32
+00:01:29,371 --> 00:01:38,011
+like logistic regression, neural networks and other algorithms that are based on training gradient descent on a specific training set.
+
+33
+00:01:38,011 --> 00:01:43,236
+这张图表示的是梯度下降的做法 这个点表示了参数的初始位置
+
+34
+00:01:38,011 --> 00:01:43,236
+So here's a picture of what gradient descent does, if the parameters are initialized to the point there
+
+35
+00:01:43,236 --> 00:01:50,072
+那么在你运行梯度下降的过程中 多步迭代最终会将参数锁定到全局最小值
+
+36
+00:01:43,236 --> 00:01:50,072
+then as you run gradient descent different iterations of gradient descent will take the parameters to the global minimum.
+
+37
+00:01:50,072 --> 00:01:55,193
+步进的轨迹看起来非常快地收敛到全局最小
+
+38
+00:01:50,072 --> 00:01:55,193
+So take a trajectory that looks like that and heads pretty directly to the global minimum.
+
+39
+00:01:55,193 --> 00:01:59,561
+而梯度下降法的问题是 当m值很大时
+
+40
+00:01:55,193 --> 00:01:59,561
+Now, the problem with gradient descent is that if m is large.
+
+41
+00:01:59,561 --> 00:02:08,382
+计算这个微分项的计算量就变得很大 因为需要对所有m个训练样本求和
+
+42
+00:01:59,561 --> 00:02:08,382
+Then computing this derivative term can be very expensive, because it requires summing over all m examples.
+
+43
+00:02:08,382 --> 00:02:15,644
+所以假如m的值为3亿 美国就有3亿人口
+
+44
+00:02:08,382 --> 00:02:15,644
+So if m is 300 million, alright. So in the United States, there are about 300 million people.
+
+45
+00:02:15,644 --> 00:02:20,783
+美国的人口普查数据就有这种量级的数据记录
+
+46
+00:02:15,644 --> 00:02:20,783
+And so the US or United States census data may have on the order of that many records.
+
+47
+00:02:20,783 --> 00:02:26,715
+所以如果想要为这么多数据拟合一个线性回归模型的话 那就需要对所有这3亿数据进行求和
+
+48
+00:02:20,783 --> 00:02:26,715
+So you want to fit the linear regression model to that then you need to sum over 300 million records.
+
+49
+00:02:26,715 --> 00:02:36,385
+这样的计算量太大了 这种梯度下降算法也被称为批量梯度下降(batch gradient descent)
+
+50
+00:02:26,715 --> 00:02:36,385
+And that's very expensive. To give the algorithm a name, this particular version of gradient descent is also called Batch gradient descent.
+
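+To make the cost of one batch step concrete, here is a small NumPy sketch of a single batch gradient descent update for linear regression. The gradient is an average over all m training examples, which is exactly the full pass over the data being described; the synthetic data and learning rate are illustrative assumptions.
+
+import numpy as np
+
+def batch_gradient_step(theta, X, y, alpha):
+    """One batch update: theta := theta - alpha * (1/m) * sum_i (h(x(i)) - y(i)) * x(i)."""
+    m = len(y)
+    err = X @ theta - y            # h(x(i)) - y(i) for every example i
+    grad = X.T @ err / m           # the sum over all m examples
+    return theta - alpha * grad
+
+rng = np.random.default_rng(0)
+X = np.c_[np.ones(1000), rng.normal(size=(1000, 1))]   # intercept + one feature
+y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=1000)
+
+theta = np.zeros(2)
+for _ in range(200):               # every iteration touches all m examples
+    theta = batch_gradient_step(theta, X, y, alpha=0.1)
+print(theta)                       # approaches [1, 2]; with m = 100,000,000
+                                   # each of these steps becomes very expensive
+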
+51
+00:02:36,385 --> 00:02:41,352
+“批量”就表示我们需要每次都考虑所有的训练样本
+
+52
+00:02:36,385 --> 00:02:41,352
+And the term Batch refers to the fact that we're looking at all of the training examples at a time.
+
+53
+00:02:41,352 --> 00:02:44,303
+我们可以称为所有这批训练样本
+
+54
+00:02:41,352 --> 00:02:44,303
+We call it sort of a batch of all of the training examples.
+
+55
+00:02:44,303 --> 00:02:51,853
+也许这不是个恰当的名字 但做机器学习的人就是这么称呼它的
+
+56
+00:02:44,303 --> 00:02:51,853
+And it really isn't the, maybe the best name but this is what machine learning people call this particular version of gradient descent.
+
+57
+00:02:51,853 --> 00:02:57,157
+想象一下 如果你真的有这3亿人口的数据存在硬盘里
+
+58
+00:02:51,853 --> 00:02:57,157
+And if you imagine really that you have 300 million census records stored away on disc.
+
+59
+00:02:57,157 --> 00:03:05,945
+那么这种算法就需要把所有这3亿人口数据读入计算机 仅仅就为了算一个微分项而已
+
+60
+00:02:57,157 --> 00:03:05,945
+The way this algorithm works is you need to read into your computer memory all 300 million records in order to compute this derivative term.
+
+61
+00:03:05,945 --> 00:03:11,508
+你需要将这些数据连续传入计算机 因为计算机存不下那么大的数据量
+
+62
+00:03:05,945 --> 00:03:11,508
+You need to stream all of these records through computer because you can't store all your records in computer memory.
+
+63
+00:03:11,508 --> 00:03:16,425
+所以你需要很慢地读取数据 然后计算一个求和 再来算出微分
+
+64
+00:03:11,508 --> 00:03:16,425
+So you need to read through them and slowly, you know, accumulate the sum in order to compute the derivative.
+
+65
+00:03:16,425 --> 00:03:21,452
+所有这些做完以后 你才完成了一次梯度下降的迭代
+
+66
+00:03:16,425 --> 00:03:21,452
+And then having done all that work, that allows you to take one step of gradient descent.
+
+67
+00:03:21,452 --> 00:03:24,749
+然后你又需要重新来一遍
+
+68
+00:03:21,452 --> 00:03:24,749
+And now you need to do the whole thing again.
+
+69
+00:03:24,749 --> 00:03:28,424
+也就是再读取这3亿人口数据 做个求和
+
+70
+00:03:24,749 --> 00:03:28,424
+You know, scan through all 300 million records, accumulate these sums.
+
+71
+00:03:28,424 --> 00:03:32,578
+然后做完这些 你又完成了梯度下降的一小步
+
+72
+00:03:28,424 --> 00:03:32,578
+And having done all that work, you can take another little step using gradient descent.
+
+73
+00:03:32,578 --> 00:03:36,959
+然后再做一次 你得到第三次迭代 等等
+
+74
+00:03:32,578 --> 00:03:36,959
+And then do that again. And then you take yet a third step. And so on.
+
+75
+00:03:36,959 --> 00:03:40,819
+所以 要让算法收敛 绝对需要花很长的时间
+
+76
+00:03:36,959 --> 00:03:40,819
+And so it's gonna take a long time in order to get the algorithm to converge.
+
+77
+00:03:40,819 --> 00:03:45,375
+相比于批量梯度下降 我们介绍的方法就完全不同了
+
+78
+00:03:40,819 --> 00:03:45,375
+In contrast to Batch gradient descent, what we are going to do is come up with a different algorithm
+
+79
+00:03:45,375 --> 00:03:50,465
+这种方法在每一步迭代中 不用考虑全部的训练样本
+
+80
+00:03:45,375 --> 00:03:50,465
+that doesn't need to look at all the training examples in every single iteration,
+
+81
+00:03:50,465 --> 00:03:55,118
+只需要考虑一个训练样本
+
+82
+00:03:50,465 --> 00:03:55,118
+but that needs to look at only a single training example in one iteration.
+
+83
+00:03:55,118 --> 00:03:59,617
+在开始介绍新的算法之前 我把批量梯度下降算法再写在这里
+
+84
+00:03:55,118 --> 00:03:59,617
+Before moving on to the new algorithm, here's just a Batch gradient descent algorithm written out again
+
+85
+00:03:59,617 --> 00:04:05,794
+这里是代价函数 这里是迭代的更新过程
+
+86
+00:03:59,617 --> 00:04:05,794
+with that being the cost function and that being the update and of course this term here,
+
+87
+00:04:05,794 --> 00:04:10,678
+梯度下降法中的这一项
+
+88
+00:04:05,794 --> 00:04:10,678
+that's used in the gradient descent rule, that is the partial derivative
+
+89
+00:04:10,678 --> 00:04:17,933
+是最优化目标 代价函数Jtrain(θ) 关于参数θj的偏微分
+
+90
+00:04:10,678 --> 00:04:17,933
+with respect to the parameters theta J of our optimization objective, J train of theta.
+
+91
+00:04:17,933 --> 00:04:23,386
+下面我们来看更高效的这种方法
+
+92
+00:04:17,933 --> 00:04:23,386
+Now, let's look at the more efficient algorithm that scales better to large data sets.
+
+93
+00:04:23,386 --> 00:04:26,489
+为了更好地描述随机梯度下降算法
+
+94
+00:04:23,386 --> 00:04:26,489
+In order to work off the algorithms called Stochastic gradient descent,
+
+95
+00:04:26,489 --> 00:04:32,657
+代价函数的定义有一点区别 我们定义参数θ
+
+96
+00:04:26,489 --> 00:04:32,657
+this vectors the cost function in a slightly different way then they define the cost of the parameter theta
+
+97
+00:04:32,657 --> 00:04:40,471
+关于训练样本(x(i),y(i))的代价 等于二分之一倍
+
+98
+00:04:32,657 --> 00:04:40,471
+with respect to a training example x(i), y(i) to be equal to one half times the squared error
+
+99
+00:04:40,471 --> 00:04:44,791
+我的假设h(x(i))跟实际输出y(i)的误差的平方
+
+100
+00:04:40,471 --> 00:04:44,791
+that my hypothesis incurs on that example, x(i), y(i).
+
+101
+00:04:44,791 --> 00:04:53,386
+因此这个代价函数值实际上测量的是我的假设在某个样本(x(i),y(i))上的表现
+
+102
+00:04:44,791 --> 00:04:53,386
+So this cost function term really measures how well is my hypothesis doing on a single example x(i), y(i).
+
+103
+00:04:53,386 --> 00:05:01,010
+你可能已经发现 总体的代价函数Jtrain可以被写成这样等效的形式
+
+104
+00:04:53,386 --> 00:05:01,010
+Now you notice that the overall cost function j train can now be written in this equivalent form.
+
+105
+00:05:01,010 --> 00:05:09,606
+Jtrain(θ)就是我的假设函数 在所有m个训练样本中的每一个样本(x(i),y(i))上的代价函数的平均值
+
+106
+00:05:01,010 --> 00:05:09,606
+So j train is just the average over my m training examples of the cost of my hypothesis on that example x(i), y(i).
+
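+The two definitions just given are easy to state in code: a per-example cost, and Jtrain as its average over the m examples. This is a hedged NumPy sketch with illustrative data; the names cost_example and j_train are invented here, not taken from the course.
+
+import numpy as np
+
+def cost_example(theta, x, y):
+    """cost(theta, (x, y)) = 1/2 * (h_theta(x) - y)^2 for one example."""
+    return 0.5 * (x @ theta - y) ** 2
+
+def j_train(theta, X, y):
+    """Jtrain(theta): the average of the per-example costs."""
+    return np.mean([cost_example(theta, X[i], y[i]) for i in range(len(y))])
+
+theta = np.array([1.0, 2.0])
+X = np.c_[np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])]   # includes x0 = 1
+y = np.array([1.0, 3.5, 5.0, 7.5])
+print(j_train(theta, X, y))        # average squared-error cost over 4 examples
+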
+107
+00:05:09,606 --> 00:05:13,522
+用这样的方法应用到线性回归中
+
+108
+00:05:09,606 --> 00:05:13,522
+Armed with this view of the cost function for linear regression,
+
+109
+00:05:13,522 --> 00:05:17,636
+我来写出随机梯度下降的算法
+
+110
+00:05:13,522 --> 00:05:17,636
+let me now write out what Stochastic gradient descent does.
+
+111
+00:05:17,636 --> 00:05:26,940
+随机梯度下降法的第一步是将所有数据打乱
+
+112
+00:05:17,636 --> 00:05:26,940
+The first step of Stochastic gradient descent is to randomly shuffle the data set.
+
+113
+00:05:26,940 --> 00:05:32,539
+我说的随机打乱的意思是 将所有m个训练样本重新排列
+
+114
+00:05:26,940 --> 00:05:32,539
+So by that I just mean randomly shuffle, or randomly reorder your m training examples.
+
+115
+00:05:32,539 --> 00:05:37,450
+这就是标准的数据预处理过程 稍后我们再回来讲
+
+116
+00:05:32,539 --> 00:05:37,450
+It's sort of a standard pre-processing step, come back to this in a minute.
+
+117
+00:05:37,450 --> 00:05:42,997
+随机梯度下降的主要算法如下
+
+118
+00:05:37,450 --> 00:05:42,997
+But the main work of Stochastic gradient descent is then done in the following.
+
+119
+00:05:42,997 --> 00:05:48,150
+在i等于1到m中进行循环
+
+120
+00:05:42,997 --> 00:05:48,150
+We're going to repeat for i equals 1 through m.
+
+121
+00:05:48,150 --> 00:05:53,067
+也就是对所有m个训练样本进行遍历 然后进行如下更新
+
+122
+00:05:48,150 --> 00:05:53,067
+So we'll repeatedly scan through my training examples and perform the following update.
+
+123
+00:05:53,067 --> 00:06:06,523
+我们按照这样的公式进行更新
+
+124
+00:05:53,067 --> 00:06:06,523
+Gonna update the parameter theta j as theta j minus alpha times h of x(i) minus y(i) times x(i)j.
+
+125
+00:06:06,523 --> 00:06:12,961
+同样还是对所有j的值进行
+
+126
+00:06:06,523 --> 00:06:12,961
+And we're going to do this update as usual for all values of j.
+
+127
+00:06:12,961 --> 00:06:24,708
+不难发现 这一项实际上就是我们批量梯度下降算法中 求和式里面的那一部分
+
+128
+00:06:12,961 --> 00:06:24,708
+Now, you notice that this term over here is exactly what we had inside the summation for Batch gradient descent.
+
+129
+00:06:24,708 --> 00:06:31,256
+事实上 如果你数学比较好的话 你可以证明这一项 也就是这一项
+
+130
+00:06:24,708 --> 00:06:31,256
+In fact, for those of you that are calculus is possible to show that that term here, that's this term here,
+
+131
+00:06:31,256 --> 00:06:43,511
+是等于这个cost函数关于参数θj的偏微分
+
+132
+00:06:31,256 --> 00:06:43,511
+is equal to the partial derivative with respect to my parameter theta j of the cost of the parameters theta on x(i), y(i).
+
+133
+00:06:43,511 --> 00:06:47,383
+这个cost函数就是我们之前先定义的代价函数
+
+134
+00:06:43,511 --> 00:06:47,383
+Where cost is of course this thing that was defined previously.
+
+135
+00:06:47,383 --> 00:06:52,081
+最后画上大括号结束算法的循环
+
+136
+00:06:47,383 --> 00:06:52,081
+And just the wrap of the algorithm, let me close my curly braces over there.
+
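+Here is a minimal NumPy sketch of the stochastic gradient descent loop just described for linear regression: shuffle the data, then update theta using one training example at a time. The synthetic data, learning rate, and number of passes are illustrative assumptions only.
+
+import numpy as np
+
+def stochastic_gradient_descent(X, y, alpha=0.05, passes=2, seed=0):
+    m, n = X.shape
+    Xb = np.c_[np.ones(m), X]              # add the intercept feature x0 = 1
+    theta = np.zeros(n + 1)
+    rng = np.random.default_rng(seed)
+    for _ in range(passes):                # outer loop over the data set
+        for i in rng.permutation(m):       # step 1: randomly shuffle
+            err = Xb[i] @ theta - y[i]     # h(x(i)) - y(i) for this one example
+            theta -= alpha * err * Xb[i]   # step 2: update every theta_j
+    return theta
+
+# Tiny synthetic check: data generated from y = 4 + 3x plus noise.
+rng = np.random.default_rng(1)
+X = rng.uniform(0, 2, size=(10_000, 1))
+y = 4 + 3 * X[:, 0] + rng.normal(scale=0.1, size=10_000)
+print(stochastic_gradient_descent(X, y))   # wanders close to [4, 3]
+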
+137
+00:06:52,081 --> 00:06:59,365
+随机梯度下降的做法实际上就是扫描所有的训练样本
+
+138
+00:06:52,081 --> 00:06:59,365
+So what Stochastic gradient descent is doing is it is actually scanning through the training examples.
+
+139
+00:06:59,365 --> 00:07:04,349
+首先是我的第一组训练样本(x(1),y(1))
+
+140
+00:06:59,365 --> 00:07:04,349
+And first it's gonna look at my first training example x(1), y(1).
+
+141
+00:07:04,349 --> 00:07:09,399
+然后只对这第一个训练样本 我们的梯度下降
+
+142
+00:07:04,349 --> 00:07:09,399
+And then looking at only this first example, it's gonna take like a basically a little gradient descent step
+
+143
+00:07:09,399 --> 00:07:13,725
+只对这第一个训练样本的代价函数进行
+
+144
+00:07:09,399 --> 00:07:13,725
+with respect to the cost of just this first training example.
+
+145
+00:07:13,725 --> 00:07:15,717
+换句话说 我们要关注第一个样本
+
+146
+00:07:13,725 --> 00:07:15,717
+So in other words, we're going to look at the first example
+
+147
+00:07:15,717 --> 00:07:21,214
+然后把参数θ稍微修改一点 使其对第一个训练样本的拟合变得好一点
+
+148
+00:07:15,717 --> 00:07:21,214
+and modify the parameters a little bit to fit just the first training example a little bit better.
+
+149
+00:07:21,214 --> 00:07:29,244
+完成这个内层循环以后 然后再转向第二个训练样本
+
+150
+00:07:21,214 --> 00:07:29,244
+Having done this inside this inner for-loop is then going to go on to the second training example.
+
+151
+00:07:29,244 --> 00:07:33,848
+然后还是一样 在参数空间中进步一小步
+
+152
+00:07:29,244 --> 00:07:33,848
+And what it's going to do there is take another little step in parameter space,
+
+153
+00:07:33,848 --> 00:07:39,682
+也就是稍微把参数修改一点 然后让它对第二个样本的拟合更好一点
+
+154
+00:07:33,848 --> 00:07:39,682
+so modify the parameters just a little bit to try to fit just a second training example a little bit better.
+
+155
+00:07:39,682 --> 00:07:44,130
+做完第二个 再转向第三个训练样本
+
+156
+00:07:39,682 --> 00:07:44,130
+Having done that, is then going to go onto my third training example.
+
+157
+00:07:44,130 --> 00:07:51,722
+同样还是修改参数 让它更好的拟合第三个训练样本
+
+158
+00:07:44,130 --> 00:07:51,722
+And modify the parameters to try to fit just the third training example a little bit better, and so on
+
+159
+00:07:51,722 --> 00:07:55,114
+以此类推 直到完成所有的训练集
+
+160
+00:07:51,722 --> 00:07:55,114
+until you know, you get through the entire training set.
+
+161
+00:07:55,114 --> 00:08:01,297
+因此这种重复循环会遍历整个训练集
+
+162
+00:07:55,114 --> 00:08:01,297
+And then this ultra repeat loop may cause it to take multiple passes over the entire training set.
+
+163
+00:08:01,297 --> 00:08:07,346
+从这个角度分析随机梯度下降算法 我们能更好地理解为什么一开始要随机打乱数据
+
+164
+00:08:01,297 --> 00:08:07,346
+This view of Stochastic gradient descent also motivates why we wanted to start by randomly shuffling the data set.
+
+165
+00:08:07,346 --> 00:08:10,772
+这保证了我们在扫描训练集时
+
+166
+00:08:07,346 --> 00:08:10,772
+This ensures that when we scan through the training set here,
+
+167
+00:08:10,772 --> 00:08:15,197
+我们对训练集样本的访问是按随机的顺序进行的
+
+168
+00:08:10,772 --> 00:08:15,197
+that we end up visiting the training examples in some sort of randomly sorted order.
+
+169
+00:08:15,197 --> 00:08:21,229
+不管你的数据是否已经随机排列过 或者一开始就是某个奇怪的顺序
+
+170
+00:08:15,197 --> 00:08:21,229
+Depending on whether your data already came randomly sorted or whether it came originally sorted in some strange order,
+
+171
+00:08:21,229 --> 00:08:26,391
+实际上这一步能让你的随机梯度下降稍微快一些收敛
+
+172
+00:08:21,229 --> 00:08:26,391
+in practice this would just speed up the conversions to Stochastic gradient descent just a little bit.
+
+173
+00:08:26,391 --> 00:08:30,985
+所以为了保险起见 最好还是先把所有数据随机打乱一下
+
+174
+00:08:26,391 --> 00:08:30,985
+So in the interest of safety, it's usually better to randomly shuffle the data set if you aren't sure
+
+175
+00:08:30,985 --> 00:08:34,056
+如果你不知道是否已经随机排列过的话
+
+176
+00:08:30,985 --> 00:08:34,056
+if it came to you in randomly sorted order.
+
+177
+00:08:34,056 --> 00:08:37,240
+但随机梯度下降的更重要的一点是
+
+178
+00:08:34,056 --> 00:08:37,240
+But more importantly another view of Stochastic gradient descent is
+
+179
+00:08:37,240 --> 00:08:45,504
+跟批量梯度下降不同 随机梯度下降不需要对所有m个训练样本
+
+180
+00:08:37,240 --> 00:08:45,504
+that it's a lot like descent but rather than wait to sum up these gradient terms over all m training examples,
+
+181
+00:08:45,504 --> 00:08:50,624
+求和来得到梯度项 而是只需要对单个训练样本求出这个梯度项
+
+182
+00:08:45,504 --> 00:08:50,624
+what we're doing is we're taking this gradient term using just one single training example
+
+183
+00:08:50,624 --> 00:08:54,810
+我们已经在这个过程中开始优化参数了
+
+184
+00:08:50,624 --> 00:08:54,810
+and we're starting to make progress in improving the parameters already.
+
+185
+00:08:54,810 --> 00:09:02,248
+就不用把所有那3亿的美国人口普查的数据拿来遍历一遍
+
+186
+00:08:54,810 --> 00:09:02,248
+So rather than, you know, waiting 'till taking a pass through all 300,000,000 United States Census records,
+
+187
+00:09:02,248 --> 00:09:05,632
+不需要对所有这些数据进行扫描
+
+188
+00:09:02,248 --> 00:09:05,632
+say, rather than needing to scan through all of the training examples
+
+189
+00:09:05,632 --> 00:09:09,947
+然后才一点点地修改参数 直到达到全局最小值
+
+190
+00:09:05,632 --> 00:09:09,947
+before we can modify the parameters a little bit and make progress towards a global minimum.
+
+191
+00:09:09,947 --> 00:09:14,975
+对随机梯度下降来说 我们只需要一次关注一个训练样本
+
+192
+00:09:09,947 --> 00:09:14,975
+For Stochastic gradient descent instead we just need to look at a single training example
+
+193
+00:09:14,975 --> 00:09:22,188
+而我们已经开始一点点把参数朝着全局最小值的方向进行修改了
+
+194
+00:09:14,975 --> 00:09:22,188
+and we're already starting to make progress in this case of parameters towards, moving the parameters towards the global minimum.
+
+195
+00:09:22,188 --> 00:09:27,558
+这里把这个算法再重新写一遍 第一步是打乱数据
+
+196
+00:09:22,188 --> 00:09:27,558
+So, here's the algorithm written out again where the first step is to randomly shuffle the data
+
+197
+00:09:27,558 --> 00:09:35,089
+第二步是算法的关键 是关于某个单一的训练样本(x(i),y(i))来对参数进行更新
+
+198
+00:09:27,558 --> 00:09:35,089
+and the second step is where the real work is done, where that's the update with respect to a single training example x(i), y(i).
+
+199
+00:09:35,089 --> 00:09:40,139
+让我们来看看 这个算法是如何更新参数θ的
+
+200
+00:09:35,089 --> 00:09:40,139
+So, let's see what this algorithm does to the parameters.
+
+201
+00:09:40,139 --> 00:09:43,467
+之前我们已经看到 当使用批量梯度下降的时候
+
+202
+00:09:40,139 --> 00:09:43,467
+Previously, we saw that when we are using Batch gradient descent,
+
+203
+00:09:43,467 --> 00:09:46,331
+需要考虑所有的训练样本数据
+
+204
+00:09:43,467 --> 00:09:46,331
+that is the algorithm that looks at all the training examples in time,
+
+205
+00:09:46,331 --> 00:09:53,397
+批量梯度下降的收敛过程 会倾向于一条近似的直线 一直找到全局最小值
+
+206
+00:09:46,331 --> 00:09:53,397
+Batch gradient descent will tend to, you know, take a reasonably straight line trajectory to get to the global minimum like that.
+
+207
+00:09:53,397 --> 00:09:59,956
+与此不同的是 在随机梯度下降中 每一次迭代都会更快
+
+208
+00:09:53,397 --> 00:09:59,956
+In contrast with Stochastic gradient descent every iteration is going to be much faster
+
+209
+00:09:59,956 --> 00:10:03,108
+因为我们不需要对所有训练样本进行求和
+
+210
+00:09:59,956 --> 00:10:03,108
+because we don't need to sum up over all the training examples.
+
+211
+00:10:03,108 --> 00:10:07,259
+每一次迭代只需要保证对一个训练样本拟合好就行了
+
+212
+00:10:03,108 --> 00:10:07,259
+But every iteration is just trying to fit single training example better.
+
+213
+00:10:07,259 --> 00:10:13,931
+所以 如果我们从这个点开始进行随机梯度下降的话
+
+214
+00:10:07,259 --> 00:10:13,931
+So, if we were to start stochastic gradient descent, oh, let's start stochastic gradient descent at a point like that.
+
+215
+00:10:13,931 --> 00:10:19,556
+第一次迭代 可能会让参数朝着这个方向移动
+
+216
+00:10:13,931 --> 00:10:19,556
+The first iteration, you know, may take the parameters in that direction and
+
+217
+00:10:19,556 --> 00:10:23,791
+然后第二次迭代 只考虑第二个训练样本
+
+218
+00:10:19,556 --> 00:10:23,791
+maybe the second iteration looking at just the second example maybe just by chance,
+
+219
+00:10:23,791 --> 00:10:28,278
+假如很不幸 我们朝向了一个错误的方向
+
+220
+00:10:23,791 --> 00:10:28,278
+we get more unlucky and actually head in a bad direction with the parameters like that.
+
+221
+00:10:28,278 --> 00:10:33,731
+第三次迭代 我们又尽力让参数修改到拟合第三组训练样本
+
+222
+00:10:28,278 --> 00:10:33,731
+In the third iteration where we tried to modify the parameters to fit just the third training examples better,
+
+223
+00:10:33,731 --> 00:10:36,418
+可能最终会得到这个方向
+
+224
+00:10:33,731 --> 00:10:36,418
+maybe we'll end up heading in that direction.
+
+225
+00:10:36,418 --> 00:10:42,717
+然后再考虑第四个训练样本 做同样的事 然后第五第六第七 等等
+
+226
+00:10:36,418 --> 00:10:42,717
+And then we'll look at the fourth training example and we will do that. The fifth example, sixth example, 7th and so on.
+
+227
+00:10:42,717 --> 00:10:46,725
+在你运行随机梯度下降的过程中 你会发现
+
+228
+00:10:42,717 --> 00:10:46,725
+And as you run Stochastic gradient descent, what you find is that
+
+229
+00:10:46,725 --> 00:10:52,923
+一般来讲 参数是朝着全局最小值的方向被更新的 但也不一定
+
+230
+00:10:46,725 --> 00:10:52,923
+it will generally move the parameters in the direction of the global minimum, but not always.
+
+231
+00:10:52,923 --> 00:11:00,117
+所以看起来它是以某个比较随机、迂回的路径在朝全局最小值逼近
+
+232
+00:10:52,923 --> 00:11:00,117
+And so it takes some more random-looking, circuitous path towards the global minimum.
+
+233
+00:11:00,117 --> 00:11:07,630
+实际上 你运行随机梯度下降 和批量梯度下降 两种方法的收敛形式是不同的
+
+234
+00:11:00,117 --> 00:11:07,630
+And in fact as you run Stochastic gradient descent it doesn't actually converge in the same sense as Batch gradient descent does
+
+235
+00:11:07,630 --> 00:11:15,196
+实际上随机梯度下降是在某个靠近全局最小值的区域内徘徊
+
+236
+00:11:07,630 --> 00:11:15,196
+and what it ends up doing is wandering around continuously in some region that's in some region close to the global minimum,
+
+237
+00:11:15,196 --> 00:11:18,740
+而不是直接逼近全局最小值并停留在那点
+
+238
+00:11:15,196 --> 00:11:18,740
+but it doesn't just get to the global minimum and stay there.
+
+239
+00:11:18,740 --> 00:11:21,676
+但实际上这并没有多大问题
+
+240
+00:11:18,740 --> 00:11:21,676
+But in practice this isn't a problem because, you know, so
+
+241
+00:11:21,676 --> 00:11:26,788
+只要参数最终移动到某个非常靠近全局最小值的区域内
+
+242
+00:11:21,676 --> 00:11:26,788
+long as the parameters end up in some region that is pretty close to the global minimum.
+
+243
+00:11:26,788 --> 00:11:32,164
+只要参数逼近到足够靠近全局最小值 这也会得出一个较为不错的假设
+
+244
+00:11:26,788 --> 00:11:32,164
+So, as parameters end up pretty close to the global minimum, that will be a pretty good hypothesis
+
+245
+00:11:32,164 --> 00:11:36,340
+所以 通常我们用随机梯度下降法
+
+246
+00:11:32,164 --> 00:11:36,340
+and so usually running Stochastic gradient descent
+
+247
+00:11:36,340 --> 00:11:43,658
+也能得到一个很接近全局最小值的参数 对于实际应用的目的来说 已经足够了
+
+248
+00:11:36,340 --> 00:11:43,658
+we get a parameter near the global minimum and that's good enough for, you know, essentially any, most practical purposes.
+
+249
+00:11:43,658 --> 00:11:47,121
+最后一点细节 在随机梯度下降中
+
+250
+00:11:43,658 --> 00:11:47,121
+Just one final detail. In Stochastic gradient descent,
+
+251
+00:11:47,121 --> 00:11:51,099
+我们有一个外层循环 它决定了内层循环的执行次数
+
+252
+00:11:47,121 --> 00:11:51,099
+we had this outer loop repeat which says to do this inner loop multiple times.
+
+253
+00:11:51,099 --> 00:11:53,892
+所以 外层循环应该执行多少次呢
+
+254
+00:11:51,099 --> 00:11:53,892
+So, how many times do we repeat this outer loop?
+
+255
+00:11:53,892 --> 00:11:59,336
+这取决于训练样本的大小 通常一次就够了
+
+256
+00:11:53,892 --> 00:11:59,336
+Depending on the size of the training set, doing this loop just a single time may be enough.
+
+257
+00:11:59,336 --> 00:12:02,064
+最多到10次 是比较典型的
+
+258
+00:11:59,336 --> 00:12:02,064
+And up to, you know, maybe 10 times may be typical
+
+259
+00:12:02,064 --> 00:12:05,852
+所以我们可以循环执行内层1到10次
+
+260
+00:12:02,064 --> 00:12:05,852
+so we may end up repeating this inner loop anywhere from once to ten times.
+
+261
+00:12:05,852 --> 00:12:12,309
+因此 如果我们有非常大量的数据 比如美国普查的人口数据
+
+262
+00:12:05,852 --> 00:12:12,309
+So if we have a, you know, truly massive data set like this US Census example
+
+263
+00:12:12,309 --> 00:12:15,260
+我说的3亿人口数据的例子
+
+264
+00:12:12,309 --> 00:12:15,260
+that I've been talking about with 300 million examples,
+
+265
+00:12:15,260 --> 00:12:19,609
+有可能你只需要对训练集进行一次遍历
+
+266
+00:12:15,260 --> 00:12:19,609
+it is possible that by the time you've taken just a single pass through your training set.
+
+267
+00:12:19,609 --> 00:12:23,073
+这里的i就是从1到3亿了
+
+268
+00:12:19,609 --> 00:12:23,073
+So, this is for i equals 1 through 300 million.
+
+269
+00:12:23,073 --> 00:12:25,720
+可能当你对整个数据集只遍历一次之后
+
+270
+00:12:23,073 --> 00:12:25,720
+It's possible that by the time you've taken a single pass through your data set
+
+271
+00:12:25,720 --> 00:12:29,872
+你就能训练出非常好的假设
+
+272
+00:12:25,720 --> 00:12:29,872
+you might already have a perfectly good hypothesis.
+
+273
+00:12:29,872 --> 00:12:36,613
+这时 由于m非常大 那么内循环只用做一次就够了
+
+274
+00:12:29,872 --> 00:12:36,613
+In which case, you know, this inner loop you might need to do only once if m is very, very large.
+
+275
+00:12:36,613 --> 00:12:43,071
+但通常来说 循环1到10次都是非常合理的
+
+276
+00:12:36,613 --> 00:12:43,071
+But in general taking anywhere from 1 through 10 passes through your data set may be fairly common.
+
+277
+00:12:43,071 --> 00:12:45,439
+但这还是取决于你训练样本的大小
+
+278
+00:12:43,071 --> 00:12:45,439
+But really it depends on the size of your training set.
+
+279
+00:12:45,439 --> 00:12:49,413
+如果你跟批量梯度下降比较一下的话
+
+280
+00:12:45,439 --> 00:12:49,413
+And if you contrast this to Batch gradient descent.
+
+281
+00:12:49,413 --> 00:12:53,905
+批量梯度下降在一步梯度下降的过程中
+
+282
+00:12:49,413 --> 00:12:53,905
+With Batch gradient descent, after taking a pass through your entire training set,
+
+283
+00:12:53,905 --> 00:12:57,034
+就需要考虑全部的训练样本
+
+284
+00:12:53,905 --> 00:12:57,034
+you would have taken just one single gradient descent step.
+
+285
+00:12:57,034 --> 00:13:01,983
+所以批量梯度下降就是这样微小的一次次移动
+
+286
+00:12:57,034 --> 00:13:01,983
+So one of these little baby steps of gradient descent where you just take one small gradient descent step
+
+287
+00:13:01,983 --> 00:13:05,776
+这也是为什么随机梯度下降法要快得多
+
+288
+00:13:01,983 --> 00:13:05,776
+and this is why Stochastic gradient descent can be much faster.
+
+289
+00:13:05,776 --> 00:13:10,880
+这就是随机梯度下降了
+
+290
+00:13:05,776 --> 00:13:10,880
+So, that was the Stochastic gradient descent algorithm.
+
+291
+00:13:10,880 --> 00:13:15,594
+如果你实现了它 希望你能将许多学习算法进行扩展
+
+292
+00:13:10,880 --> 00:13:15,594
+And if you implement it, hopefully that will allow you to scale up many of your learning algorithms
+
+293
+00:13:15,594 --> 99:59:59,000
+应用到大得多的数据集上 并获得更好的表现
+
+294
+00:13:15,594 --> 99:59:59,000
+to much bigger data sets and get much better performance that way.
+
diff --git a/srt/17 - 3 - Mini-Batch Gradient Descent (6 min).srt b/srt/17 - 3 - Mini-Batch Gradient Descent (6 min).srt
new file mode 100644
index 00000000..159c9e19
--- /dev/null
+++ b/srt/17 - 3 - Mini-Batch Gradient Descent (6 min).srt
@@ -0,0 +1,310 @@
+1
+00:00:00,000 --> 00:00:07,306
+In the previous video, we talked about Stochastic gradient descent, and how that can be much faster than Batch gradient descent.
+在前面的视频中,我们讨论了随机梯度下降算法,并讨论了它如何比批量梯度下降算法快。(字幕翻译:中国海洋大学,仇利克)
+
+2
+00:00:07,306 --> 00:00:12,866
+In this video, let's talk about another variation on these ideas is called Mini-batch gradient descent
+这个视频,我们讨论这种思想的另外一种变种,称为微型批量梯度下降,
+
+3
+00:00:12,866 --> 00:00:16,906
+they can work sometimes even a bit faster than stochastic gradient descent.
+它们有时甚至比随机梯度下降还要快一点。
+
+4
+00:00:16,906 --> 00:00:22,046
+To summarize the algorithms we've talked about so far:
+先来总结一下我们目前为止讲过的算法。
+
+5
+00:00:22,046 --> 00:00:26,619
+In Batch gradient descent we will use all m examples in each iteration.
+批量梯度下降算法中,每次迭代,我们都要用到所有的m个样本。
+
+6
+00:00:26,619 --> 00:00:31,792
+Whereas in Stochastic gradient descent we will use a single example in each iteration.
+而在随机梯度下降算法中,每次迭代只需使用一个样本。
+
+7
+00:00:31,792 --> 00:00:36,120
+What Mini-batch gradient descent does is somewhere in between.
+微型批量梯度下降算法介于两者之间。
+
+8
+00:00:36,120 --> 00:00:46,559
+Specifically, with this algorithm we're going to use b examples in each iteration where b is a parameter called the "mini batch size"
+特别的,微型批量梯度下降算法中,每次迭代我们将使用b个样本,b是一个称为“小批量大小”的参数。
+
+9
+00:00:46,559 --> 00:00:52,688
+so the idea is that this is somewhat in-between Batch gradient descent and Stochastic gradient descent.
+所以,它是介于批量梯度下降算法和随机梯度下降算法中间的一种算法。
+
+10
+00:00:52,688 --> 00:00:57,488
+This is just like batch gradient descent, except that I'm going to use a much smaller batch size.
+这正如批量梯度下降算法,只是我们将用一个少得多的批量大小。
+
+11
+00:00:57,488 --> 00:01:08,672
+A typical choice for the value of b might be b equals 10, let's say, and a typical range really might be anywhere from b equals 2 up to b equals 100.
+通常选择b等于10,b通常的变化范围是2到100.
+
+12
+00:01:08,672 --> 00:01:13,668
+So that will be a pretty typical range of values for the Mini-batch size.
+对微型批量梯度下降算法,这将是一个常用的取值范围。
+
+13
+00:01:13,668 --> 00:01:21,153
+And the idea is that rather than using one example at a time or m examples at a time we will use b examples at a time.
+它的思想是,既不是一次使用一个样本,也不是一次使用m个样本,而是一次使用b个样本。
+
+14
+00:01:21,153 --> 00:01:28,833
+So let me just write this out informally, we're going to get, let's say, b. For this example, let's say b equals 10.
+因此让我将它非正式的写下来,我们将得到b,比如,让b等于10.
+
+15
+00:01:28,833 --> 00:01:37,782
+So we're going to get, the next 10 examples from my training set so that may be some set of examples xi, yi.
+因此我们将从训练集中得到后续的10个样本,可能是xi, yi的一些组合。
+
+16
+00:01:37,782 --> 00:01:46,114
+If it's 10 examples then the indexing will be up to x (i+9), y (i+9)
+如果是10个样本,下标最大是x(i+9), y(i+9),
+
+17
+00:01:46,114 --> 00:01:57,794
+so that's 10 examples altogether and then we'll perform essentially a gradient descent update using these 10 examples.
+我们将使用这10个样本执行梯度下降算法,完成更新。
+
+18
+00:01:57,794 --> 00:02:19,012
+So, that's theta j gets updated as theta j minus the learning rate alpha times one tenth, times the sum over k equals i through i plus 9 of h subscript theta of x(k) minus y(k), times x(k) subscript j.
+也就是,θj 更新为 θj 减去学习率α乘以十分之一,再乘以k从i到i+9的求和,求和项为 (hθ(x(k))-y(k))·x(k)j。
+
+19
+00:02:19,012 --> 00:02:27,213
+And so in this expression, where summing the gradient terms over my ten examples.
+在以上表述中,是在10个样本上进行梯度求和。
+
+20
+00:02:27,229 --> 00:02:32,370
+So, that's number ten, that's, you know, my mini batch size and just i+9 again,
+因此,那是数字10,你知道的,我最小的批量大小,仅仅i+9,
+
+21
+00:02:32,370 --> 00:02:39,384
+the 9 comes from the choice of the parameter b, and then after this we will then increase, you know,
+9来自我们对参数b的选择,运算完后,我们将把i增加10,你知道的
+
+22
+00:02:39,384 --> 00:02:46,755
+i by ten, we will go on to the next ten examples and then keep moving like this.
+我们将继续使用后续的10个样本,并像这样一直进行下去。
+
+23
+00:02:46,755 --> 00:02:50,584
+So just to write out the entire algorithm in full.
+因此,让我写下完整的算法。
+
+24
+00:02:50,584 --> 00:02:55,231
+In order to simplify the indexing for this one at the right top,
+为了简化下标
+
+25
+00:02:55,231 --> 00:02:59,843
+I'm going to assume we have a mini-batch size of ten and a training set size of a thousand,
+假设我有最小批量大小为10,训练样本大小为1000,
+
+26
+00:02:59,843 --> 00:03:05,045
+what we're going to do is have this sort of form, for i equals 1, then 11, then 21 and so on,
+我们要做的是使用这种形式,i=1,然后是11、21,依此类推,
+
+27
+00:03:05,045 --> 00:03:07,926
+in steps of 10 because we look at 10 examples at a time.
+以10为步长,因为我们每次处理10个样本。
+
+28
+00:03:07,926 --> 00:03:13,648
+And then we perform this sort of gradient descent update using ten examples at a time
+然后,我们执行梯度下降算法,一次用10个样本进行更新,
+
+29
+00:03:13,648 --> 00:03:21,566
+so this 10 and this i+9 those are consequence of having chosen my mini-batch to be ten.
+10和i+9都说明我的微型批量梯度下降算法选择的是10个样本。
+
+30
+00:03:21,566 --> 00:03:27,435
+And you know, ultimately this for-loop ends at 991 here because
+你知道的,这个for循环最终在991结束,因为
+
+31
+00:03:27,435 --> 00:03:34,457
+if I have 1000 training samples then I need 100 steps of size 10 in order to get through my training set.
+如果我有1000个训练样本,我需要100步、每步10个样本,才能遍历我的整个训练集。
+
+32
+00:03:34,457 --> 00:03:37,729
+So this is mini-batch gradient descent.
+这就是微型批量梯度下降算法。
+
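+As a rough sketch of the update just walked through (Python/NumPy assumed; the function name and learning rate are illustrative), each step averages the gradient over b consecutive examples and then moves theta once:
+
+import numpy as np
+
+def minibatch_gradient_descent(X, y, alpha=0.01, b=10, num_passes=1):
+    # X: (m, n) features, y: (m,) targets; b is the mini-batch size (typically 2 to 100)
+    m, n = X.shape
+    theta = np.zeros(n)
+    for _ in range(num_passes):
+        for start in range(0, m, b):                    # i = 1, 11, 21, ... in the lecture's indexing
+            Xb, yb = X[start:start + b], y[start:start + b]
+            grad = Xb.T @ (Xb @ theta - yb) / len(yb)   # (1/b) * sum over the batch of (h_theta(x(k)) - y(k)) * x(k)
+            theta -= alpha * grad                       # one vectorized update over the b examples
+    return theta
+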
+33
+00:03:37,729 --> 00:03:43,219
+Compared to batch gradient descent, this also allows us to make progress much faster.
+与批量梯度下降相比,微型批量梯度下降速度更快。
+
+34
+00:03:43,219 --> 00:03:49,487
+So we have again our running example of, you know, U.S. Census data with 300 million training examples,
+因此,我们有了运行实例,你知道的,3亿条美国人口普查数据的训练样本,
+
+35
+00:03:49,487 --> 00:03:55,621
+then what we're saying is after looking at just the first 10 examples we can start to make progress
+然后我们想说的是,仅仅遍历前面10个样本后,我们就可以运行算法
+
+36
+00:03:55,621 --> 00:04:00,873
+in improving the parameters theta so we don't need to scan through the entire training set.
+更新参数theta,因此,我们不需要遍历整个训练样本集。
+
+37
+00:04:00,873 --> 00:04:05,377
+We just need to look at the first 10 examples and this will start letting us make progress and then
+我们仅仅需要前面10个样本就可以运行算法,然后
+
+38
+00:04:05,377 --> 00:04:09,289
+we can look at the second ten examples and modify the parameters a little bit again and so on.
+我们通过使用后续的10个样本来更新参数theta,以此类推。
+
+39
+00:04:09,289 --> 00:04:14,186
+So, that is why Mini-batch gradient descent can be faster than batch gradient descent.
+因此,这就是为什么微型批量梯度下降算法比批量梯度下降算法要快。
+
+40
+00:04:14,186 --> 00:04:19,578
+Namely, you can start making progress in modifying the parameters after looking at just ten examples
+也就是说,我们仅仅使用前面的10个样本就可以运行算法更新参数,
+
+41
+00:04:19,578 --> 00:04:24,836
+rather than needing to wait 'till you've scan through every single training example of 300 million of them.
+而不需要等我们遍历完所有的3亿个样本后才能执行算法更新参数。
+
+42
+00:04:24,836 --> 00:04:29,699
+So, how about Mini-batch gradient descent versus Stochastic gradient descent.
+因此,微型批量梯度下降算法与随机梯度下降算法比如何呢。
+
+43
+00:04:29,699 --> 00:04:38,237
+So, why do we want to look at b examples at a time rather than look at just a single example at a time as the Stochastic gradient descent?
+为什么我们每次使用b个样本而不是像随机梯度下降算法每次使用一个样本?
+
+44
+00:04:38,237 --> 00:04:42,044
+The answer is in vectorization.
+答案在于向量化。
+
+45
+00:04:42,044 --> 00:04:47,450
+In particular, Mini-batch gradient descent is likely to outperform Stochastic gradient descent
+特别的,微型批量梯度下降算法胜过随机梯度下降算法
+
+46
+00:04:47,450 --> 00:04:50,817
+only if you have a good vectorized implementation.
+仅当你有一个好的向量化实现。
+
+47
+00:04:50,817 --> 00:04:58,571
+In that case, the sum over 10 examples can be performed in a more vectorized way
+那样的话,10个样本的总和可以以更向量化的方式执行,
+
+48
+00:04:58,571 --> 00:05:05,376
+which will allow you to partially parallelize your computation over the ten examples.
+这将允许你在10个样本上部分的实现并行计算。
+
+49
+00:05:05,376 --> 00:05:09,953
+So, in other words, by using appropriate vectorization to compute the rest of the terms,
+因此,换句话说,通过使用适当的向量化计算余下的样本,
+
+50
+00:05:09,953 --> 00:05:18,565
+you can sometimes partially use the good numerical algebra libraries and parallelize your gradient computations over the b examples,
+有时你可以部分的使用好数值代数库,在b个样本上并行你的梯度计算,
+
+51
+00:05:18,565 --> 00:05:24,152
+whereas if you were looking at just a single example of time with Stochastic gradient descent then, you know,
+然而,如果你像随机梯度下降算法一样每次仅遍历一个样本,你知道的,
+
+52
+00:05:24,152 --> 00:05:27,456
+just looking at one example at a time their isn't much to parallelize over.
+一次仅遍历一个样本并没有太多的并行计算。
+
+53
+00:05:27,456 --> 00:05:29,824
+At least there is less to parallelize over.
+至少,有很少的并行计算。
+
+54
+00:05:29,824 --> 00:05:34,866
+One disadvantage of Mini-batch gradient descent is that there is now this extra parameter b,
+微型批量梯度下降算法的缺点是,有一个额外的参数b,
+
+55
+00:05:34,866 --> 00:05:39,006
+the Mini-batch size which you may have to fiddle with, and which may therefore take time.
+你需要确定b的大小,因此可能需要费些时间。
+
+56
+00:05:39,006 --> 00:05:45,611
+But if you have a good vectorized implementation this can sometimes run even faster than Stochastic gradient descent.
+但是,如果你有一个很好的向量化实现,有时它将比随机梯度下降运行得更快。
+
+57
+00:05:45,611 --> 00:05:52,937
+So that was Mini-batch gradient descent which is an algorithm that in some sense does something
+这就是微型批量梯度下降算法,某种意义上来说,它是一种介于
+
+58
+00:05:52,937 --> 00:05:57,697
+that's somewhat in between what Stochastic gradient descent does and what Batch gradient descent does.
+随机梯度下降和批量梯度下降算法中间的一种算法。
+
+59
+00:05:57,697 --> 00:06:02,626
+And if you choose their reasonable value of b. I usually use b equals 10, but,
+如果你选择了合适的参数b。我通常使b等于10,但是
+
+60
+00:06:02,626 --> 00:06:07,343
+you know, other values, anywhere from say 2 to 100, would be reasonably common.
+你知道的,另外一些值,从2到100,都将是合理的范围。
+
+61
+00:06:07,343 --> 00:06:11,917
+So we choose the value of b and if you use a good vectorized implementation,
+因此我们选好b的值,如果使用一个好的向量化实现,
+
+62
+00:06:11,917 --> 00:06:15,917
+sometimes it can be faster than both Stochastic gradient descent and faster than Batch gradient descent.
+有时候它会比随机梯度下降算法和批量梯度下降算法的速度都要快。
+
diff --git a/srt/17 - 4 - Stochastic Gradient Descent Convergence (12 min).srt b/srt/17 - 4 - Stochastic Gradient Descent Convergence (12 min).srt
new file mode 100644
index 00000000..78f1ba24
--- /dev/null
+++ b/srt/17 - 4 - Stochastic Gradient Descent Convergence (12 min).srt
@@ -0,0 +1,631 @@
+1
+00:00:00,493 --> 00:00:03,492
+You now know about the stochastic gradient descent algorithm.
+现在你已经知道了随机梯度下降算法
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:03,492 --> 00:00:09,907
+But when you're running the algorithm, how do you make sure that it's completely debugged and is converging okay?
+但是当你在运行这个算法时 你如何确保调试过程已经完成 并且收敛到合适的位置呢?
+
+3
+00:00:09,907 --> 00:00:15,813
+Equally important, how do you tune the learning rate alpha with Stochastic Gradient Descent.
+还有 同样重要的是 你怎样调整随机梯度下降中学习速率α的值
+
+4
+00:00:15,813 --> 00:00:25,950
+In this video we'll talk about some techniques for doing these things, for making sure it's converging and for picking the learning rate alpha.
+在这段视频中 我们会谈到一些方法来处理这些问题 确保它能收敛 以及选择合适的学习速率α
+
+5
+00:00:25,950 --> 00:00:30,600
+Back when we were using batch gradient descent, our standard way for making sure that
+回到我们之前批量梯度下降的算法 我们确定梯度下降已经收敛的一个标准方法
+
+6
+00:00:30,600 --> 00:00:36,493
+gradient descent was converging was we would plot the optimization cost function as a function of the number of iterations.
+是画出最优化的代价函数 关于迭代次数的变化
+
+7
+00:00:36,493 --> 00:00:44,366
+So that was the cost function and we would make sure that this cost function is decreasing on every iteration.
+这就是代价函数 我们要保证这个代价函数在每一次迭代中 都是下降的
+
+8
+00:00:44,366 --> 00:00:50,438
+When the training set sizes were small, we could do that because we could compute the sum pretty efficiently.
+当训练集比较小的时候 我们不难完成 因为要计算这个求和是比较方便的
+
+9
+00:00:50,438 --> 00:00:57,950
+But when you have a massive training set size then you don't want to have to pause your algorithm periodically.
+但当你的训练集非常大的时候 你不希望老是定时地暂停算法
+
+10
+00:00:57,950 --> 00:01:04,045
+You don't want to have to pause stochastic gradient descent periodically in order to compute this cost function
+来计算一遍这个求和
+
+11
+00:01:04,045 --> 00:01:07,442
+since it requires a sum of your entire training set size.
+因为这个求和计算需要考虑整个的训练集
+
+12
+00:01:07,442 --> 00:01:12,466
+And the whole point of stochastic gradient was that you wanted to start to make progress
+而随机梯度下降的算法是 你每次只考虑一个样本
+
+13
+00:01:12,466 --> 00:01:19,130
+after looking at just a single example without needing to occasionally scan through your entire training set
+然后就立刻进步一点点 不需要在算法当中 时不时地扫描一遍全部的训练集
+
+14
+00:01:19,130 --> 00:01:25,583
+right in the middle of the algorithm, just to compute things like the cost function of the entire training set.
+来计算整个训练集的代价函数
+
+15
+00:01:25,583 --> 00:01:32,472
+So for stochastic gradient descent, in order to check the algorithm is converging, here's what we can do instead.
+因此 对于随机梯度下降算法 为了检查算法是否收敛 我们可以进行下面的工作
+
+16
+00:01:32,472 --> 00:01:36,367
+Let's take the definition of the cost that we had previously.
+让我们沿用之前定义的cost函数
+
+17
+00:01:36,367 --> 00:01:42,647
+So the cost of the parameters theta with respect to a single training example is just one half of the squared error on that training example.
+参数θ关于单个训练样本的cost 等于该样本上平方误差的二分之一
+
+18
+00:01:42,647 --> 00:01:49,754
+Then, while stochastic gradient descent is learning, right before we train on a specific example.
+然后 在随机梯度下降法学习的时 在我们对某一个样本进行训练前
+
+19
+00:01:49,754 --> 00:01:54,601
+So, in stochastic gradient descent we're going to look at the examples xi, yi, in order, and
+在随机梯度下降中 我们要关注样本(x(i),y(i))
+
+20
+00:01:54,601 --> 00:01:57,329
+then sort of take a little update with respect to this example.
+然后关于这个样本更新一小步 进步一点点
+
+21
+00:01:57,329 --> 00:02:04,095
+And we go on to the next example, xi plus 1, yi plus 1, and so on, right?
+然后再转向下一个样本 (x(i+1), y(i+1))
+
+22
+00:02:04,095 --> 00:02:05,880
+That's what stochastic gradient descent does.
+随机梯度下降就是这样进行的
+
+23
+00:02:05,880 --> 00:02:15,024
+So, while the algorithm is looking at the example xi, yi, but before it has updated the parameters theta
+在算法扫描到样本(x(i),y(i)) 但在更新参数θ之前
+
+24
+00:02:15,024 --> 00:02:20,255
+using that an example, let's compute the cost of that example.
+使用这个样本 我们可以算出这个样本对应的cost函数
+
+25
+00:02:20,255 --> 00:02:23,577
+Just to say the same thing again, but using slightly different words.
+我再换一种方式表达一遍
+
+26
+00:02:23,577 --> 00:02:33,294
+As stochastic gradient descent is scanning through our training set, right before we have updated theta using a specific training example x(i), y(i),
+当随机梯度下降法对训练集进行扫描时 在我们使用某个样本(x(i),y(i))来更新θ前
+
+27
+00:02:33,294 --> 00:02:38,198
+let's compute how well our hypothesis is doing on that training example.
+让我们来计算出 这个假设对这个训练样本的表现
+
+28
+00:02:38,198 --> 00:02:43,852
+And we want to do this before updating theta because if we've just updated theta using example,
+我要在更新θ前来完成这一步 原因是如果我们用这个样本更新θ以后
+
+29
+00:02:43,852 --> 00:02:49,061
+you know, that it might be doing better on that example than what would be representative.
+再让它在这个训练样本上预测 其表现就比实际上要更好了
+
+30
+00:02:49,061 --> 00:02:57,438
+Finally, in order to check for the convergence of stochastic gradient descent, what we can do is every, say, every thousand iterations,
+最后 为了检查随机梯度下降的收敛性 我们要做的是 每1000次迭代
+
+31
+00:02:57,438 --> 00:03:01,511
+we can plot these costs that we've been computing in the previous step.
+我们可以画出前一步中计算出的cost函数
+
+32
+00:03:01,511 --> 00:03:07,450
+We can plot those costs averaged over, say, the last thousand examples processed by the algorithm.
+我们把这些cost函数画出来 并对算法处理的最后1000个样本的cost值求平均值
+
+33
+00:03:07,450 --> 00:03:12,714
+And if you do this, it kind of gives you a running estimate of how well the algorithm is doing.
+如果你这样做的话 它会很有效地帮你估计出
+
+34
+00:03:12,714 --> 00:03:17,049
+on, you know, the last 1000 training examples that your algorithm has seen.
+你的算法在最后1000个样本上的表现
+
+35
+00:03:17,049 --> 00:03:23,974
+So, in contrast to computing Jtrain periodically which needed to scan through the entire training set.
+所以 我们不需要时不时地计算Jtrain 那样的话需要所有的训练样本
+
+36
+00:03:23,974 --> 00:03:27,973
+With this other procedure, well, as part of stochastic gradient descent,
+随机梯度下降法的这个步骤
+
+37
+00:03:27,973 --> 00:03:32,965
+it doesn't cost much to compute these costs as well right before updating to parameter theta.
+只需要在每次更新θ之前进行 也并不需要太大的计算量
+
+38
+00:03:32,965 --> 00:03:40,276
+And all we're doing is every thousand iterations or so, we just average the last 1,000 costs that we computed and plot that.
+要做的就是 每1000次迭代运算中 我们对最后1000个样本的cost值求平均然后画出来
+
+39
+00:03:40,276 --> 00:03:47,537
+And by looking at those plots, this will allow us to check if stochastic gradient descent is converging.
+通过观察这些画出来的图 我们就能检查出随机梯度下降是否在收敛
+
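+As a minimal sketch of this monitoring idea (Python/NumPy assumed; everything except the window of 1000 and the per-example cost is an illustrative assumption), the cost on each example is recorded just before the update and averaged every 1000 examples:
+
+import numpy as np
+
+def sgd_with_monitoring(X, y, alpha=0.01, window=1000):
+    m, n = X.shape
+    theta = np.zeros(n)
+    recent_costs, avg_cost_history = [], []
+    for i in np.random.permutation(m):
+        error = X[i] @ theta - y[i]
+        recent_costs.append(0.5 * error ** 2)     # cost(theta, (x(i), y(i))) computed before updating theta
+        theta -= alpha * error * X[i]             # the usual stochastic gradient descent update
+        if len(recent_costs) == window:
+            avg_cost_history.append(np.mean(recent_costs))   # one point to plot per 1000 examples
+            recent_costs = []
+    return theta, avg_cost_history                # plotting avg_cost_history shows the running trend
+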
+40
+00:03:47,537 --> 00:03:51,708
+So here are a few examples of what these plots might look like.
+这是几幅画出来的图的例子
+
+41
+00:03:51,708 --> 00:03:55,519
+Suppose you have plotted the cost average over the last thousand examples,
+假如你已经画出了最后1000组样本的cost函数的平均值
+
+42
+00:03:55,519 --> 00:04:01,073
+because these are averaged over just a thousand examples, they are going to be a little bit noisy and so,
+由于它们都只是最后1000组样本的平均值 因此它们看起来会比较异常(noisy)
+
+43
+00:04:01,073 --> 00:04:03,873
+it may not decrease on every single iteration.
+因此cost的值不会在每一个迭代中都下降
+
+44
+00:04:03,873 --> 00:04:07,828
+Then if you get a figure that looks like this, So the plot is noisy
+假如你得到一种这样的图像 看起来是有噪声的
+
+45
+00:04:07,828 --> 00:04:11,721
+because it's average over, you know, just a small subset, say a thousand training examples.
+因为它是在一小部分样本 比如1000组样本中求的平均值
+
+46
+00:04:11,721 --> 00:04:17,283
+If you get a figure that looks like this, you know that would be a pretty decent run with the algorithm,
+如果你得到像这样的图 那么你应该判断这个算法是在下降的
+
+47
+00:04:17,283 --> 00:04:24,195
+maybe, where it looks like the cost has gone down and then this plateau that looks kind of flattened out, you know, starting from around that point.
+看起来代价值在下降 然后从大概这个点开始变得平缓
+
+48
+00:04:24,195 --> 00:04:29,603
+look like, this is what your cost looks like then maybe your learning algorithm has converged.
+这就是代价函数的大致走向 这基本说明你的学习算法已经收敛了
+
+49
+00:04:29,603 --> 00:04:34,252
+If you want to try using a smaller learning rate, something you might see is that
+如果你想试试更小的学习速率 那么你很有可能看到的是
+
+50
+00:04:34,252 --> 00:04:39,229
+the algorithm may initially learn more slowly so the cost goes down more slowly.
+算法的学习变得更慢了 因此代价函数的下降也变慢了
+
+51
+00:04:39,229 --> 00:04:47,585
+But then eventually you have a smaller learning rate is actually possible for the algorithm to end up at a, maybe very slightly better solution.
+但由于你使用了更小的学习速率 你很有可能会让算法收敛到一个可能更好的解
+
+52
+00:04:47,585 --> 00:04:53,426
+So the red line may represent the behavior of stochastic gradient descent using a slower, using a smaller leaning rate.
+红色的曲线代表随机梯度下降使用一个更小的学习速率
+
+53
+00:04:53,426 --> 00:05:00,594
+And the reason this is the case is because, you remember, stochastic gradient descent doesn't just converge to the global minimum,
+出现这种情况是因为 别忘了 随机梯度下降不是直接收敛到全局最小值
+
+54
+00:05:00,594 --> 00:05:05,068
+is that what it does is the parameters will oscillate a bit around the global minimum.
+而是在全局最小值附近反复振荡
+
+55
+00:05:05,068 --> 00:05:09,231
+And so by using a smaller learning rate, you'll end up with smaller oscillations.
+所以使用一个更小的学习速率 最终的振荡就会更小
+
+56
+00:05:09,231 --> 00:05:12,896
+And sometimes this little difference will be negligible
+有时候这一点小的区别可以忽略
+
+57
+00:05:12,896 --> 00:05:19,686
+and sometimes with a smaller than you can get a slightly better value for the parameters.
+但有时候一点小的区别 你就会得到更好一点的参数
+
+58
+00:05:19,686 --> 00:05:22,269
+Here are some other things that might happen.
+接下来再看几种其他的情况
+
+59
+00:05:22,269 --> 00:05:27,986
+Let's say you run stochastic gradient descent and you average over a thousand examples when plotting these costs.
+假如你还是运行随机梯度下降 然后对1000组样本取cost函数的平均值 并且画出图像
+
+60
+00:05:27,986 --> 00:05:32,369
+So, you know, here might be the result of another one of these plots.
+那么这是另一种可能的图形
+
+61
+00:05:32,369 --> 00:05:34,353
+Then again, it kind of looks like it's converged.
+看起来这样还是已经收敛了
+
+62
+00:05:34,353 --> 00:05:42,119
+If you were to take this number, a thousand, and increase to averaging over 5 thousand examples.
+如果你把这个数 1000 提高到5000组样本
+
+63
+00:05:42,119 --> 00:05:47,913
+Then it's possible that you might get a smoother curve that looks more like this.
+那么可能你会得到一条更平滑的曲线
+
+64
+00:05:47,913 --> 00:05:56,547
+And by averaging over, say 5,000 examples instead of 1,000, you might be able to get a smoother curve like this.
+通过在5000个样本中求平均值 你会得到比刚才1000组样本更平滑的曲线
+
+65
+00:05:56,547 --> 00:06:00,248
+And so that's the effect of increasing the number of examples you average over.
+这是你增大平均的训练样本数的情形
+
+66
+00:06:00,248 --> 00:06:06,229
+The disadvantage of making this too big of course is that now you get one data point only every 5,000 examples.
+当然它的缺点就是 现在你的一个数据点都是5000组样本的平均结果
+
+67
+00:06:06,229 --> 00:06:12,001
+And so the feedback you get on how well your learning algorithm is doing is, sort of, maybe it's more delayed
+因此你所得到的关于学习算法表现的反馈 就显得有一些“延迟”
+
+68
+00:06:12,001 --> 00:06:17,681
+because you get one data point on your plot only every 5,000 examples rather than every 1,000 examples.
+因为你的每一个数据点是从5000个训练样本中得到的 而不是1000个样本
+
+69
+00:06:17,681 --> 00:06:23,911
+Along a similar vein, sometimes you may run gradient descent and end up with a plot that looks like this.
+沿着相似的脉络 有时候你运行梯度下降 可能也会得到这样的图像
+
+70
+00:06:23,911 --> 00:06:32,079
+And with a plot that looks like this, you know, it looks like the cost just is not decreasing at all.
+如果出现这种情况 你要知道 可能你的代价函数就没有在减小了
+
+71
+00:06:32,079 --> 00:06:34,023
+It looks like the algorithm is just not learning.
+也就是说 算法没有很好地学习
+
+72
+00:06:34,023 --> 00:06:39,261
+It's just, looks like this here a flat curve and the cost is just not decreasing.
+因为这看起来一直比较平坦 代价项并没有下降
+
+73
+00:06:39,261 --> 00:06:46,260
+But again if you were to increase this to averaging over a larger number of examples
+但同样地 如果你对这种情况时 也用更大量的样本进行平均
+
+74
+00:06:46,260 --> 00:06:49,729
+it is possible that you see something like this red line
+你很可能会观察到红线所示的情况
+
+75
+00:06:49,729 --> 00:06:55,127
+it looks like the cost actually is decreasing, it's just that the blue line, averaging over too small a number of examples,
+能看得出 实际上代价函数是在下降的 只不过蓝线用来平均的样本数量太小了
+
+76
+00:06:55,127 --> 00:07:01,374
+the blue line was too noisy so you couldn't see the actual trend in the cost actually decreasing
+并且蓝线太嘈杂 你看不出来一个确切的趋势代价是不是在下降
+
+77
+00:07:01,374 --> 00:07:06,688
+and possibly averaging over 5,000 examples instead of 1,000 may help.
+所以可能用5000组样本来平均 比用1000组样本来平均 更能看出趋势
+
+78
+00:07:06,688 --> 00:07:12,358
+Of course, even when we average over a larger number of examples, say averaging here over 5,000 examples,
+当然 即使是使用一个较大的样本数量 比如我们用5000个样本来平均
+
+79
+00:07:12,358 --> 00:07:16,998
+I'm just using a different color, it is also possible that you see a learning curve that ends up looking like this.
+我用另一种颜色来表示 即使如此 你还是可能会发现 这条学习曲线是这样的
+
+80
+00:07:16,998 --> 00:07:21,197
+That it's still flat even when you average over a larger number of examples.
+它还是比较平坦 即使你用更多的训练样本
+
+81
+00:07:21,197 --> 00:07:25,908
+And as you get that, then that's maybe just a more firm verification that
+如果是这样的话 那可能就更肯定地说明
+
+82
+00:07:25,908 --> 00:07:29,287
+unfortunately the algorithm just isn't learning much for whatever reason.
+不知道出于什么原因 算法确实没怎么学习好
+
+83
+00:07:29,287 --> 00:07:34,969
+And you need to either change the learning rate or change the features or change something else about the algorithm.
+那么你就需要调整学习速率 或者改变特征量 或者改变其他的什么
+
+84
+00:07:34,969 --> 00:07:39,235
+Finally, one last thing that you might see would be if you were to plot these curves
+最后一种你可能会遇到的情况是 如果你画出曲线
+
+85
+00:07:39,235 --> 00:07:43,273
+and you see a curve that looks like this, where it actually looks like it's increasing.
+你会发现曲线是这样的 实际上是在上升
+
+86
+00:07:43,273 --> 00:07:48,066
+And if that's the case then this is a sign that the algorithm is diverging.
+这是一个很明显的信号 告诉你算法正在发散
+
+87
+00:07:48,066 --> 00:07:53,965
+And what you really should do is use a smaller value of the learning rate alpha.
+那么你要做的事 就是用一个更小一点的学习速率值
+
+88
+00:07:53,965 --> 00:07:58,143
+So hopefully this gives you a sense of the range of phenomena you might see
+好的 希望通过这几幅图 你能了解到
+
+89
+00:07:58,143 --> 00:08:02,946
+when you plot these cost average over some range of examples as well as
+当你画出cost函数在某个范围的训练样本中求平均值时 各种可能出现的现象
+
+90
+00:08:02,946 --> 00:08:07,765
+suggests the sorts of things you might try to do in response to seeing different plots.
+也告诉你 在遇到不同的情况时 应该采取怎样的措施
+
+91
+00:08:07,765 --> 00:08:15,070
+So if the plots looks too noisy, or if it wiggles up and down too much, then try increasing the number of examples
+所以如果曲线看起来噪声较大 或者老是上下振动
+
+92
+00:08:15,070 --> 00:08:18,734
+you're averaging over so you can see the overall trend in the plot better.
+那就试试增大你要平均的样本数量 这样应该就能得到比较好的变化趋势
+
+93
+00:08:18,734 --> 00:08:25,836
+And if you see that the errors are actually increasing, the costs are actually increasing, try using a smaller value of alpha.
+如果你发现代价值在上升 那么就换一个小一点的α值
+
+94
+00:08:25,836 --> 00:08:31,649
+Finally, it's worth examining the issue of the learning rate just a little bit more.
+最后还需要再说一下关于学习速率的问题
+
+95
+00:08:31,649 --> 00:08:38,922
+We saw that when we run stochastic gradient descent, the algorithm will start here and sort of meander towards the minimum
+我们已经知道 当运行随机梯度下降时 算法会从某个点开始 然后曲折地逼近最小值
+
+96
+00:08:38,922 --> 00:08:43,494
+And then it won't really converge, and instead it'll wander around the minimum forever.
+但它不会真的收敛 而是一直在最小值附近徘徊
+
+97
+00:08:43,494 --> 00:08:50,225
+And so you end up with a parameter value that is hopefully close to the global minimum that won't be exact at the global minimum.
+因此你最终得到的参数 实际上只是满足接近全局最小值 而不是真正的全局最小值
+
+98
+00:08:50,225 --> 00:08:57,991
+In most typical implementations of stochastic gradient descent, the learning rate alpha is typically held constant.
+在大多数随机梯度下降法的典型应用中 学习速率α一般是保持不变的
+
+99
+00:08:57,991 --> 00:09:02,022
+And so what you we end up is exactly a picture like this.
+因此你最终得到的结果一般来说是这个样子的
+
+100
+00:09:02,022 --> 00:09:06,523
+If you want stochastic gradient descent to actually converge to the global minimum,
+如果你想让随机梯度下降确实收敛到全局最小值
+
+101
+00:09:06,523 --> 00:09:11,825
+there's one thing which you can do which is you can slowly decrease the learning rate alpha over time.
+你可以随时间的变化减小学习速率α的值
+
+102
+00:09:11,825 --> 00:09:22,240
+So, a pretty typical way of doing that would be to set alpha equals some constant 1 divided by iteration number plus constant 2.
+所以 一种典型的方法来设置α的值 是让α等于某个常数1 除以 迭代次数加某个常数2
+
+103
+00:09:22,240 --> 00:09:28,169
+So, iteration number is the number of iterations you've run of stochastic gradient descent,
+迭代次数指的是你运行随机梯度下降的迭代次数
+
+104
+00:09:28,169 --> 00:09:29,519
+so it's really the number of training examples you've seen
+就是你算过的训练样本的数量
+
+105
+00:09:29,519 --> 00:09:34,103
+And const 1 and const 2 are additional parameters of the algorithm
+常数1和常数2是两个额外的参数
+
+106
+00:09:34,103 --> 00:09:38,160
+that you might have to play with a bit in order to get good performance.
+你需要选择一下 才能得到较好的表现
+
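+A hedged sketch of that schedule, alpha = const1 / (iterationNumber + const2); the constants below are arbitrary placeholders you would have to tune, not values given in the lecture:
+
+const1, const2 = 4.0, 50.0                    # the two extra parameters you have to play with
+
+def learning_rate(iteration_number):
+    # alpha shrinks slowly as more training examples have been processed
+    return const1 / (iteration_number + const2)
+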
+107
+00:09:38,160 --> 00:09:43,004
+One of the reasons people tend not to do this is because you end up needing to spend time
+但很多人不愿意用这个办法的原因是 你最后会把问题落实到
+
+108
+00:09:43,004 --> 00:09:48,122
+playing with these 2 extra parameters, constant 1 and constant 2, and so this makes the algorithm more finicky.
+把时间花在确定常数1和常数2上 这让算法显得更苛刻
+
+109
+00:09:48,122 --> 00:09:52,113
+You know, it's just more parameters able to fiddle with in order to make the algorithm work well.
+也就是说 为了让算法更好 你要调整更多的参数
+
+110
+00:09:52,113 --> 00:09:57,246
+But if you manage to tune the parameters well, then the picture you can get is that
+但如果你能调整得到比较好的参数的话 你会得到的图形是
+
+111
+00:09:57,246 --> 00:10:02,834
+the algorithm will actually meander around towards the minimum, but as it gets closer
+你的算法会在最小值附近振荡 但当它越来越靠近最小值的时候
+
+112
+00:10:02,834 --> 00:10:07,024
+because you're decreasing the learning rate the meanderings will get smaller and smaller
+由于你减小了学习速率 因此这个振荡也会越来越小
+
+113
+00:10:07,024 --> 00:10:12,729
+until it pretty much gets to the global minimum. I hope this makes sense, right?
+直到落到几乎靠近全局最小的地方 我想这么说能听懂吧?
+
+114
+00:10:12,729 --> 00:10:21,608
+And the reason this formula makes sense is because as the algorithm runs, the iteration number becomes large So alpha will slowly become small,
+这个公式起作用的原因是 随着算法的运行 迭代次数会越来越大 因此学习速率α会慢慢变小
+
+115
+00:10:21,608 --> 00:10:27,506
+and so you take smaller and smaller steps until it hopefully converges to the global minimum.
+因此你的每一步就会越来越小 直到最终收敛到全局最小值
+
+116
+00:10:27,506 --> 00:10:33,484
+So If you do slowly decrease alpha to zero you can end up with a slightly better hypothesis.
+所以 如果你慢慢减小α的值到0 你会最后得到一个更好一点的假设
+
+117
+00:10:33,484 --> 00:10:40,078
+But because of the extra work needed to fiddle with the constants and because frankly usually we're pretty happy
+但由于确定这两个常数需要更多的工作量 并且我们通常也对
+
+118
+00:10:40,078 --> 00:10:43,892
+with any parameter value that is, you know, pretty close to the global minimum.
+能够很接近全局最小值的参数 已经很满意了
+
+119
+00:10:43,892 --> 00:10:50,863
+Typically this process of decreasing alpha slowly is usually not done and keeping the learning rate alpha constant
+因此我们很少采用逐渐减小α的值的方法 在随机梯度下降中
+
+120
+00:10:50,863 --> 00:10:56,983
+is the more common application of stochastic gradient descent although you will see people use either version.
+你看到更多的还是让α的值为常数 虽然两种做法的人都有
+
+121
+00:10:56,983 --> 00:11:03,595
+To summarize in this video we talk about a way for approximately monitoring
+总结一下 这段视频中 我们介绍了一种方法
+
+122
+00:11:03,595 --> 00:11:08,256
+how stochastic gradient descent is doing in terms of optimizing the cost function.
+近似地监测出随机梯度下降算法在最优化代价函数中的表现
+
+123
+00:11:08,256 --> 00:11:17,043
+And this is a method that does not require scanning over the entire training set periodically to compute the cost function on the entire training set.
+这种方法不需要定时地扫描整个训练集 来算出整个样本集的代价函数
+
+124
+00:11:17,043 --> 00:11:20,693
+But instead it looks at say only the last thousand examples or so.
+而是只需要对最后1000个 或者多少个样本 进行一个平均
+
+125
+00:11:20,693 --> 00:11:27,592
+And you can use this method both to make sure the stochastic gradient descent is okay and is converging
+应用这种方法 你既可以保证随机梯度下降法正在正常运转和收敛
+
+126
+00:11:27,592 --> 00:11:31,468
+or to use it to tune the learning rate alpha.
+也可以用它来调整学习速率α的大小
+
diff --git a/srt/17 - 5 - Online Learning (13 min).srt b/srt/17 - 5 - Online Learning (13 min).srt
new file mode 100644
index 00000000..3d659e98
--- /dev/null
+++ b/srt/17 - 5 - Online Learning (13 min).srt
@@ -0,0 +1,2025 @@
+1
+00:00:00,109 --> 00:00:02,030
+In this video, I'd like to talk
+在这个视频中 我将会
+
+2
+00:00:02,030 --> 00:00:03,738
+about a new large-scale
+讨论一种新的大规模的
+
+3
+00:00:03,738 --> 00:00:05,369
+machine learning setting called
+机器学习机制 叫做
+
+4
+00:00:05,369 --> 00:00:07,073
+the online learning setting.
+在线学习机制
+
+5
+00:00:07,442 --> 00:00:08,731
+The online learning setting
+在线学习机制
+
+6
+00:00:08,731 --> 00:00:10,659
+allows us to model problems
+让我们可以模型化问题
+
+7
+00:00:10,659 --> 00:00:12,074
+where we have a continuous flood
+在拥有连续一波数据
+
+8
+00:00:12,074 --> 00:00:14,064
+or a continuous stream of data
+或连续的数据流涌进来
+
+9
+00:00:14,064 --> 00:00:15,906
+coming in and we would like
+而我们又需要
+
+10
+00:00:15,906 --> 00:00:17,839
+an algorithm to learn from that.
+一个从中学习的算法
+
+11
+00:00:18,762 --> 00:00:20,759
+Today, many of the largest
+今天 许多大型网站
+
+12
+00:00:20,759 --> 00:00:22,245
+websites, or many of the largest
+或者许多大型网络公司
+
+13
+00:00:22,245 --> 00:00:24,335
+website companies use different
+使用不同版本的
+
+14
+00:00:24,335 --> 00:00:25,901
+versions of online learning
+在线学习机制算法
+
+15
+00:00:25,901 --> 00:00:28,102
+algorithms to learn from
+从大批的涌入
+
+16
+00:00:28,117 --> 00:00:29,468
+the flood of users that keep
+又离开网站的用户身上
+
+17
+00:00:29,468 --> 00:00:31,370
+on coming to, back to the website.
+进行学习
+
+18
+00:00:31,370 --> 00:00:32,943
+Specifically, if you have
+特别要提及的是 如果你有
+
+19
+00:00:32,943 --> 00:00:34,992
+a continuous stream of data
+一个由连续的用户流引发的
+
+20
+00:00:34,992 --> 00:00:36,371
+generated by a continuous
+连续的数据流
+
+21
+00:00:36,371 --> 00:00:37,703
+stream of users coming to
+进入
+
+22
+00:00:37,703 --> 00:00:39,413
+your website, what you can
+你的网站
+
+23
+00:00:39,413 --> 00:00:40,844
+do is sometimes use an
+你能做的是使用一个
+
+24
+00:00:40,844 --> 00:00:42,632
+online learning algorithm to learn
+在线学习机制 从数据流中学习
+
+25
+00:00:42,632 --> 00:00:44,492
+user preferences from the
+用户的偏好
+
+26
+00:00:44,492 --> 00:00:46,324
+stream of data and use that
+然后使用这些信息
+
+27
+00:00:46,324 --> 00:00:47,470
+to optimize some of the
+来优化一些
+
+28
+00:00:47,470 --> 00:00:49,632
+decisions on your website.
+关于网站的决策
+
+29
+00:00:52,063 --> 00:00:54,506
+Suppose you run a shipping service,
+假定你有一个提供运输服务的公司
+
+30
+00:00:54,506 --> 00:00:56,163
+so, you know, users come and ask
+所以你知道 用户们来向你询问
+
+31
+00:00:56,163 --> 00:00:57,307
+you to help ship their package from
+把包裹从A地
+
+32
+00:00:57,307 --> 00:01:01,533
+location A to location B and suppose
+运到B地的服务
+
+33
+00:01:01,533 --> 00:01:02,717
+you run a website, where users
+同时假定你有一个网站
+
+34
+00:01:02,717 --> 00:01:04,110
+repeatedly come and they
+让用户们可多次登陆
+
+35
+00:01:04,110 --> 00:01:05,689
+tell you where they want
+然后他们告诉你
+
+36
+00:01:05,689 --> 00:01:07,291
+to send the package from, and
+他们想从哪里寄出包裹 以及
+
+37
+00:01:07,291 --> 00:01:08,523
+where they want to send it to
+包裹要寄到哪里去
+
+38
+00:01:08,523 --> 00:01:10,947
+(so the origin and destination) and
+也就是出发地与目的地
+
+39
+00:01:10,947 --> 00:01:12,748
+your website offers to ship the package
+然后你的网站开出运输包裹的
+
+40
+00:01:12,748 --> 00:01:14,515
+for some asking price,
+的服务价格
+
+41
+00:01:14,515 --> 00:01:16,092
+so I'll ship your package for $50,
+比如 我会收取$50来运输你的包裹
+
+42
+00:01:16,092 --> 00:01:17,926
+I'll ship it for $20.
+我会收取$20之类的
+
+43
+00:01:17,926 --> 00:01:19,343
+And based on the price
+然后根据
+
+44
+00:01:19,343 --> 00:01:20,922
+that you offer to the users,
+你开给用户的这个价格
+
+45
+00:01:20,922 --> 00:01:23,522
+the users sometimes chose to use a shipping service;
+用户有时会接受这个运输服务
+
+46
+00:01:23,522 --> 00:01:25,891
+that's a positive example and
+那么这就是个正样本
+
+47
+00:01:25,891 --> 00:01:28,168
+sometimes they go away and
+有时他们会走掉
+
+48
+00:01:28,168 --> 00:01:29,722
+they do not choose to
+然后他们拒绝
+
+49
+00:01:29,722 --> 00:01:31,719
+purchase your shipping service.
+购买你的运输服务
+
+50
+00:01:31,719 --> 00:01:34,552
+So let's say that we want
+所以 让我们假定我们想要一个
+
+51
+00:01:34,552 --> 00:01:36,386
+a learning algorithm to help us
+学习算法来帮助我们
+
+52
+00:01:36,386 --> 00:01:38,499
+to optimize what is the asking
+优化我们想给用户
+
+53
+00:01:38,499 --> 00:01:41,680
+price that we want to offer to our users.
+开出的价格
+
+54
+00:01:41,680 --> 00:01:43,724
+And specifically, let's say we
+而且特别的是 我们假定
+
+55
+00:01:43,724 --> 00:01:44,908
+come up with some sort of features
+我们找到了一些
+
+56
+00:01:44,908 --> 00:01:46,510
+that capture properties of the users.
+获取用户特点的方法
+
+57
+00:01:46,510 --> 00:01:49,376
+If we know anything about the demographics,
+如果我们知道一些用户的统计信息
+
+58
+00:01:49,376 --> 00:01:50,875
+they capture, you know, the origin and
+它们会获取 比如 包裹的起始地
+
+59
+00:01:50,875 --> 00:01:54,405
+destination of the package, where they want to ship the package.
+以及目的地 他们想把包裹运到哪里去
+
+60
+00:01:54,405 --> 00:01:55,635
+And what is the price
+以及我们提供给他们的
+
+61
+00:01:55,635 --> 00:01:57,911
+that we offer to them for shipping the package.
+运送包裹的价格
+
+62
+00:01:57,911 --> 00:01:59,931
+and what we want to do
+我们想要做的就是
+
+63
+00:01:59,931 --> 00:02:00,883
+is learn what is the
+学习
+
+64
+00:02:00,883 --> 00:02:02,439
+probability that they will
+在给出的价格下他们将会
+
+65
+00:02:02,439 --> 00:02:03,762
+elect to ship the
+选择
+
+66
+00:02:03,762 --> 00:02:05,457
+package, using our
+运输包裹的几率
+
+67
+00:02:05,457 --> 00:02:07,315
+shipping service given these features, and
+在已知用户特点的前提下 并且
+
+68
+00:02:07,315 --> 00:02:10,197
+again just as a reminder these
+我要再次指出
+
+69
+00:02:10,197 --> 00:02:14,121
+features X also captures the price that we're asking for.
+他们也同时获取了我们开出的价格
+
+70
+00:02:14,121 --> 00:02:15,790
+And so if we could
+所以如果我们可以
+
+71
+00:02:15,790 --> 00:02:17,486
+estimate the chance that they'll
+估计出用户选择
+
+72
+00:02:17,486 --> 00:02:19,629
+agree to use our service
+使用我们的服务时
+
+73
+00:02:19,629 --> 00:02:20,962
+for any given price, then we
+我们所开出的价格 那么我们
+
+74
+00:02:20,962 --> 00:02:21,967
+can try to pick
+可以试着去选择
+
+75
+00:02:21,967 --> 00:02:23,183
+a price so that they
+一个优化的价格 因而在这个价格下
+
+76
+00:02:23,183 --> 00:02:25,125
+have a pretty high probability of
+用户会有很大的可能性
+
+77
+00:02:25,125 --> 00:02:27,841
+choosing our website while simultaneously
+选择我们的网站
+
+78
+00:02:27,841 --> 00:02:29,188
+hopefully offering us a
+而且同时很有可能会提供给我们
+
+79
+00:02:29,188 --> 00:02:31,371
+fair return, offering us
+一个合适的回报 让我们
+
+80
+00:02:31,371 --> 00:02:34,293
+a fair profit for shipping their package.
+在提供运输服务时也能获得合适的利润
+
+81
+00:02:34,585 --> 00:02:36,489
+So if we can learn this probability
+所以如果我们可以学习
+
+82
+00:02:36,489 --> 00:02:37,733
+of y equals 1 given
+y 等于 1 的概率
+
+83
+00:02:37,733 --> 00:02:38,632
+any price and given the other
+在任何给定价格以及其他给定的
+
+84
+00:02:38,632 --> 00:02:39,660
+features we could really
+特征的条件下
+
+85
+00:02:39,660 --> 00:02:41,657
+use this to choose appropriate
+我们就真的可以利用这一些信息
+
+86
+00:02:41,657 --> 00:02:44,072
+prices as new users come to us.
+在新用户来的时候选择合适的价格
+
+87
+00:02:44,072 --> 00:02:45,907
+So in order to model
+所以为了
+
+88
+00:02:45,907 --> 00:02:47,277
+the probability of y equals 1,
+获得 y 等于 1 的概率的模型
+
+89
+00:02:47,277 --> 00:02:48,972
+what we can do is use
+我们能做的就是
+
+90
+00:02:48,972 --> 00:02:51,781
+logistic regression or neural
+用逻辑回归或者神经网络
+
+91
+00:02:51,781 --> 00:02:53,756
+network or some other algorithm like that.
+或者其他一些类似的算法
+
+92
+00:02:53,756 --> 00:02:55,889
+But let's start with logistic regression.
+但现在我们先来考虑逻辑回归
+
+93
+00:02:57,658 --> 00:02:59,583
+Now if you have a
+现在假定你有一个
+
+94
+00:02:59,583 --> 00:03:01,835
+website that just runs continuously,
+连续运行的网站
+
+95
+00:03:01,835 --> 00:03:05,342
+here's what an online learning algorithm would do.
+以下就是在线学习算法要做的
+
+96
+00:03:05,342 --> 00:03:07,478
+I'm gonna write repeat forever.
+我要写下"一直重复"
+
+97
+00:03:07,478 --> 00:03:09,730
+This just means that our website
+这只是代表着我们的网站
+
+98
+00:03:09,730 --> 00:03:11,170
+is going to, you know, keep on
+将会一直继续
+
+99
+00:03:11,170 --> 00:03:12,911
+staying up.
+保持在线学习
+
+100
+00:03:12,911 --> 00:03:14,351
+What happens on the website is
+这个网站将要发生的是
+
+101
+00:03:14,351 --> 00:03:16,465
+occasionally a user
+一个用户
+
+102
+00:03:16,465 --> 00:03:17,950
+will come and for
+偶然访问
+
+103
+00:03:17,950 --> 00:03:19,576
+the user that comes we'll get
+然后我们将会得到
+
+104
+00:03:19,576 --> 00:03:25,380
+some x,y pair corresponding to
+与其对应的一些(x,y)对
+
+105
+00:03:25,380 --> 00:03:29,096
+a customer or to a user on the website.
+这些(x,y)对是相对应于一个特定的客户或用户的
+
+106
+00:03:29,096 --> 00:03:30,884
+So the features x are, you
+所以特征 x 是指
+
+107
+00:03:30,884 --> 00:03:32,811
+know, the origin and destination specified
+客户所指定的起始地与目的地
+
+108
+00:03:32,811 --> 00:03:34,111
+by this user and the price
+以及
+
+109
+00:03:34,111 --> 00:03:35,358
+that we happened to offer to
+我们这一次提供
+
+110
+00:03:35,358 --> 00:03:37,292
+them this time around, and
+给客户的价格
+
+111
+00:03:37,292 --> 00:03:38,430
+y is either one or
+而y则取1或0
+
+112
+00:03:38,430 --> 00:03:40,148
+zero depending one whether or
+y值取决于
+
+113
+00:03:40,148 --> 00:03:41,518
+not they chose to
+客户是否选择了
+
+114
+00:03:41,518 --> 00:03:43,980
+use our shipping service.
+使用我们的运输服务
+
+115
+00:03:43,980 --> 00:03:45,419
+Now once we get this {x,y}
+现在我们一旦获得了这个{x,y}数据对
+
+116
+00:03:45,419 --> 00:03:46,813
+pair, what an online
+在线学习算法
+
+117
+00:03:46,813 --> 00:03:48,391
+learning algorithm does is then
+要做的就是
+
+118
+00:03:48,391 --> 00:03:50,690
+update the parameters theta
+更新参数θ
+
+119
+00:03:50,690 --> 00:03:54,011
+using just this example
+利用刚得到的(x,y)数据对来更新θ
+
+120
+00:03:54,011 --> 00:03:57,726
+x,y, and in particular
+具体来说
+
+121
+00:03:57,726 --> 00:03:59,839
+we would update my parameters theta
+我们将这样更新我们的参数θ
+
+122
+00:03:59,839 --> 00:04:01,842
+as Theta j get updated as Theta j
+θj 将会被更新为
+
+123
+00:04:01,842 --> 00:04:06,619
+minus the learning rate alpha times
+θj 减去学习率 α 乘以
+
+124
+00:04:06,619 --> 00:04:11,356
+my usual gradient descent
+梯度下降
+
+125
+00:04:11,356 --> 00:04:13,399
+rule for logistic regression.
+来做逻辑回归
+
+126
+00:04:13,399 --> 00:04:14,491
+So we do this for j
+然后我们对j等于0到n
+
+127
+00:04:14,491 --> 00:04:15,652
+equals zero up to n,
+重复这个步骤
+
+128
+00:04:15,652 --> 00:04:19,088
+and that's my close curly brace.
+这是我的另一边花括号
+
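+As a minimal sketch of one such online update for logistic regression (Python/NumPy assumed; the function names and learning rate are illustrative), the step uses a single (x, y) pair and the example is then discarded:
+
+import numpy as np
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+def online_logistic_update(theta, x, y, alpha=0.1):
+    # x: feature vector for one user interaction (with x[0] = 1 for the intercept term);
+    # y: 1 if the user chose the shipping service, 0 otherwise.
+    error = sigmoid(theta @ x) - y        # h_theta(x) - y for logistic regression
+    return theta - alpha * error * x      # updates every theta_j, j = 0 .. n, in one step
+
+In a live system this function would be called once per incoming user, with no training set kept around.
+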
+129
+00:04:19,088 --> 00:04:21,218
+So, for other learning algorithms
+所以对于其他的学习算法
+
+130
+00:04:21,218 --> 00:04:22,873
+instead of writing X-Y, right, I
+不是写(x,y)对 对吧
+
+131
+00:04:22,873 --> 00:04:24,011
+was writing things like Xi,
+我之前写的是
+
+132
+00:04:24,011 --> 00:04:26,495
+Yi but
+(x(i),y(i)) 一样的数据对
+
+133
+00:04:26,495 --> 00:04:27,842
+in this online learning setting
+但在这个在线学习机制中
+
+134
+00:04:27,842 --> 00:04:29,723
+where actually discarding the notion
+我们实际上丢弃了
+
+135
+00:04:29,723 --> 00:04:31,464
+of there being a fixed training
+获取一个固定的数据集这样的概念
+
+136
+00:04:31,464 --> 00:04:32,904
+set instead we have an algorithm.
+取而代之的是 我们拥有一个算法
+
+137
+00:04:32,904 --> 00:04:34,924
+Now what happens as we get
+现在 当我们
+
+138
+00:04:34,924 --> 00:04:37,014
+an example and then we
+获取一个样本 然后我们
+
+139
+00:04:37,014 --> 00:04:38,825
+learn using that example like
+利用那个样本获取信息学习
+
+140
+00:04:38,825 --> 00:04:41,031
+so and then we throw that example away.
+然后我们丢弃这个样本
+
+141
+00:04:41,031 --> 00:04:43,098
+We discard that example and we
+我们丢弃那个样本 而且我们
+
+142
+00:04:43,098 --> 00:04:45,141
+never use it again and
+永远不会再使用它
+
+143
+00:04:45,141 --> 00:04:47,161
+so that's why we just look at one example at a time.
+这就是为什么我们在一个时间点只会处理一个样本的原因
+
+144
+00:04:47,161 --> 00:04:48,879
+We learn from that example.
+我们从样本中学习
+
+145
+00:04:48,879 --> 00:04:50,412
+We discard it.
+我们再丢弃它
+
+146
+00:04:50,412 --> 00:04:51,527
+Which is why, you know, we're
+这也就是为什么
+
+147
+00:04:51,527 --> 00:04:52,943
+also doing away with this
+我们放弃了一种拥有
+
+148
+00:04:52,943 --> 00:04:54,615
+notion of there being this
+这样一个概念 也就是存在一个
+
+149
+00:04:54,615 --> 00:04:58,191
+sort of fixed training set indexed by i.
+固定的 由 i 来作参数的数据集的表示方法
+
+150
+00:04:58,191 --> 00:04:59,328
+And, if you really run
+而且 如果你真的运行
+
+151
+00:04:59,328 --> 00:05:01,488
+a major website where you
+一个大型网站
+
+152
+00:05:01,488 --> 00:05:03,624
+really have a continuous stream
+在这个网站里你有一个连续的
+
+153
+00:05:03,624 --> 00:05:05,737
+of users coming, then this
+用户流登陆网站 那么
+
+154
+00:05:05,737 --> 00:05:07,525
+sort of online learning algorithm
+这种在线学习算法
+
+155
+00:05:07,525 --> 00:05:10,358
+is actually a pretty reasonable algorithm.
+是一种非常合理的算法
+
+156
+00:05:10,358 --> 00:05:12,076
+Because of data is essentially
+因为数据本质上是自由的
+
+157
+00:05:12,076 --> 00:05:13,330
+free if you have so
+如果你有如此多的数据
+
+158
+00:05:13,330 --> 00:05:14,979
+much data, that data
+而数据
+
+159
+00:05:14,979 --> 00:05:17,022
+is essentially unlimited then there
+本质上是无限的 那么
+
+160
+00:05:17,022 --> 00:05:17,997
+is really may be no
+或许就真的没必要
+
+161
+00:05:17,997 --> 00:05:18,949
+need to look at a
+重复处理
+
+162
+00:05:18,949 --> 00:05:21,527
+training example more than once.
+一个样本
+
+163
+00:05:21,527 --> 00:05:22,432
+Of course if we had only
+当然 如果我们只有
+
+164
+00:05:22,432 --> 00:05:24,220
+a small number of users then
+少量的用户
+
+165
+00:05:24,220 --> 00:05:26,333
+rather than using an online learning
+那么我们就不选择像这样的在线学习算法
+
+166
+00:05:26,333 --> 00:05:27,912
+algorithm like this, you might
+你可能最好是要
+
+167
+00:05:27,912 --> 00:05:29,421
+be better off saving away all
+保存好所有的
+
+168
+00:05:29,421 --> 00:05:30,884
+your data in a fixed training
+数据 保存在一个固定的
+
+169
+00:05:30,884 --> 00:05:34,042
+set and then running some algorithm over that training set.
+数据集里 然后对这个数据集使用某种算法
+
+170
+00:05:34,042 --> 00:05:35,018
+But if you really have a continuous
+但是 如果你确实有一个连续的
+
+171
+00:05:35,018 --> 00:05:36,341
+stream of data, then an
+数据流 那么一个
+
+172
+00:05:36,341 --> 00:05:39,881
+online learning algorithm can be very effective.
+在线学习机制会非常的有效
+
+173
+00:05:39,881 --> 00:05:41,171
+I should mention also that one
+我也必须要提到一个
+
+174
+00:05:41,171 --> 00:05:43,015
+interesting effect of this sort
+这种在线学习算法
+
+175
+00:05:43,015 --> 00:05:44,073
+of online learning algorithm is
+会带来的有趣的效果 那就是
+
+176
+00:05:44,073 --> 00:05:49,391
+that it can adapt to changing user preferences.
+它可以对正在变化的用户偏好进行调适
+
+177
+00:05:51,006 --> 00:05:54,592
+And in particular, if over
+而且特别的 如果
+
+178
+00:05:54,592 --> 00:05:55,776
+time because of changes in
+随着时间变化 因为
+
+179
+00:05:55,776 --> 00:05:58,377
+the economy maybe users
+大的经济环境发生变化 用户们可能会
+
+180
+00:05:58,377 --> 00:05:59,957
+start to become more price
+开始变得对价格更敏感
+
+181
+00:05:59,957 --> 00:06:01,395
+sensitive and willing to pay,
+然后愿意支付
+
+182
+00:06:01,395 --> 00:06:03,717
+you know, less willing to pay high prices.
+你知道的 不那么愿意支付高的费用
+
+183
+00:06:03,717 --> 00:06:06,527
+Or if they become less price sensitive and they're willing to pay higher prices.
+也有可能他们变得对价格不那么敏感 然后他们愿意支付更高的价格
+
+184
+00:06:06,527 --> 00:06:08,292
+Or if different things
+又或者各种因素
+
+185
+00:06:08,292 --> 00:06:10,451
+become more important to users,
+变得对用户的影响更大了
+
+186
+00:06:10,451 --> 00:06:11,496
+if you start to have new
+如果你开始拥有
+
+187
+00:06:11,496 --> 00:06:12,587
+types of users coming to your website.
+某一种新的类型的用户涌入你的网站
+
+188
+00:06:12,587 --> 00:06:14,933
+This sort of online learning algorithm
+这样的在线学习算法
+
+189
+00:06:14,933 --> 00:06:17,278
+can also adapt to changing
+也可以根据变化着的
+
+190
+00:06:17,278 --> 00:06:18,950
+user preferences and kind
+用户偏好进行调适
+
+191
+00:06:18,950 --> 00:06:20,157
+of keep track of what your
+而且从某种程度上可以跟进
+
+192
+00:06:20,157 --> 00:06:21,991
+changing population of users
+变化着的用户群体所愿意
+
+193
+00:06:21,991 --> 00:06:24,685
+may be willing to pay for.
+支付的价格
+
+194
+00:06:24,685 --> 00:06:26,171
+And it does that because if
+而且 在线学习算法有这样的作用是因为
+
+195
+00:06:26,171 --> 00:06:28,168
+your pool of users changes,
+如果你的用户群变化了
+
+196
+00:06:28,168 --> 00:06:29,793
+then these updates to your
+那么参数θ的变化与更新
+
+197
+00:06:29,793 --> 00:06:31,953
+parameters theta will just slowly adapt
+会逐渐调适到
+
+198
+00:06:31,953 --> 00:06:33,555
+your parameters to whatever your
+你最新的用户群所应该体现出来的
+
+199
+00:06:33,555 --> 00:06:36,599
+latest pool of users looks like.
+参数
+
+200
+00:06:36,599 --> 00:06:37,781
+Here's another example of a
+这里有另一个
+
+201
+00:06:37,781 --> 00:06:40,753
+sort of application to which you might apply online learning.
+你可能会想要使用在线学习的例子
+
+202
+00:06:40,753 --> 00:06:43,472
+this is an application in product
+这是一个对于产品搜索的应用
+
+203
+00:06:43,472 --> 00:06:44,701
+search in which we want to
+在这个应用中 我们想要
+
+204
+00:06:44,701 --> 00:06:46,117
+apply learning algorithm to learn
+使用一种学习机制来学习如何
+
+205
+00:06:46,117 --> 00:06:48,973
+to give good search listings to a user.
+反馈给用户好的搜索列表
+
+206
+00:06:48,973 --> 00:06:51,156
+Let's say you run an online
+举个例子说 你有一个在线
+
+207
+00:06:51,156 --> 00:06:53,083
+store that sells phones - that
+卖电话的商铺
+
+208
+00:06:53,083 --> 00:06:55,312
+sells mobile phones or sells cell phones.
+一个卖移动电话或者手机的商铺
+
+209
+00:06:55,312 --> 00:06:56,682
+And you have a user interface
+而且你有一个用户界面
+
+210
+00:06:56,682 --> 00:06:58,284
+where a user can come to
+可以让用户登陆你的网站
+
+211
+00:06:58,284 --> 00:06:59,445
+your website and type in the
+并且键入一个
+
+212
+00:06:59,445 --> 00:07:02,626
+query like "Android phone 1080p camera".
+搜索条目 例如“安卓 手机 1080p 摄像头”
+
+213
+00:07:02,626 --> 00:07:03,509
+So 1080p is a type
+那么1080p 是指一个
+
+214
+00:07:03,509 --> 00:07:04,623
+of a specification for a
+对应于摄像头的
+
+215
+00:07:04,623 --> 00:07:05,808
+video camera that you might
+手机参数 这个参数可以出现在
+
+216
+00:07:05,808 --> 00:07:08,710
+have on a phone, a cell phone, a mobile phone.
+一部电话中 一个移动电话 或者一个手机中
+
+217
+00:07:08,710 --> 00:07:12,100
+Suppose, suppose we have a hundred phones in our store.
+假定 假定我们的商铺中有一百部电话
+
+218
+00:07:12,100 --> 00:07:13,354
+And because of the way our
+而且出于我们的网站设计
+
+219
+00:07:13,354 --> 00:07:15,321
+website is laid out, when
+当一个用户
+
+220
+00:07:15,321 --> 00:07:16,558
+a user types in a query,
+键入一个命令
+
+221
+00:07:16,558 --> 00:07:18,277
+if it was a search query, we
+如果这是一个搜索命令
+
+222
+00:07:18,277 --> 00:07:19,601
+would like to find a
+我们会想要找到一个
+
+223
+00:07:19,601 --> 00:07:20,900
+choice of ten different phones to
+合适的十部不同手机的列表
+
+224
+00:07:20,900 --> 00:07:22,921
+show what to offer to the user.
+来提供给用户
+
+225
+00:07:22,921 --> 00:07:24,987
+What we'd like to do is have
+我们想要做的是
+
+226
+00:07:24,987 --> 00:07:26,566
+a learning algorithm help us figure
+拥有一个在线学习机制来帮助我们
+
+227
+00:07:26,566 --> 00:07:28,447
+out what are the ten phones
+找到在这100部手机中
+
+228
+00:07:28,447 --> 00:07:29,771
+out of the 100 we
+哪十部手机
+
+229
+00:07:29,771 --> 00:07:31,791
+should return the user in response to
+是我们真正应该反馈给用户的
+
+230
+00:07:31,791 --> 00:07:34,531
+a user-search query like the one here.
+而且这个返回的列表是对类似这样的用户搜索条目最佳的回应
+
+231
+00:07:34,531 --> 00:07:36,695
+Here's how we can go about the problem.
+接下来要说的是一种解决问题的思路
+
+232
+00:07:37,218 --> 00:07:39,291
+For each phone and given
+对于每一个手机以及一个给定的
+
+233
+00:07:39,291 --> 00:07:41,311
+a specific user query; we
+用户搜索命令 我们
+
+234
+00:07:41,311 --> 00:07:44,120
+can construct a feature vector
+可以构建一个
+
+235
+00:07:44,120 --> 00:07:45,676
+X. So the feature
+特征矢量x 那么这个特征矢量x
+
+236
+00:07:45,676 --> 00:07:47,650
+vector X might capture different properties of the phone.
+可能会抓取手机的各种特点
+
+237
+00:07:47,650 --> 00:07:49,972
+It might capture things like,
+它可能会抓取类似于
+
+238
+00:07:49,972 --> 00:07:53,107
+how similar the user search query is to the phones.
+用户搜索命令与这部电话的类似程度有多高这样的信息
+
+239
+00:07:53,107 --> 00:07:54,059
+We capture things like how many
+我们获取类似于
+
+240
+00:07:54,059 --> 00:07:55,475
+words in the user search
+这个用户搜索命令中有多少个词
+
+241
+00:07:55,475 --> 00:07:56,172
+query match the name of
+可以与这部手机的名字相匹配
+
+242
+00:07:56,172 --> 00:07:57,356
+the phone, how many words
+或者这个搜索命令中有多少词
+
+243
+00:07:57,356 --> 00:08:01,303
+in the user search query match the description of the phone and so on.
+与这部手机的描述相匹配
+
+244
+00:08:01,303 --> 00:08:02,789
+So the features x capture
+所以特征矢量x获取
+
+245
+00:08:02,789 --> 00:08:03,672
+properties of the phone and
+手机的特点而且
+
+246
+00:08:03,672 --> 00:08:05,251
+it captures things about how
+它会获取
+
+247
+00:08:05,251 --> 00:08:06,412
+similar or how well
+这部手机与搜索命令
+
+248
+00:08:06,412 --> 00:08:10,591
+the phone matches the user query along different dimensions.
+的结果在各个方面的匹配程度
+
+249
+00:08:10,591 --> 00:08:11,868
+What we like to do is
+我们想要做的就是
+
+250
+00:08:11,868 --> 00:08:14,330
+estimate the probability that a
+估测一个概率 这个概率是指用户
+
+251
+00:08:14,330 --> 00:08:15,816
+user will click on the
+将会点进
+
+252
+00:08:15,816 --> 00:08:17,673
+link for a specific phone,
+某一个特定的手机的链接
+
+253
+00:08:17,673 --> 00:08:18,881
+because we want to show
+因为我们想要给用户展示
+
+254
+00:08:18,881 --> 00:08:20,065
+the user phones that they
+他们
+
+255
+00:08:20,065 --> 00:08:21,481
+are likely to want to
+想要买的手机
+
+256
+00:08:21,481 --> 00:08:22,921
+buy, want to show the user
+我们想要给用户提供
+
+257
+00:08:22,921 --> 00:08:24,082
+phones that they have high
+那些他们很可能
+
+258
+00:08:24,082 --> 00:08:27,240
+probability of clicking on in the web browser.
+在浏览器中点进去查看的手机
+
+259
+00:08:27,240 --> 00:08:29,562
+So I'm going to define y equals
+所以我将定义y等于1时
+
+260
+00:08:29,562 --> 00:08:30,676
+one if the user clicks on
+是指用户点击了
+
+261
+00:08:30,676 --> 00:08:31,930
+the link for a phone and
+手机的链接
+
+262
+00:08:31,930 --> 00:08:34,136
+y equals zero otherwise and
+而y等于0是指用户没有点击链接
+
+263
+00:08:34,136 --> 00:08:35,454
+what I would like to do is
+然后我们想要做的就是
+
+264
+00:08:35,454 --> 00:08:36,992
+learn the probability the user
+学习到用户
+
+265
+00:08:36,992 --> 00:08:38,246
+will click on a specific
+将会点击某一个被给出的特定的手机的概率
+
+266
+00:08:38,246 --> 00:08:39,802
+phone given, you know,
+你知道的
+
+267
+00:08:39,802 --> 00:08:41,693
+the features x, which capture properties
+特征X 获取了手机的特点
+
+268
+00:08:41,693 --> 00:08:43,819
+of the phone and how well the query matches the phone.
+以及搜索条目与手机的匹配程度
+
+269
+00:08:43,819 --> 00:08:45,700
+To give this problem a name
+如果要给这个问题命一个名
+
+270
+00:08:45,700 --> 00:08:47,720
+in the language of
+用一种运行这类网站的人们
+
+271
+00:08:47,720 --> 00:08:49,130
+people that run websites like
+所使用的语言来命名
+
+272
+00:08:49,130 --> 00:08:51,249
+this, the problem of learning this is
+这类学习问题
+
+273
+00:08:51,249 --> 00:08:53,223
+actually called the problem of
+这类问题其实被称作
+
+274
+00:08:53,223 --> 00:08:57,296
+learning the predicted click-through rate, the predicted CTR.
+学习预测的点击率 预估点击率CTR
+
+275
+00:08:57,296 --> 00:08:58,796
+It just means learning the probability
+它仅仅代表着学习
+
+276
+00:08:58,796 --> 00:09:00,491
+that the user will click on
+用户将点击某一个
+
+277
+00:09:00,491 --> 00:09:01,698
+the specific link that you
+特定的 你提供给他们的链接的概率
+
+278
+00:09:01,698 --> 00:09:03,022
+offer them, so CTR is
+所以CTR是
+
+279
+00:09:03,022 --> 00:09:06,528
+an abbreviation for click through rate.
+点击率(Click Through Rate)的简称
+
+280
+00:09:06,528 --> 00:09:07,550
+And if you can estimate the
+然后 如果你能够估计
+
+281
+00:09:07,550 --> 00:09:09,245
+predicted click-through rate for any
+任意一个特定手机的点击率
+
+282
+00:09:09,245 --> 00:09:10,847
+particular phone, what we
+我们可以做的就是
+
+283
+00:09:10,847 --> 00:09:12,171
+can do is use this to
+利用这个来
+
+284
+00:09:12,171 --> 00:09:13,819
+show the user the ten phones
+给用户展示十个
+
+285
+00:09:13,819 --> 00:09:15,770
+that are most likely to click on,
+他们最有可能点击的手机
+
+286
+00:09:15,770 --> 00:09:17,441
+because out of the hundred phones,
+因为从这一百个手机中
+
+287
+00:09:17,441 --> 00:09:20,553
+we can compute this for
+我们可以计算出
+
+288
+00:09:20,553 --> 00:09:21,737
+each of the 100 phones and
+100部手机中 每一部手机的可能的点击率
+
+289
+00:09:21,737 --> 00:09:22,759
+just select the 10 phones
+而且我们选择10部
+
+290
+00:09:22,759 --> 00:09:25,754
+that the user is most likely to click on,
+用户最有可能点击的手机
+
+291
+00:09:25,754 --> 00:09:26,892
+and this will be a pretty reasonable
+那么这就是一个非常合理的
+
+292
+00:09:26,892 --> 00:09:29,818
+way to decide what ten results to show to the user.
+来决定要展示给用户的十个搜索结果的方法
+
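A minimal Python sketch (not part of the course materials; the helper extract_features is a hypothetical placeholder) of the ranking step just described: score each of the 100 phones with a logistic-regression estimate of the click-through rate and return the 10 the user is most likely to click on.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_ctr(theta, x):
    # p(y = 1 | x; theta): estimated probability the user clicks this phone
    return sigmoid(theta @ x)

def top_ten_phones(theta, query, phones, extract_features):
    # extract_features(query, phone) -> feature vector x (hypothetical helper)
    scored = [(predict_ctr(theta, extract_features(query, p)), p) for p in phones]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:10]]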
+293
+00:09:29,818 --> 00:09:32,186
+Just to be clear, suppose that
+更明确地说 假定
+
+294
+00:09:32,186 --> 00:09:33,440
+every time a user does
+每次用户
+
+295
+00:09:33,440 --> 00:09:35,576
+a search, we return ten results
+进行一次搜索 我们回馈给用户十个结果
+
+296
+00:09:35,576 --> 00:09:37,225
+what that will do is it
+在线学习算法会做的是
+
+297
+00:09:37,225 --> 00:09:38,990
+will actually give us ten
+它会真正地提供给我们十个
+
+298
+00:09:38,990 --> 00:09:40,870
+x,y pairs, this actually
+(x,y) 数据对 这就真的
+
+299
+00:09:40,870 --> 00:09:43,332
+gives us ten training examples every
+给了我们十个数据样本
+
+300
+00:09:43,332 --> 00:09:44,640
+time a user comes to
+每当一个用户来到
+
+301
+00:09:44,640 --> 00:09:46,257
+our website because, because for
+我们网站时就给了我们十个样本
+
+302
+00:09:46,257 --> 00:09:47,535
+the ten phone that we chose
+因为对于这十部我们选择
+
+303
+00:09:47,535 --> 00:09:48,881
+to show the user, for each
+要展示给用户的手机 对于
+
+304
+00:09:48,881 --> 00:09:49,896
+of those 10 phones we get
+这10部手机中的每一个 我们会得到
+
+305
+00:09:49,896 --> 00:09:51,389
+a feature vector X, and
+一个特征矢量x 而且
+
+306
+00:09:51,389 --> 00:09:52,737
+for each of those 10 phones we
+对于这10部手机中的任何一个手机
+
+307
+00:09:52,737 --> 00:09:54,563
+show the user we will also
+我们还会得到
+
+308
+00:09:54,563 --> 00:09:56,172
+get a value for y, we
+y的取值
+
+309
+00:09:56,172 --> 00:09:57,542
+will also observe the value
+我们也会观察这些取值
+
+310
+00:09:57,542 --> 00:09:59,517
+of y, depending on whether
+这些取值是根据
+
+311
+00:09:59,517 --> 00:10:00,925
+or not we clicked on that
+用户有没有点击
+
+312
+00:10:00,925 --> 00:10:02,465
+url or not and
+那个网页链接来决定的
+
+313
+00:10:02,465 --> 00:10:03,696
+so, one way to run a
+这样 运行此类网站的
+
+314
+00:10:03,696 --> 00:10:04,903
+website like this would be to
+一种方法就是
+
+315
+00:10:04,903 --> 00:10:06,830
+continuously show the user,
+连续给用户展示
+
+316
+00:10:06,830 --> 00:10:08,363
+you know, your ten best guesses for
+你的十个最佳猜测
+
+317
+00:10:08,363 --> 00:10:09,895
+what other phones they might like
+这十个推荐是指用户可能会喜欢的其他的手机
+
+318
+00:10:09,895 --> 00:10:11,428
+and so, each time a user
+那么 每次一个用户访问
+
+319
+00:10:11,428 --> 00:10:12,728
+comes you would get ten
+你将会得到十个
+
+320
+00:10:12,728 --> 00:10:14,493
+examples, ten x,y pairs,
+样本 十个(x,y) 数据对
+
+321
+00:10:14,493 --> 00:10:16,304
+and then use an online
+然后利用一个在线学习
+
+322
+00:10:16,304 --> 00:10:17,953
+learning algorithm to update the
+算法来更新你的参数
+
+323
+00:10:17,953 --> 00:10:20,182
+parameters using essentially 10
+更新过程中会对这十个样本利用10步
+
+324
+00:10:20,182 --> 00:10:21,691
+steps of gradient descent on these
+梯度下降法
+
+325
+00:10:21,691 --> 00:10:23,386
+10 examples, and then
+然后
+
+326
+00:10:23,386 --> 00:10:25,081
+you can throw the data away, and
+你可以丢弃你的数据了
+
+327
+00:10:25,081 --> 00:10:26,590
+if you really have a continuous
+如果你真的拥有一个连续的
+
+328
+00:10:26,590 --> 00:10:27,891
+stream of users coming to
+用户流进入
+
+329
+00:10:27,891 --> 00:10:29,354
+your website, this would be
+你的网站 这将会是
+
+330
+00:10:29,354 --> 00:10:31,095
+a pretty reasonable way to learn
+一个非常合理的学习方法
+
+331
+00:10:31,095 --> 00:10:32,395
+parameters for your algorithm
+来学习你的算法中的参数
+
+332
+00:10:32,395 --> 00:10:33,835
+so as to show the ten phones
+从而来给用户展示
+
+333
+00:10:33,835 --> 00:10:35,669
+to your users that may
+十部他们
+
+334
+00:10:35,669 --> 00:10:39,013
+be most promising and the most likely to click on.
+最有可能点击查看的手机
+
+335
+00:10:39,013 --> 00:10:40,151
+So, this is a product search
+所以 这是一个产品搜索问题
+
+336
+00:10:40,151 --> 00:10:41,498
+problem or learning to rank
+或者说是一个学习将手机排序
+
+337
+00:10:41,498 --> 00:10:44,214
+phones, learning to search for phones example.
+的问题 学习搜索手机的样例
+
+338
+00:10:44,214 --> 00:10:46,422
+So, I'll quickly mention a few others.
+接着 我会快速地提及一些其他的例子
+
+339
+00:10:46,422 --> 00:10:47,372
+One is, if you have
+其中一个例子是 如果你有
+
+340
+00:10:47,372 --> 00:10:48,231
+a website and you're trying to
+一个网站 你在尝试着
+
+341
+00:10:48,231 --> 00:10:49,439
+decide, you know, what special
+来决定 你知道的 你要给用户
+
+342
+00:10:49,439 --> 00:10:50,321
+offer to show the user,
+展示什么样的特别优惠
+
+343
+00:10:50,321 --> 00:10:53,154
+this is very similar to phones,
+这与手机那个例子非常类似
+
+344
+00:10:53,154 --> 00:10:54,710
+or if you have a
+或者你有一个
+
+345
+00:10:54,710 --> 00:10:58,216
+website and you show different users different news articles.
+网站 然后你想给不同的用户展示不同的新闻文章
+
+346
+00:10:58,216 --> 00:10:59,911
+So, if you're a news aggregator
+那么 如果你是一个新闻抓取网站
+
+347
+00:10:59,911 --> 00:11:01,374
+website, then you can
+那么你又可以
+
+348
+00:11:01,374 --> 00:11:02,303
+again use a similar system to
+使用一个类似的系统
+
+349
+00:11:02,303 --> 00:11:03,882
+select, to show to
+来选择 来展示给用户
+
+350
+00:11:03,882 --> 00:11:05,554
+the user, you know, what
+他们最有可能感兴趣的
+
+351
+00:11:05,554 --> 00:11:06,877
+are the news articles that they
+他们最有可能感兴趣的
+
+352
+00:11:06,877 --> 00:11:08,154
+are most likely to be interested
+新闻文章
+
+353
+00:11:08,154 --> 00:11:11,103
+in and what are the news articles that they are most likely to click on.
+以及那些他们最有可能点击的新闻文章
+
+354
+00:11:11,103 --> 00:11:13,495
+Closely related to special offers, we will profit from recommendations.
+与特别优惠所密切相关的是 我们将会从这些推荐中获利
+
+355
+00:11:13,495 --> 00:11:15,097
+And in fact, if you have
+而且实际上 如果你有
+
+356
+00:11:15,097 --> 00:11:17,953
+a collaborative filtering system, you
+一个协作过滤系统
+
+357
+00:11:17,953 --> 00:11:20,693
+can even imagine a collaborative filtering
+你可以想象到 一个协作过滤系统
+
+358
+00:11:20,693 --> 00:11:22,643
+system giving you additional
+可以给你更多的
+
+359
+00:11:22,643 --> 00:11:23,897
+features to feed into a
+特征 这些特征可以整合到
+
+360
+00:11:23,897 --> 00:11:25,732
+logistic regression classifier to try
+逻辑回归的分类器 从而可以尝试着
+
+361
+00:11:25,732 --> 00:11:28,100
+to predict the click through
+预测对于你可能推荐给用户的
+
+362
+00:11:28,100 --> 00:11:29,981
+rate for different products that you might recommend to a user.
+不同产品的点击率
+
+363
+00:11:29,981 --> 00:11:32,280
+Of course, I should say that
+当然 我需要说明的是
+
+364
+00:11:32,280 --> 00:11:34,207
+any of these problems could also
+这些问题中的任何一个都可以
+
+365
+00:11:34,207 --> 00:11:35,600
+have been formulated as a
+被归类到
+
+366
+00:11:35,600 --> 00:11:39,873
+standard machine learning problem, where you have a fixed training set.
+标准的 拥有一个固定的样本集的机器学习问题中
+
+367
+00:11:39,873 --> 00:11:40,894
+Maybe, you can run your
+或许 你可以运行一个
+
+368
+00:11:40,894 --> 00:11:41,823
+website for a few days and
+你自己的网站 尝试运行几天
+
+369
+00:11:41,823 --> 00:11:43,727
+then save away a training set,
+然后保存一个数据集
+
+370
+00:11:43,727 --> 00:11:44,842
+a fixed training set, and run
+一个固定的数据集 然后对其运行
+
+371
+00:11:44,842 --> 00:11:45,771
+a learning algorithm on that.
+一个学习算法
+
+372
+00:11:45,771 --> 00:11:48,696
+But these are the actual
+但是这些是实际的
+
+373
+00:11:48,696 --> 00:11:49,950
+sorts of problems, where you do
+问题 在这些问题里
+
+374
+00:11:49,950 --> 00:11:51,901
+see large companies get so
+你会看到大公司会获取
+
+375
+00:11:51,901 --> 00:11:53,712
+much data, that there's really
+如此多的数据 所以真的没有必要
+
+376
+00:11:53,712 --> 00:11:55,221
+maybe no need to save away
+来保存一个
+
+377
+00:11:55,221 --> 00:11:56,963
+a fixed training set, but instead
+固定的数据集 取而代之的是
+
+378
+00:11:56,963 --> 00:11:59,563
+you can use an online learning algorithm to just learn continuously.
+你可以使用一个在线学习算法来连续的学习
+
+379
+00:11:59,563 --> 00:12:04,091
+from the data that users are generating on your website.
+从这些用户不断产生的数据中来学习
+
+380
+00:12:05,183 --> 00:12:07,249
+So, that was the online
+所以 这就是在线学习机制
+
+381
+00:12:07,249 --> 00:12:08,990
+learning setting and as we
+然后就像我们所看到的
+
+382
+00:12:08,990 --> 00:12:10,616
+saw, the algorithm that we apply to
+我们所使用的这个算法
+
+383
+00:12:10,616 --> 00:12:12,357
+it is really very similar
+与随机梯度下降算法
+
+384
+00:12:12,357 --> 00:12:13,867
+to this stochastic gradient descent
+非常类似
+
+385
+00:12:13,867 --> 00:12:15,330
+algorithm, only instead of
+唯一的区别的是 我们不会
+
+386
+00:12:15,330 --> 00:12:16,871
+scanning through a fixed
+使用一个固定的数据集
+
+387
+00:12:16,871 --> 00:12:18,000
+training set, we're instead getting
+我们会做的是获取
+
+388
+00:12:18,000 --> 00:12:19,974
+one example from a user,
+一个用户样本
+
+389
+00:12:19,974 --> 00:12:21,290
+learning from that example, then
+从那个样本中学习 然后
+
+390
+00:12:21,290 --> 00:12:22,644
+discarding it and moving on.
+丢弃那个样本并继续下去
+
+391
+00:12:22,644 --> 00:12:25,593
+And if you have a continuous
+而且如果你对某一种应用有一个连续的
+
+392
+00:12:25,593 --> 00:12:26,777
+stream of data for some application,
+数据流
+
+393
+00:12:26,777 --> 00:12:28,356
+this sort of algorithm may be
+这样的算法可能会
+
+394
+00:12:28,356 --> 00:12:31,816
+well worth considering for your application.
+非常值得考虑
+
+395
+00:12:31,816 --> 00:12:33,952
+And of course, one advantage of
+当然 在线学习的一个优点
+
+396
+00:12:33,952 --> 00:12:36,128
+online learning is also that
+就是
+
+397
+00:12:36,128 --> 00:12:37,458
+if you have a changing pool
+如果你有一个变化的
+
+398
+00:12:37,458 --> 00:12:38,967
+of users, or if the
+用户群 又或者
+
+399
+00:12:38,967 --> 00:12:40,082
+things you're trying to predict are
+你在尝试预测的事情
+
+400
+00:12:40,082 --> 00:12:42,032
+slowly changing like your user
+在缓慢变化 就像你的用户的
+
+401
+00:12:42,032 --> 00:12:43,751
+taste is slowly changing, the online
+品味在缓慢变化 这个在线学习
+
+402
+00:12:43,751 --> 00:12:45,492
+learning algorithm can slowly
+算法可以慢慢地
+
+403
+00:12:45,492 --> 00:12:47,211
+adapt your learned hypothesis to
+调适你所学习到的假设
+
+404
+00:12:47,211 --> 00:12:49,161
+whatever the latest sets of
+将其调节更新到最新的
+
+405
+00:12:49,161 --> 00:12:51,161
+user behaviors are like as well.
+用户行为 (字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
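A minimal Python sketch, under the assumptions of this video (a logistic regression model of the predicted CTR, one gradient step per freshly observed example, data discarded afterwards), of the online learning update; stream_of_examples is a hypothetical source of (x, y) pairs, not an API from the course.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def online_update(theta, x, y, alpha=0.01):
    # One stochastic-gradient step on a single (x, y) pair:
    # theta_j := theta_j - alpha * (h_theta(x) - y) * x_j
    return theta - alpha * (sigmoid(theta @ x) - y) * x

# Usage: each search shows ten phones, yielding up to ten (x, y) pairs;
# update on each pair and then throw the data away.
# for x, y in stream_of_examples():      # hypothetical continuous stream of users
#     theta = online_update(theta, x, y)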
diff --git a/srt/17 - 6 - Map Reduce and Data Parallelism (14 min).srt b/srt/17 - 6 - Map Reduce and Data Parallelism (14 min).srt
new file mode 100644
index 00000000..96aa4e59
--- /dev/null
+++ b/srt/17 - 6 - Map Reduce and Data Parallelism (14 min).srt
@@ -0,0 +1,2101 @@
+1
+00:00:00,320 --> 00:00:01,510
+In the last few videos, we talked
+在上面几个视频中,我们讨论了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,810 --> 00:00:03,430
+about stochastic gradient descent, and,
+随机梯度下降
+
+3
+00:00:03,620 --> 00:00:05,020
+you know, other variations of the
+以及梯度下降算法的
+
+4
+00:00:05,120 --> 00:00:06,530
+stochastic gradient descent algorithm,
+其他一些变种
+
+5
+00:00:06,910 --> 00:00:09,150
+including those adaptations to online
+包括如何将其
+
+6
+00:00:09,490 --> 00:00:10,420
+learning, but all of those
+运用于在线学习
+
+7
+00:00:10,610 --> 00:00:11,810
+algorithms could be run on
+然而所有这些算法
+
+8
+00:00:12,110 --> 00:00:13,740
+one machine, or could be run on one computer.
+都只能在一台计算机上运行
+
+9
+00:00:14,800 --> 00:00:15,870
+And some machine learning problems
+但是 有些机器学习问题
+
+10
+00:00:16,310 --> 00:00:17,270
+are just too big to run
+太大以至于不可能
+
+11
+00:00:17,520 --> 00:00:19,160
+on one machine, sometimes maybe
+只在一台计算机上运行
+
+12
+00:00:19,300 --> 00:00:21,050
+you just have so much data you
+有时候 它涉及的数据量如此巨大
+
+13
+00:00:21,170 --> 00:00:22,350
+just don't ever want to run
+不论你使用何种算法
+
+14
+00:00:22,670 --> 00:00:23,980
+all that data through a
+你都不希望只使用
+
+15
+00:00:24,100 --> 00:00:26,270
+single computer, no matter what algorithm you would use on that computer.
+一台计算机来处理这些数据
+
+16
+00:00:28,470 --> 00:00:29,640
+So in this video I'd
+因此 在这个视频中
+
+17
+00:00:29,740 --> 00:00:31,240
+like to talk about a different approach
+我希望介绍
+
+18
+00:00:31,770 --> 00:00:33,610
+to large scale machine learning, called
+进行大规模机器学习的另一种方法
+
+19
+00:00:34,010 --> 00:00:36,190
+the map reduce approach.
+称为map reduce (映射 化简) 方法
+
+20
+00:00:37,030 --> 00:00:38,080
+And even though we have
+尽管我们
+
+21
+00:00:38,380 --> 00:00:39,400
+quite a few videos on stochastic
+用了多个视频讲解
+
+22
+00:00:39,970 --> 00:00:41,230
+gradient descent and we're going
+随机梯度下降算法
+
+23
+00:00:41,550 --> 00:00:43,100
+to spend relative less time
+而我们将只用少量时间
+
+24
+00:00:43,460 --> 00:00:45,350
+on map reduce--don't judge the
+介绍map reduce
+
+25
+00:00:45,560 --> 00:00:46,750
+relative importance of map reduce
+但是请不要根据
+
+26
+00:00:47,160 --> 00:00:48,240
+versus the gradient descent
+我们所花的时间长短
+
+27
+00:00:48,690 --> 00:00:49,590
+based on the amount of
+来判断哪一种技术
+
+28
+00:00:49,660 --> 00:00:51,480
+time I spend on these ideas in particular.
+更加重要
+
+29
+00:00:52,230 --> 00:00:53,380
+Many people will say that
+事实上 许多人认为
+
+30
+00:00:53,790 --> 00:00:54,840
+map reduce is at least
+map reduce方法至少是
+
+31
+00:00:55,090 --> 00:00:56,330
+an equally important, and some
+同等重要的
+
+32
+00:00:56,580 --> 00:00:57,850
+would say an even more important idea
+还有人认为map reduce方法
+
+33
+00:00:58,500 --> 00:01:00,620
+compared to gradient descent, only
+甚至比梯度下降方法更重要
+
+34
+00:01:01,460 --> 00:01:03,040
+it's relatively simpler to
+我们之所以只在
+
+35
+00:01:03,160 --> 00:01:04,620
+explain, which is why I'm
+map reduce上花少量时间
+
+36
+00:01:04,720 --> 00:01:05,580
+going to spend less time on
+只是因为它相对简单 容易解释
+
+37
+00:01:05,830 --> 00:01:07,040
+it, but using these ideas
+然而 实际上
+
+38
+00:01:07,670 --> 00:01:08,400
+you might be able to scale
+相比于随机梯度下降方法
+
+39
+00:01:09,070 --> 00:01:10,640
+learning algorithms to even
+map reduce方法
+
+40
+00:01:10,880 --> 00:01:12,520
+far larger problems than is
+能够处理
+
+41
+00:01:12,630 --> 00:01:14,530
+possible using stochastic gradient descent.
+更大规模的问题
+
+42
+00:01:18,720 --> 00:01:19,000
+Here's the idea.
+它的想法是这样的
+
+43
+00:01:19,810 --> 00:01:21,020
+Let's say we want to fit
+假设我们要
+
+44
+00:01:21,490 --> 00:01:22,960
+a linear regression model or
+拟合一个线性回归模型
+
+45
+00:01:23,140 --> 00:01:24,440
+a logistic regression model or some
+或者Logistic回归模型
+
+46
+00:01:24,540 --> 00:01:26,100
+such, and let's start again
+或者其他的什么模型
+
+47
+00:01:26,430 --> 00:01:27,660
+with batch gradient descent, so
+让我们再次从批量梯度下降算法开始吧
+
+48
+00:01:27,840 --> 00:01:30,300
+that's our batch gradient descent learning rule.
+这就是我们的批量梯度下降学习算法
+
+49
+00:01:31,240 --> 00:01:32,430
+And to keep the writing
+为了让幻灯片上的文字
+
+50
+00:01:32,850 --> 00:01:34,170
+on this slide tractable, I'm going
+更容易理解
+
+51
+00:01:34,340 --> 00:01:36,990
+to assume throughout that we have m equals 400 examples.
+我们将假定m固定为400个样本
+
+52
+00:01:37,530 --> 00:01:39,560
+Of course, by our
+当然 根据
+
+53
+00:01:39,750 --> 00:01:40,850
+standards, in terms of large scale
+大规模机器学习的标准
+
+54
+00:01:41,090 --> 00:01:42,050
+machine learning, you know m
+m等于400
+
+55
+00:01:42,170 --> 00:01:43,210
+might be pretty small and so,
+实在是太小了
+
+56
+00:01:43,770 --> 00:01:45,390
+this might be more commonly
+也许在实际问题中
+
+57
+00:01:45,870 --> 00:01:46,920
+applied to problems, where you
+你更有可能遇到
+
+58
+00:01:47,050 --> 00:01:48,190
+have maybe closer to 400
+样本大小为4亿
+
+59
+00:01:48,740 --> 00:01:49,940
+million examples, or some
+的数据
+
+60
+00:01:50,080 --> 00:01:51,310
+such, but just to
+或者其他差不多的大小
+
+61
+00:01:51,390 --> 00:01:52,330
+make the writing on the slide
+但是 为了使我们的讲解更加简单和清晰
+
+62
+00:01:52,770 --> 00:01:55,000
+simpler, I'm going to pretend we have 400 examples.
+我们假定我们只有400个样本
+
+63
+00:01:55,690 --> 00:01:57,460
+So in that case, the
+这样以来
+
+64
+00:01:57,790 --> 00:01:59,080
+batch gradient descent learning rule
+批量梯度下降学习算法中
+
+65
+00:01:59,570 --> 00:02:00,930
+has this 400 and the
+这里是400
+
+66
+00:02:01,500 --> 00:02:02,930
+sum from i equals 1 through
+以及400个样本的求和
+
+67
+00:02:03,330 --> 00:02:05,050
+400 through my 400 examples
+这里i从1取到400
+
+68
+00:02:05,590 --> 00:02:06,890
+here, and if m
+如果m很大
+
+69
+00:02:07,050 --> 00:02:09,780
+is large, then this is a computationally expensive step.
+那么这一步的计算量将会很大
+
+70
+00:02:10,890 --> 00:02:12,830
+So, what the MapReduce idea
+因此 下面我们来介绍
+
+71
+00:02:13,250 --> 00:02:14,470
+does is the following, and
+map reduce算法
+
+72
+00:02:14,890 --> 00:02:15,740
+I should say the map
+这里我必须指出
+
+73
+00:02:15,950 --> 00:02:16,940
+reduce idea is due to
+map reduce算法的基本思想
+
+74
+00:02:17,680 --> 00:02:20,190
+two researchers, Jeff Dean
+来自Jeff Dean和Sanjay Ghemawat
+
+75
+00:02:20,700 --> 00:02:22,060
+and Sanjay Ghemawat.
+这两位研究者
+
+76
+00:02:22,640 --> 00:02:23,490
+Jeff Dean, by the way, is
+Jeff Dean是硅谷
+
+77
+00:02:24,190 --> 00:02:26,520
+one of the most legendary engineers in
+最为传奇般的
+
+78
+00:02:26,660 --> 00:02:28,300
+all of Silicon Valley and he
+一位工程师
+
+79
+00:02:28,420 --> 00:02:29,530
+kind of built a large
+今天谷歌 (Google) 所有的服务
+
+80
+00:02:29,820 --> 00:02:31,670
+fraction of the architectural
+所依赖的后台基础架构
+
+81
+00:02:32,310 --> 00:02:34,770
+infrastructure that all of Google runs on today.
+有很大一部分是他创建的
+
+82
+00:02:36,000 --> 00:02:37,320
+But here's the map reduce idea.
+接下来我们回到 map reduce 的基本想法
+
+83
+00:02:37,850 --> 00:02:38,570
+So, let's say I have
+假设我们有一个
+
+84
+00:02:38,700 --> 00:02:39,840
+some training set, if we
+训练样本
+
+85
+00:02:39,900 --> 00:02:41,220
+want to denote by this box here
+我们将它表示为
+
+86
+00:02:41,610 --> 00:02:42,760
+of X Y pairs,
+这个方框中的一系列X~Y数据对
+
+87
+00:02:44,250 --> 00:02:47,730
+where it's X1, Y1, down
+从X1~Y1开始
+
+88
+00:02:47,990 --> 00:02:49,640
+to my 400 examples,
+涵盖我所有的400个样本
+
+89
+00:02:50,520 --> 00:02:51,660
+Xm, Ym.
+直到X400~Y400
+
+90
+00:02:52,190 --> 00:02:53,780
+So, that's my training set with 400 training examples.
+总之 这就是我的400个训练样本
+
+91
+00:02:55,060 --> 00:02:56,550
+In the MapReduce idea, one way
+根据map reduce思想
+
+92
+00:02:56,690 --> 00:02:58,190
+to do this, is to split this training
+一种解决方案是
+
+93
+00:02:58,570 --> 00:03:00,510
+set into different subsets.
+将训练集划分成几个不同的子集
+
+94
+00:03:01,890 --> 00:03:02,590
+I'm going to
+在这个例子中
+
+95
+00:03:02,950 --> 00:03:04,150
+assume for this example that
+我假定我有
+
+96
+00:03:04,290 --> 00:03:05,530
+I have 4 computers,
+4台计算机
+
+97
+00:03:06,160 --> 00:03:07,160
+or 4 machines to run in
+它们并行的
+
+98
+00:03:07,300 --> 00:03:08,670
+parallel on my training set,
+处理我的训练数据
+
+99
+00:03:08,890 --> 00:03:10,570
+which is why I'm splitting this into 4 machines.
+因此我要将数据划分成4份 分给这4台计算机
+
+100
+00:03:10,920 --> 00:03:12,290
+If you have 10 machines or
+如果你有10台计算机
+
+101
+00:03:12,400 --> 00:03:13,810
+100 machines, then you would
+或者100台计算机
+
+102
+00:03:13,970 --> 00:03:15,890
+split your training set into 10 pieces or 100 pieces or what have you.
+那么你可能会将训练数据划分成10份或者100份
+
+103
+00:03:18,040 --> 00:03:19,710
+And what the first of my
+我的4台计算机中
+
+104
+00:03:19,850 --> 00:03:20,840
+4 machines is to do,
+第一台
+
+105
+00:03:21,100 --> 00:03:23,170
+say, is use just the
+将处理第一个
+
+106
+00:03:23,270 --> 00:03:25,170
+first one quarter of my
+四分之一训练数据
+
+107
+00:03:25,300 --> 00:03:28,680
+training set--so use just the first 100 training examples.
+也就是前100个训练样本
+
+108
+00:03:30,020 --> 00:03:31,440
+And in particular, what it's
+具体来说
+
+109
+00:03:31,480 --> 00:03:32,520
+going to do is look at
+这台计算机
+
+110
+00:03:32,630 --> 00:03:34,800
+this summation, and compute
+将参与处理这个求和
+
+111
+00:03:35,490 --> 00:03:38,560
+that summation for just the first 100 training examples.
+它将对前100个训练样本进行求和运算
+
+112
+00:03:40,030 --> 00:03:40,960
+So let me write that up
+让我把公式写下来吧
+
+113
+00:03:41,110 --> 00:03:42,530
+I'm going to compute a variable
+我将计算临时变量
+
+114
+00:03:43,560 --> 00:03:46,230
+temp, superscript 1 for
+temp 1 这里的上标1
+
+115
+00:03:46,320 --> 00:03:49,410
+the first machine, subscript j, equals
+表示第一台计算机
+
+116
+00:03:50,450 --> 00:03:52,150
+sum from i equals 1 through
+其下标为j 该变量等于从1到100的求和
+
+117
+00:03:52,260 --> 00:03:53,160
+100, and then I'm going to plug
+然后我在这里写的部分
+
+118
+00:03:53,500 --> 00:03:56,610
+in exactly that term there--so I have
+和这里的完全相同
+
+119
+00:03:57,260 --> 00:04:00,140
+h theta of Xi, minus Yi
+也就是h θ Xi减Yi
+
+120
+00:04:01,800 --> 00:04:03,230
+times Xij, right?
+乘以Xij
+
+121
+00:04:03,740 --> 00:04:05,680
+So that's just that
+这其实就是
+
+122
+00:04:05,910 --> 00:04:07,460
+gradient descent term up there.
+这里的梯度下降公式中的这一项
+
+123
+00:04:08,300 --> 00:04:09,780
+And then similarly, I'm going
+然后 类似的
+
+124
+00:04:10,010 --> 00:04:11,330
+to take the second quarter
+我将用第二台计算机
+
+125
+00:04:11,600 --> 00:04:13,130
+of my data and send it
+处理我的
+
+126
+00:04:13,320 --> 00:04:14,520
+to my second machine, and
+第二个四分之一数据
+
+127
+00:04:14,690 --> 00:04:15,680
+my second machine will use
+也就是说 我的第二台计算机
+
+128
+00:04:15,900 --> 00:04:18,750
+training examples 101 through 200
+将使用第101到200号训练样本
+
+129
+00:04:19,350 --> 00:04:21,170
+and you will compute similar variables
+类似的 我们用它
+
+130
+00:04:21,720 --> 00:04:22,880
+of temp 2 j, which
+计算临时变量 temp 2 j
+
+131
+00:04:23,110 --> 00:04:24,450
+is the same sum for index
+也就是从101到200号
+
+132
+00:04:24,890 --> 00:04:26,620
+from examples 101 through 200.
+数据的求和
+
+133
+00:04:26,840 --> 00:04:29,680
+And similarly machines 3
+类似的 第三台和第四台
+
+134
+00:04:29,830 --> 00:04:32,720
+and 4 will use the
+计算机将会使用
+
+135
+00:04:32,830 --> 00:04:34,110
+third quarter and the fourth
+第三个和第四个
+
+136
+00:04:34,570 --> 00:04:36,550
+quarter of my training set.
+四分之一训练样本
+
+137
+00:04:37,530 --> 00:04:38,950
+So now each machine has
+这样 现在每台计算机
+
+138
+00:04:39,190 --> 00:04:40,580
+to sum over 100 instead
+不用处理400个样本
+
+139
+00:04:41,060 --> 00:04:42,570
+of over 400 examples and so
+而只用处理100个样本
+
+140
+00:04:42,760 --> 00:04:43,750
+has to do only a quarter
+它们只用完成
+
+141
+00:04:44,050 --> 00:04:45,220
+of the work and thus presumably
+四分之一的工作量
+
+142
+00:04:45,900 --> 00:04:48,000
+it could do it about four times as fast.
+这样 也许可以将运算速度提高到原来的四倍
+
+143
+00:04:49,380 --> 00:04:50,630
+Finally, after all these machines
+最后 当这些计算机
+
+144
+00:04:50,990 --> 00:04:51,740
+have done this work, I am
+全都完成了各自的工作
+
+145
+00:04:51,850 --> 00:04:53,560
+going to take these temp variables
+我会将这些临时变量
+
+146
+00:04:55,350 --> 00:04:56,480
+and put them back together.
+收集到一起
+
+147
+00:04:56,870 --> 00:04:58,400
+So I take these variables and
+我会将它们
+
+148
+00:04:58,530 --> 00:04:59,950
+send them all to a, you
+送到一个
+
+149
+00:05:00,090 --> 00:05:03,080
+know, centralized master server and
+中心计算服务器
+
+150
+00:05:03,300 --> 00:05:04,750
+what the master will do
+这台服务器会
+
+151
+00:05:05,140 --> 00:05:06,720
+is combine these results together.
+将这些临时变量合并起来
+
+152
+00:05:07,360 --> 00:05:08,470
+and in particular, it will
+具体来说
+
+153
+00:05:08,780 --> 00:05:10,780
+update my parameters theta
+它将根据以下公式
+
+154
+00:05:11,000 --> 00:05:13,160
+j according to theta
+来更新参数θj
+
+155
+00:05:13,410 --> 00:05:14,720
+j gets updated as theta j
+新的θj将等于
+
+156
+00:05:15,730 --> 00:05:17,560
+minus the
+旧的θj减去
+
+157
+00:05:17,680 --> 00:05:19,510
+learning rate alpha times one
+学习速率α乘以
+
+158
+00:05:20,120 --> 00:05:22,940
+over 400 times temp,
+400分之一
+
+159
+00:05:23,300 --> 00:05:27,410
+1, J, plus temp
+乘以临时变量 temp 1 j
+
+160
+00:05:27,760 --> 00:05:30,290
+2j plus temp 3j
+加temp 2j 加temp 3j
+
+161
+00:05:32,400 --> 00:05:35,470
+plus temp 4j and
+加temp 4j
+
+162
+00:05:35,560 --> 00:05:37,890
+of course we have to do this separately for J equals 0.
+当然 对于j等于0的情况我们需要单独处理
+
+163
+00:05:37,980 --> 00:05:39,570
+You know, up to
+这里 j从0
+
+164
+00:05:39,820 --> 00:05:41,220
+and within this number of features.
+取到特征总数n
+
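Written out in LaTeX for clarity, the quantities just described are, for machine 1 (and analogously for machines 2 through 4 over their own 100 examples),

temp_j^{(1)} = \sum_{i=1}^{100} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)},

and the centralized master server then combines the four partial sums into the update

\theta_j := \theta_j - \alpha \, \frac{1}{400} \left( temp_j^{(1)} + temp_j^{(2)} + temp_j^{(3)} + temp_j^{(4)} \right), \qquad j = 0, 1, \dots, n.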
+165
+00:05:42,550 --> 00:05:45,420
+So, writing this equation out like this, I hope it's clear.
+通过将这个公式拆成多行讲解 我希望大家已经理解了
+
+166
+00:05:45,670 --> 00:05:47,870
+So what this equation
+其实 这个公式计算的数值
+
+167
+00:05:50,930 --> 00:05:53,220
+is doing is exactly the
+和原先的梯度下降公式计算的数值
+
+168
+00:05:53,290 --> 00:05:54,570
+same as when you
+是完全一样的
+
+169
+00:05:54,660 --> 00:05:56,140
+have a centralized master server
+只不过 现在我们有一个中心运算服务器
+
+170
+00:05:56,680 --> 00:05:57,950
+that takes the results, the temp
+它收集了一些部分计算结果
+
+171
+00:05:58,040 --> 00:05:58,780
+1 j, the temp 2 j,
+temp 1j temp 2j
+
+172
+00:05:59,000 --> 00:05:59,850
+temp 3 j and temp 4
+temp 3j 和 temp4j
+
+173
+00:05:59,970 --> 00:06:01,760
+j and adds them up
+把它们加了起来
+
+174
+00:06:02,030 --> 00:06:03,430
+and so of course the sum
+很显然 这四个
+
+175
+00:06:04,090 --> 00:06:04,960
+of these four things.
+临时变量的和
+
+176
+00:06:06,360 --> 00:06:07,810
+Right, that's just the sum of
+就是这个求和
+
+177
+00:06:08,060 --> 00:06:09,440
+this, plus the sum
+加上这个求和
+
+178
+00:06:09,760 --> 00:06:11,490
+of this, plus the sum
+加上这个求和
+
+179
+00:06:11,630 --> 00:06:13,000
+of this, plus the sum
+再加上这个求和
+
+180
+00:06:13,120 --> 00:06:14,290
+of that, and those four
+它们加起来的和
+
+181
+00:06:14,470 --> 00:06:15,830
+things just add up to
+其实和原先
+
+182
+00:06:15,920 --> 00:06:17,740
+be equal to this sum that
+我们使用批量梯度下降公式
+
+183
+00:06:17,880 --> 00:06:19,580
+we're originally computing in batch gradient descent.
+计算的结果是一样的
+
+184
+00:06:20,590 --> 00:06:21,550
+And then we have the alpha times
+接下来 我们有
+
+185
+00:06:21,860 --> 00:06:22,910
+1 over 400, alpha times 1
+α乘以400分之一
+
+186
+00:06:23,350 --> 00:06:24,690
+over 400, and this is
+这里也是α乘以400分之一
+
+187
+00:06:25,020 --> 00:06:27,020
+exactly equivalent to the
+因此这个公式
+
+188
+00:06:27,140 --> 00:06:29,390
+batch gradient descent algorithm, only,
+完全等同于批量梯度下降公式
+
+189
+00:06:29,910 --> 00:06:30,880
+instead of needing to sum
+唯一的不同是
+
+190
+00:06:31,290 --> 00:06:32,540
+over all four hundred training
+我们原本需要在一台计算机上
+
+191
+00:06:32,810 --> 00:06:33,900
+examples on just one
+完成400个训练样本的求和
+
+192
+00:06:34,040 --> 00:06:35,280
+machine, we can instead
+而现在
+
+193
+00:06:35,760 --> 00:06:37,460
+divide up the work load on four machines.
+我们将这个工作分给了4台计算机
+
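A small illustrative Python sketch (an assumption-based reconstruction, not course code) of the four-way split just described, using the linear-regression hypothesis h_theta(x) = theta' x: each simulated "machine" sums over its quarter of the 400 examples, and combining the partial sums reproduces exactly one batch-gradient-descent step.

import numpy as np

def partial_gradient(theta, X_part, y_part):
    # One "machine": sum of (h_theta(x_i) - y_i) * x_i over its slice of the data.
    return X_part.T @ (X_part @ theta - y_part)

def mapreduce_gradient_step(theta, X, y, alpha, n_machines=4):
    m = len(y)                                   # m = 400 in the lecture's example
    parts = np.array_split(np.arange(m), n_machines)
    temps = [partial_gradient(theta, X[idx], y[idx]) for idx in parts]
    # The centralized master server combines the partial sums into one update.
    return theta - alpha * (1.0 / m) * np.sum(temps, axis=0)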
+194
+00:06:39,090 --> 00:06:40,190
+So, here's what the general
+总结来说
+
+195
+00:06:40,630 --> 00:06:43,410
+picture of the MapReduce technique looks like.
+map reduce技术是这么工作的
+
+196
+00:06:45,060 --> 00:06:46,510
+We have some training sets, and
+我们有一些训练样本
+
+197
+00:06:46,670 --> 00:06:48,200
+if we want to parallelize across four
+如果我们希望使用4台计算机
+
+198
+00:06:48,420 --> 00:06:49,100
+machines, we are going to
+并行的运行机器学习算法
+
+199
+00:06:49,170 --> 00:06:51,670
+take the training set and split it, you know, equally.
+那么我们将训练样本等分
+
+200
+00:06:52,120 --> 00:06:54,640
+Split it as evenly as we can into four subsets.
+尽量均匀的分成4份
+
+201
+00:06:56,470 --> 00:06:57,110
+Then we are going to take the
+然后 我们将这4个
+
+202
+00:06:57,180 --> 00:06:59,560
+4 subsets of the training data and send them to 4 different computers.
+训练样本的子集送给4台不同的计算机
+
+203
+00:07:00,530 --> 00:07:01,660
+And each of the 4 computers
+每一台计算机
+
+204
+00:07:02,130 --> 00:07:03,570
+can compute a summation over
+对四分之一的
+
+205
+00:07:03,950 --> 00:07:04,850
+just one quarter of the
+训练数据
+
+206
+00:07:04,910 --> 00:07:06,230
+training set, and then
+进行求和运算
+
+207
+00:07:06,340 --> 00:07:07,720
+finally take each of
+最后 这4个求和结果
+
+208
+00:07:07,780 --> 00:07:09,310
+the computers takes the results, sends
+被送到一台中心计算服务器
+
+209
+00:07:09,580 --> 00:07:12,720
+them to a centralized server, which then combines the results together.
+负责对结果进行汇总
+
+210
+00:07:13,570 --> 00:07:14,900
+So, on the previous slide
+在前一张幻灯片中
+
+211
+00:07:15,190 --> 00:07:16,540
+in that example, the bulk
+在那个例子中
+
+212
+00:07:16,800 --> 00:07:17,910
+of the work in gradient descent,
+梯度下降计算
+
+213
+00:07:18,330 --> 00:07:20,140
+was computing the sum from
+的内容是对i等于1到400的
+
+214
+00:07:20,430 --> 00:07:22,270
+i equals 1 to 400 of something.
+400个样本进行求和运算
+
+215
+00:07:22,670 --> 00:07:24,110
+So more generally, sum from
+更宽泛的来讲 在梯度下降计算中
+
+216
+00:07:24,370 --> 00:07:25,570
+i equals 1 to m
+我们是对i等于1到m的m个样本
+
+217
+00:07:26,320 --> 00:07:28,180
+of that formula for gradient descent.
+进行求和
+
+218
+00:07:29,160 --> 00:07:30,430
+And now, because each of
+现在 因为这4台计算机
+
+219
+00:07:30,550 --> 00:07:31,890
+the four computers can do just
+的每一台都可以
+
+220
+00:07:32,190 --> 00:07:33,800
+a quarter of the work, potentially
+完成四分之一的计算工作
+
+221
+00:07:34,350 --> 00:07:35,940
+you can get up to a 4x speed up.
+因此你可能会得到4倍的加速
+
+222
+00:07:38,820 --> 00:07:39,980
+In particular, if there were
+特别的
+
+223
+00:07:40,190 --> 00:07:41,900
+no network latencies and
+如果没有网络延时
+
+224
+00:07:42,100 --> 00:07:42,970
+no costs of the network
+也不考虑
+
+225
+00:07:43,400 --> 00:07:44,450
+communications to send the
+通过网络来回传输数据
+
+226
+00:07:44,600 --> 00:07:45,450
+data back and forth, you can
+所消耗的时间
+
+227
+00:07:45,610 --> 00:07:47,820
+potentially get up to a 4x speed up.
+那么你可能可以得到4倍的加速
+
+228
+00:07:48,050 --> 00:07:49,410
+Of course, in practice,
+当然 在实际工作中
+
+229
+00:07:50,100 --> 00:07:52,080
+because of network latencies,
+因为网络延时
+
+230
+00:07:52,810 --> 00:07:54,500
+the overhead of combining the
+数据汇总额外消耗时间
+
+231
+00:07:54,600 --> 00:07:55,880
+results afterwards and other factors,
+以及其他的一些因素
+
+232
+00:07:56,640 --> 00:07:59,150
+in practice you get slightly less than a 4x speedup.
+你能得到的加速总是略小于4倍的
+
+233
+00:08:00,140 --> 00:08:01,280
+But, none the less, this sort
+但是 不管怎么说
+
+234
+00:08:01,360 --> 00:08:02,710
+of MapReduce approach does offer
+这种map reduce算法
+
+235
+00:08:03,110 --> 00:08:04,560
+us a way to process much
+确实让我们能够处理
+
+236
+00:08:04,870 --> 00:08:05,940
+larger data sets than is
+通常单台计算机
+
+237
+00:08:06,270 --> 00:08:07,550
+possible using a single computer.
+所无法处理的大规模数据
+
+238
+00:08:08,770 --> 00:08:10,060
+If you are thinking of applying
+如果你打算
+
+239
+00:08:10,730 --> 00:08:12,200
+Map Reduce to some learning
+将map reduce技术用于
+
+240
+00:08:12,350 --> 00:08:14,260
+algorithm, in order to speed this up.
+加速某个机器学习算法
+
+241
+00:08:14,750 --> 00:08:16,160
+By parallelizing the computation
+也就是说 你打算运用多台不同的计算机
+
+242
+00:08:16,900 --> 00:08:18,480
+over different computers, the key
+并行的进行计算
+
+243
+00:08:18,730 --> 00:08:20,040
+question to ask yourself is,
+那么你需要问自己一个很关键的问题
+
+244
+00:08:20,760 --> 00:08:22,190
+can your learning algorithm be expressed
+那就是 你的机器学习算法
+
+245
+00:08:22,880 --> 00:08:25,150
+as a summation over the training set?
+是否可以表示为训练样本的某种求和
+
+246
+00:08:25,440 --> 00:08:26,430
+And it turns out that many
+事实证明
+
+247
+00:08:26,670 --> 00:08:28,100
+learning algorithms can actually be
+很多机器学习算法
+
+248
+00:08:28,410 --> 00:08:29,880
+expressed as computing sums of
+的确可以表示为
+
+249
+00:08:30,170 --> 00:08:31,820
+functions over the training set and
+关于训练样本的函数求和
+
+250
+00:08:32,610 --> 00:08:34,030
+the computational expense of running
+而在处理大数据时
+
+251
+00:08:34,250 --> 00:08:35,480
+them on large data sets is
+这些算法的主要运算量
+
+252
+00:08:35,600 --> 00:08:37,810
+because they need to sum over a very large training set.
+在于对大量训练数据求和
+
+253
+00:08:38,620 --> 00:08:39,870
+So, whenever your learning algorithm
+因此 只要你的机器学习算法
+
+254
+00:08:40,200 --> 00:08:41,350
+can be expressed as a
+可以表示为
+
+255
+00:08:41,450 --> 00:08:42,410
+sum of the training set
+训练样本的一个求和
+
+256
+00:08:42,660 --> 00:08:43,760
+and whenever the bulk of the
+只要算法的
+
+257
+00:08:43,860 --> 00:08:44,810
+work of the learning algorithm
+主要计算部分
+
+258
+00:08:45,200 --> 00:08:46,170
+can be expressed as the sum
+可以表示为
+
+259
+00:08:46,320 --> 00:08:47,780
+of the training set, then map
+训练样本的求和
+
+260
+00:08:48,030 --> 00:08:49,030
+reduce might be a good candidate
+那么你可以考虑使用map reduce技术
+
+261
+00:08:50,100 --> 00:08:52,830
+for scaling your learning algorithms to very, very large data sets.
+来将你的算法扩展到非常大规模的数据上
+
+262
+00:08:53,880 --> 00:08:54,910
+Let's just look at one more example.
+让我们再看一个例子
+
+263
+00:08:56,020 --> 00:08:58,120
+Let's say that we want to use one of the advanced optimization algorithm.
+假设我们想使用某种高级优化算法
+
+264
+00:08:58,410 --> 00:08:59,430
+So, things like, you
+比如说
+
+265
+00:08:59,550 --> 00:09:00,460
+know, L-BFGS, conjugate
+LBFGS算法
+
+266
+00:09:00,900 --> 00:09:02,960
+gradient and so on, and
+或者共轭梯度算法等等
+
+267
+00:09:03,070 --> 00:09:05,190
+let's say we want to train a logistic regression algorithm.
+假设我们想使用logistic回归算法
+
+268
+00:09:06,080 --> 00:09:08,680
+For that, we need to compute two main quantities.
+于是 我们需要计算两个值
+
+269
+00:09:09,300 --> 00:09:10,460
+One is for the advanced
+对于LBFGS算法和共轭梯度算法
+
+270
+00:09:10,960 --> 00:09:13,520
+optimization algorithms like, you know, L-BFGS and conjugate gradient.
+我们需要计算的第一个值是
+
+271
+00:09:14,310 --> 00:09:15,270
+We need to provide it a
+我们需要提供一种方法
+
+272
+00:09:15,530 --> 00:09:17,210
+routine to compute the
+用于计算
+
+273
+00:09:17,460 --> 00:09:18,760
+cost function of the optimization objective.
+优化目标的成本函数值
+
+274
+00:09:20,220 --> 00:09:21,690
+And so for logistic regression, you
+比如 对于logistic回归
+
+275
+00:09:21,820 --> 00:09:22,870
+remember that a cost function
+你应该记得它的成本函数
+
+276
+00:09:23,660 --> 00:09:24,700
+has this sort of sum over
+可以表示为
+
+277
+00:09:24,960 --> 00:09:26,340
+the training set, and so
+训练样本上的这种求和
+
+278
+00:09:26,970 --> 00:09:28,980
+if you're parallelizing over
+因此 如果你想在
+
+279
+00:09:29,110 --> 00:09:29,970
+ten machines, you would split
+10台计算机上并行计算
+
+280
+00:09:30,310 --> 00:09:31,640
+up the training set onto ten
+那么你需要将训练样本
+
+281
+00:09:31,910 --> 00:09:33,150
+machines and have each
+分给这10台计算机
+
+282
+00:09:33,360 --> 00:09:35,380
+of the ten machines compute the sum
+让每台计算机
+
+283
+00:09:35,860 --> 00:09:37,460
+of this quantity over just
+计算10份之一
+
+284
+00:09:37,760 --> 00:09:38,660
+one tenth of the training
+训练数据的
+
+285
+00:09:40,370 --> 00:09:40,370
+data.
+求和
+
+286
+00:09:40,670 --> 00:09:41,550
+Then, the other thing that the
+高级优化算法
+
+287
+00:09:42,110 --> 00:09:43,400
+advanced optimization algorithms need,
+还需要提供
+
+288
+00:09:43,660 --> 00:09:44,790
+is a routine to compute
+这些偏导数
+
+289
+00:09:45,190 --> 00:09:47,160
+these partial derivative terms.
+的计算方法
+
+290
+00:09:47,280 --> 00:09:48,980
+Once again, these derivative terms, for
+同样的 对于logistic回归
+
+291
+00:09:49,100 --> 00:09:50,350
+which it's a logistic regression, can
+这些偏导数
+
+292
+00:09:50,540 --> 00:09:51,840
+be expressed as a sum over
+可以表示为
+
+293
+00:09:52,010 --> 00:09:53,130
+the training set, and so once
+训练数据的求和
+
+294
+00:09:53,330 --> 00:09:54,600
+again, similar to our earlier
+因此 和之前的例子类似
+
+295
+00:09:54,950 --> 00:09:56,060
+example, you would have
+你可以让
+
+296
+00:09:56,520 --> 00:09:57,800
+each machine compute that summation
+每台计算机只计算
+
+297
+00:09:58,800 --> 00:10:01,170
+over just some small fraction of your training data.
+部分训练数据上的求和
+
+298
+00:10:02,440 --> 00:10:04,590
+And finally, having computed
+最后 当这些求和计算完成之后
+
+299
+00:10:05,050 --> 00:10:06,260
+all of these things, they could
+求和结果
+
+300
+00:10:06,400 --> 00:10:07,520
+then send their results to
+会被发送到
+
+301
+00:10:07,680 --> 00:10:09,400
+a centralized server, which can
+一台中心计算服务器上
+
+302
+00:10:09,640 --> 00:10:12,760
+then add up the partial sums.
+这台服务器将对结果进行再次求和
+
+303
+00:10:13,320 --> 00:10:14,410
+This corresponds to adding up
+这等同于
+
+304
+00:10:14,500 --> 00:10:17,000
+those temp i or
+对临时变量temp i
+
+305
+00:10:17,550 --> 00:10:21,880
+temp ij variables, which
+或者temp ij进行求和
+
+306
+00:10:22,100 --> 00:10:23,610
+were computed locally on machine
+而这些临时标量
+
+307
+00:10:23,980 --> 00:10:25,390
+number i, and so
+是第i台计算机算出来的
+
+308
+00:10:25,420 --> 00:10:26,800
+the centralized server can sum
+中心计算服务器
+
+309
+00:10:27,050 --> 00:10:28,220
+these things up and get
+对这些临时变量求和
+
+310
+00:10:28,450 --> 00:10:30,230
+the overall cost function
+得到了总的成本函数值
+
+311
+00:10:30,870 --> 00:10:32,750
+and get the overall partial derivative,
+以及总的偏导数值
+
+312
+00:10:33,390 --> 00:10:35,710
+which you can then pass through the advanced optimization algorithm.
+然后你可以将这两个值传给高级优化函数
+
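A hedged Python sketch of the same idea for an advanced optimizer: each machine returns a partial logistic-regression cost and gradient over its own slice, and the central server adds them up before handing them to, for example, scipy.optimize.minimize. The slicing and the function names here are illustrative assumptions, not course-provided code.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def partial_cost_and_grad(theta, X_part, y_part):
    # Unnormalized cost and gradient over one machine's slice of the data.
    h = sigmoid(X_part @ theta)
    cost = -np.sum(y_part * np.log(h) + (1 - y_part) * np.log(1 - h))
    grad = X_part.T @ (h - y_part)
    return cost, grad

def combined_cost_and_grad(theta, slices, m):
    # `slices` stands in for the (X, y) pieces held on the different machines.
    results = [partial_cost_and_grad(theta, Xp, yp) for Xp, yp in slices]
    cost = sum(c for c, _ in results) / m
    grad = sum(g for _, g in results) / m
    return cost, grad   # hand these to the advanced optimization routine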
+313
+00:10:36,890 --> 00:10:38,100
+So, more broadly, by taking
+因此 更广义的来说
+
+314
+00:10:39,080 --> 00:10:40,790
+other learning algorithms and
+通过将机器学习算法
+
+315
+00:10:41,020 --> 00:10:42,430
+expressing them in sort of
+表示为
+
+316
+00:10:42,720 --> 00:10:43,800
+summation form or by expressing
+求和的形式
+
+317
+00:10:44,340 --> 00:10:45,660
+them in terms of computing sums
+或者是
+
+318
+00:10:45,990 --> 00:10:47,100
+of functions over the training set,
+训练数据的函数求和形式
+
+319
+00:10:47,740 --> 00:10:49,290
+you can use the MapReduce technique to
+你就可以运用map reduce技术
+
+320
+00:10:49,440 --> 00:10:51,420
+parallelize other learning algorithms as well,
+来将算法并行化
+
+321
+00:10:51,710 --> 00:10:53,310
+and scale them to very large training sets.
+这样就可以处理大规模数据了
+
+322
+00:10:54,340 --> 00:10:55,850
+Finally, as one last comment,
+最后再提醒一点
+
+323
+00:10:56,390 --> 00:10:57,170
+so far we have been
+目前我们只讨论了
+
+324
+00:10:57,510 --> 00:10:59,630
+discussing MapReduce algorithms as
+运用map reduce技术
+
+325
+00:10:59,850 --> 00:11:01,400
+allowing you to parallelize over
+在多台计算机上
+
+326
+00:11:02,090 --> 00:11:03,630
+multiple computers, maybe multiple
+实现并行计算
+
+327
+00:11:03,940 --> 00:11:05,020
+computers in a computer
+也许是一个计算机集群
+
+328
+00:11:05,220 --> 00:11:08,060
+cluster or over multiple computers in the data center.
+也许是一个数据中心中的多台计算机
+
+329
+00:11:09,150 --> 00:11:10,580
+It turns out that sometimes even
+但实际上
+
+330
+00:11:10,770 --> 00:11:12,010
+if you have just a single computer,
+有时即使我们只有一台计算机
+
+331
+00:11:13,090 --> 00:11:14,390
+MapReduce can also be applicable.
+我们也可以运用map reduce技术
+
+332
+00:11:15,530 --> 00:11:16,970
+In particular, on many single
+具体来说
+
+333
+00:11:17,320 --> 00:11:18,510
+computers now, you can have
+现在的许多计算机
+
+334
+00:11:18,780 --> 00:11:20,520
+multiple processing cores.
+都是多核的
+
+335
+00:11:21,170 --> 00:11:21,860
+You can have multiple CPUs,
+你可以有多个CPU
+
+336
+00:11:22,180 --> 00:11:23,120
+and within each CPU you can
+而每个CPU
+
+337
+00:11:23,240 --> 00:11:26,170
+have multiple processor cores.
+又包括多个核
+
+338
+00:11:26,310 --> 00:11:27,170
+If you have a large training
+如果你有一个
+
+339
+00:11:27,520 --> 00:11:28,460
+set, what you can
+很大的训练样本
+
+340
+00:11:28,570 --> 00:11:29,540
+do if, say, you have
+那么你可以
+
+341
+00:11:29,740 --> 00:11:31,520
+a computer with 4
+使用一台
+
+342
+00:11:31,880 --> 00:11:33,400
+computing cores, what you
+四核的计算机
+
+343
+00:11:33,460 --> 00:11:34,390
+can do is, even on a
+即使在这样一台计算机上
+
+344
+00:11:34,550 --> 00:11:35,580
+single computer you can split the
+你依然可以
+
+345
+00:11:35,760 --> 00:11:37,680
+training sets into pieces and
+将训练样本分成几份
+
+346
+00:11:37,810 --> 00:11:39,140
+send the training set to different
+然后让每一个核
+
+347
+00:11:39,660 --> 00:11:40,960
+cores within a single box,
+处理其中一份子样本
+
+348
+00:11:41,220 --> 00:11:42,570
+like within a single desktop computer
+这样 在单台计算机
+
+349
+00:11:43,240 --> 00:11:45,070
+or a single server and use
+或者单个服务器上
+
+350
+00:11:45,370 --> 00:11:47,200
+MapReduce this way to divvy up work load.
+你也可以利用map reduce技术来划分计算任务
+
+351
+00:11:48,000 --> 00:11:49,010
+Each of the cores can then
+每一个核
+
+352
+00:11:49,200 --> 00:11:50,240
+carry out the sum over,
+可以处理
+
+353
+00:11:50,950 --> 00:11:52,000
+say, one quarter of your
+比方说四分之一
+
+354
+00:11:52,050 --> 00:11:53,440
+training set, and then they
+训练样本的求和
+
+355
+00:11:53,510 --> 00:11:55,090
+can take the partial sums and
+然后我们再将
+
+356
+00:11:55,510 --> 00:11:56,890
+combine them, in order
+这些部分和汇总
+
+357
+00:11:57,220 --> 00:11:59,360
+to get the summation over the entire training set.
+最终得到整个训练样本上的求和
+
+358
+00:11:59,750 --> 00:12:01,280
+The advantage of thinking
+相对于多台计算机
+
+359
+00:12:01,600 --> 00:12:02,880
+about MapReduce this way, as
+这样在单台计算机上
+
+360
+00:12:03,350 --> 00:12:04,760
+parallelizing over cores within a
+使用map reduce技术
+
+361
+00:12:04,900 --> 00:12:06,720
+single machine, rather than parallelizing over
+的一个优势
+
+362
+00:12:06,910 --> 00:12:08,480
+multiple machines is that,
+在于
+
+363
+00:12:09,060 --> 00:12:09,970
+this way you don't have to
+现在你不需要
+
+364
+00:12:10,100 --> 00:12:11,740
+worry about network latency, because
+担心网络延时问题
+
+365
+00:12:12,020 --> 00:12:13,380
+all the communication, all the
+因为所有的通讯
+
+366
+00:12:13,460 --> 00:12:14,810
+sending of the [xx]
+所有的来回数据传输
+
+367
+00:12:15,890 --> 00:12:18,020
+back and forth, all that happens within a single machine.
+都发生在一台计算机上
+
+368
+00:12:18,420 --> 00:12:20,170
+And so network latency becomes
+因此 相比于使用数据中心的
+
+369
+00:12:20,590 --> 00:12:21,530
+much less of an issue compared
+多台计算机
+
+370
+00:12:21,960 --> 00:12:23,050
+to if you were using this
+现在网络延时的影响
+
+371
+00:12:23,540 --> 00:12:26,080
+to parallelize over different computers within the data center.
+小了许多
+
+372
+00:12:27,040 --> 00:12:27,930
+Finally, one last caveat on
+最后 关于在一台多核计算机上的并行运算
+
+373
+00:12:27,990 --> 00:12:30,740
+parallelizing within a multi-core machine.
+我再提醒一点
+
+374
+00:12:31,580 --> 00:12:32,600
+Depending on the details
+这取决于你的编程实现细节
+
+375
+00:12:32,930 --> 00:12:34,290
+of your implementation, if you have a
+如果你有一台
+
+376
+00:12:34,610 --> 00:12:35,920
+multi-core machine and if you
+多核计算机
+
+377
+00:12:36,190 --> 00:12:38,130
+have certain numerical linear algebra libraries.
+并且使用了某个线性代数函数库
+
+378
+00:12:39,350 --> 00:12:40,490
+It turns out that some numerical linear algebra libraries
+那么请注意 某些线性代数函数库
+
+379
+00:12:41,490 --> 00:12:43,940
+can automatically parallelize their
+会自动利用多个核
+
+380
+00:12:44,680 --> 00:12:47,500
+linear algebra operations across multiple cores within the machine.
+并行的完成线性代数运算
+
+381
+00:12:48,770 --> 00:12:50,140
+So if you're fortunate enough to
+因此 如果你幸运的
+
+382
+00:12:50,280 --> 00:12:51,300
+be using one of those numerical linear algebra
+使用了这种
+
+383
+00:12:51,710 --> 00:12:52,980
+libraries and certainly
+线性代数函数库
+
+384
+00:12:53,640 --> 00:12:55,120
+this does not apply to every single library.
+当然 并不是每个函数库都会自动并行
+
+385
+00:12:55,830 --> 00:12:57,800
+If you're using one of those libraries, and
+但如果你用了这样一个函数库
+
+386
+00:12:58,200 --> 00:13:00,680
+if you have a very good vectorizing implementation of the learning algorithm,
+并且你有一个矢量化得很好的算法实现
+
+387
+00:13:01,720 --> 00:13:02,710
+sometimes you can just implement
+那么 有时你只需要
+
+388
+00:13:03,160 --> 00:13:05,060
+your standard learning algorithm in
+按照标准的矢量化方式
+
+389
+00:13:05,150 --> 00:13:06,460
+a vectorized fashion and not
+实现机器学习算法
+
+390
+00:13:06,710 --> 00:13:08,630
+worry about parallelization, and the numerical linear algebra libraries
+而不用管多核并行的问题
+
+391
+00:13:10,030 --> 00:13:12,480
+could take care of some of it for you.
+因为你的线性代数函数库会自动帮助你完成多核并行的工作
+
+392
+00:13:12,620 --> 00:13:14,710
+So you don't need to implement [xx] but.
+因此 这时你不需要使用map reduce技术
+
+393
+00:13:14,860 --> 00:13:16,570
+for other problems, taking advantage
+但是 对于其他的问题
+
+394
+00:13:17,180 --> 00:13:18,660
+of this sort of MapReduce implementation,
+使用基于map reduce的实现
+
+395
+00:13:19,240 --> 00:13:20,690
+finding and using this
+寻找并使用
+
+396
+00:13:20,880 --> 00:13:22,070
+MapReduce formulation and to
+适合map reduce的问题表述
+
+397
+00:13:22,170 --> 00:13:23,410
+parallelize across the cores
+然后实现一个
+
+398
+00:13:23,890 --> 00:13:24,970
+yourself might be a
+多核并行的算法
+
+399
+00:13:25,070 --> 00:13:27,310
+good idea as well and could let you speed up your learning algorithm.
+可能是个好主意 它将会加速你的机器学习算法
+
+400
+00:13:29,860 --> 00:13:31,390
+In this video, we talked about
+在这个视频中
+
+401
+00:13:31,730 --> 00:13:33,650
+the MapReduce approach to parallelizing
+我们介绍了map reduce技术
+
+402
+00:13:34,460 --> 00:13:35,850
+machine learning by taking a
+它可以通过
+
+403
+00:13:36,070 --> 00:13:37,450
+data and spreading them across
+将数据分配到多台计算机的方式
+
+404
+00:13:37,830 --> 00:13:39,660
+many computers in the data center.
+来并行化机器学习算法
+
+405
+00:13:40,160 --> 00:13:41,930
+Although these ideas are
+实际上这种方法
+
+406
+00:13:42,290 --> 00:13:43,970
+critical to parallelizing across multiple
+也可以利用
+
+407
+00:13:44,290 --> 00:13:45,400
+cores within a single computer
+单台计算机的多个核
+
+408
+00:13:46,870 --> 00:13:47,150
+as well.
+来实现并行
+
+409
+00:13:47,650 --> 00:13:48,600
+Today there are some good
+今天 网上有许多优秀的
+
+410
+00:13:49,260 --> 00:13:51,080
+open source implementations of MapReduce,
+开源map reduce实现
+
+411
+00:13:51,440 --> 00:13:52,210
+so there are many users
+实际上 一个称为Hadoop
+
+412
+00:13:52,710 --> 00:13:54,480
+of an open source system called
+的开源系统
+
+413
+00:13:54,890 --> 00:13:55,820
+Hadoop and using either your
+已经拥有了众多的用户
+
+414
+00:13:56,010 --> 00:13:57,580
+own implementation or using someone
+通过自己实现map reduce算法
+
+415
+00:13:57,850 --> 00:13:59,770
+else's open source implementation, you
+或者使用别人的开源实现
+
+416
+00:13:59,920 --> 00:14:01,090
+can use these ideas to
+你就可以利用map reduce技术
+
+417
+00:14:01,410 --> 00:14:02,730
+parallelize learning algorithms and
+来并行化机器学习算法
+
+418
+00:14:03,540 --> 00:14:04,580
+get them to run on much
+这样你的算法
+
+419
+00:14:04,950 --> 00:14:05,980
+larger data sets than is
+将能够处理
+
+420
+00:14:06,320 --> 00:14:07,770
+possible using just a single machine.
+单台计算机处理不了的大数据
+
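An illustrative Python sketch (assumptions, not course code) of the single-machine, multi-core variant discussed at the end of this video: the same partial gradient sums are computed by worker processes on different cores via multiprocessing and then combined. As noted above, a well-vectorized implementation on top of a multithreaded linear algebra library may already use all the cores without any of this.

import numpy as np
from multiprocessing import Pool

def partial_gradient(args):
    # Worker: partial sum of (theta' x_i - y_i) * x_i over one slice of the data.
    theta, X_part, y_part = args
    return X_part.T @ (X_part @ theta - y_part)

def multicore_gradient(theta, X, y, n_cores=4):
    m = len(y)
    idx_parts = np.array_split(np.arange(m), n_cores)
    jobs = [(theta, X[idx], y[idx]) for idx in idx_parts]
    with Pool(processes=n_cores) as pool:        # one worker process per core
        temps = pool.map(partial_gradient, jobs)
    return np.sum(temps, axis=0) / m             # combine the partial sums locally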
diff --git a/srt/18 - 1 - Problem Description and Pipeline (7 min).srt b/srt/18 - 1 - Problem Description and Pipeline (7 min).srt
new file mode 100644
index 00000000..55867088
--- /dev/null
+++ b/srt/18 - 1 - Problem Description and Pipeline (7 min).srt
@@ -0,0 +1,1071 @@
+1
+00:00:00,090 --> 00:00:00,950
+In this and the next few
+在这一段和下一段视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,070 --> 00:00:02,010
+videos, I want to tell
+我想向你介绍一种
+
+3
+00:00:02,160 --> 00:00:03,410
+you about a machine learning application
+机器学习的应用实例
+
+4
+00:00:04,020 --> 00:00:04,980
+example, or a machine
+或者说是
+
+5
+00:00:05,160 --> 00:00:07,670
+learning application history centered
+机器学习在一种叫
+
+6
+00:00:08,030 --> 00:00:09,630
+around an application called Photo OCR .
+照片OCR技术中的应用历史
+
+7
+00:00:10,520 --> 00:00:11,730
+There are three reasons
+我想介绍这部分内容的原因
+
+8
+00:00:12,170 --> 00:00:13,220
+why I want to do this,
+主要有以下三个
+
+9
+00:00:13,480 --> 00:00:14,350
+first I wanted to show you an
+第一 我想向你展示
+
+10
+00:00:14,770 --> 00:00:15,700
+example of how a complex
+一个复杂的机器学习系统
+
+11
+00:00:16,290 --> 00:00:18,000
+machine learning system can be put together.
+是如何被组合起来的
+
+12
+00:00:19,350 --> 00:00:20,960
+Second, I wanted to tell you about the concept of
+第二 我想介绍一下
+
+13
+00:00:21,170 --> 00:00:22,280
+a machine learning pipeline
+机器学习流水线(machine learning pipeline)的有关概念
+
+14
+00:00:22,970 --> 00:00:24,740
+and how to allocate resources when
+以及如何分配资源
+
+15
+00:00:24,860 --> 00:00:26,550
+you're trying to decide what to do next.
+来对下一步计划作出决定
+
+16
+00:00:26,780 --> 00:00:27,700
+And this can either be in
+这既包括你需要自己
+
+17
+00:00:27,730 --> 00:00:28,950
+the context of you working
+开发一个很复杂的
+
+18
+00:00:29,380 --> 00:00:30,220
+by yourself on the big
+机器学习应用的情况
+
+19
+00:00:30,500 --> 00:00:31,690
+application, or it can
+也可能适用于
+
+20
+00:00:31,770 --> 00:00:32,980
+be the context of a team
+一个开发团队
+
+21
+00:00:33,100 --> 00:00:34,190
+of developers trying to build
+希望共同建立一个复杂的
+
+22
+00:00:34,440 --> 00:00:35,930
+a complex application together.
+机器学习应用的情况
+
+23
+00:00:37,030 --> 00:00:38,670
+And then finally, the Photo
+最后
+
+24
+00:00:39,130 --> 00:00:40,690
+OCR problem also gives
+我也想通过介绍照片OCR问题
+
+25
+00:00:40,880 --> 00:00:41,810
+me an excuse to tell you
+的机会来告诉你
+
+26
+00:00:41,880 --> 00:00:42,850
+about just a couple more interesting
+机器学习的诸多
+
+27
+00:00:43,260 --> 00:00:44,370
+ideas for machine learning.
+有意思的想法和理念
+
+28
+00:00:45,120 --> 00:00:47,300
+One is some ideas of
+其中之一是如何将机器学习
+
+29
+00:00:47,400 --> 00:00:48,250
+how to apply machine learning to
+应用到计算机视觉问题中
+
+30
+00:00:48,600 --> 00:00:50,210
+computer vision problems, and second
+第二是有关
+
+31
+00:00:50,340 --> 00:00:51,890
+is the idea of artificial data
+人工数据合成(artificial data synthesis)的概念
+
+32
+00:00:52,220 --> 00:00:53,880
+synthesis, which we'll see in a couple of videos.
+我们将在接下来的几段视频中进行介绍
+
+33
+00:00:54,820 --> 00:00:57,680
+So, let's start by talking about what is the Photo OCR problem.
+好的 那么我们就从介绍什么是照片OCR问题开始
+
+34
+00:01:00,130 --> 00:01:01,710
+Photo OCR stands for
+照片OCR表示
+
+35
+00:01:02,050 --> 00:01:03,760
+Photo Optical Character Recognition.
+照片光学字符识别(photo optical character recognition)
+
+36
+00:01:05,180 --> 00:01:06,460
+With the growth of digital photography
+随着数码摄影的日益流行
+
+37
+00:01:07,300 --> 00:01:08,740
+and more recently the growth of
+以及近年来手机中
+
+38
+00:01:09,080 --> 00:01:10,360
+cameras in our cell phones
+拍照功能的逐渐成熟
+
+39
+00:01:11,140 --> 00:01:12,140
+we now have tons of visual
+我们现在很容易就会有
+
+40
+00:01:12,500 --> 00:01:13,790
+pictures that we take all over the place.
+一大堆从各地拍摄的数码照片
+
+41
+00:01:14,620 --> 00:01:15,700
+And one of the things that
+吸引众多开发人员的
+
+42
+00:01:16,150 --> 00:01:17,850
+has interested many developers is
+其中一个应用是
+
+43
+00:01:18,080 --> 00:01:19,680
+how to get our computers to
+如何让计算机更好地理解
+
+44
+00:01:19,990 --> 00:01:22,300
+understand the content of these pictures a little bit better.
+这些照片的背景环境
+
+45
+00:01:23,140 --> 00:01:24,690
+The photo OCR problem focuses
+这种照片OCR技术
+
+46
+00:01:25,300 --> 00:01:26,790
+on how to get computers to
+主要解决的问题是让计算机
+
+47
+00:01:26,980 --> 00:01:29,390
+read the text that appears in images that we take.
+从我们拍摄的照片中读出文字的信息
+
+48
+00:01:30,730 --> 00:01:31,990
+Given an image like this it
+用这样一张照片举例说明
+
+49
+00:01:32,070 --> 00:01:32,850
+might be nice if a computer
+如果计算机能够读出
+
+50
+00:01:33,530 --> 00:01:34,480
+can read the text in this
+照片中的文字就太好了
+
+51
+00:01:34,670 --> 00:01:35,540
+image so that if you're
+这样一来如果你
+
+52
+00:01:35,650 --> 00:01:37,040
+trying to look for this
+下次想再把这张照片找出来时
+
+53
+00:01:37,220 --> 00:01:38,530
+picture again you type in
+你就输入照片中的文字
+
+54
+00:01:38,850 --> 00:01:40,220
+the words, LULA B's,
+LULA B'S
+
+55
+00:01:41,000 --> 00:01:42,910
+and have it automatically pull
+然后计算机就自动地
+
+56
+00:01:43,130 --> 00:01:44,190
+up this picture, so that
+找出这张照片来
+
+57
+00:01:44,360 --> 00:01:45,890
+you're not spending lots of
+这样你就不用花几个小时
+
+58
+00:01:45,980 --> 00:01:47,130
+time digging through your photo
+把你的相片集翻个底朝天
+
+59
+00:01:47,670 --> 00:01:49,230
+collection, maybe hundreds of
+从几百上千张照片中
+
+60
+00:01:49,490 --> 00:01:50,730
+thousands of pictures in.
+把这张找出来
+
+61
+00:01:50,870 --> 00:01:53,100
+The Photo OCR problem
+照片OCR就是解决这一问题的
+
+62
+00:01:53,450 --> 00:01:56,080
+does exactly this, and it does so in several steps.
+它有如下几个步骤
+
+63
+00:01:56,870 --> 00:01:57,790
+First, given the picture it
+首先 给定某张图片
+
+64
+00:01:58,060 --> 00:01:58,800
+has to look through the image
+它将把图像浏览一遍
+
+65
+00:01:59,480 --> 00:02:01,680
+and detect where there is text in the picture.
+然后找出这张图片中哪里有文字信息
+
+66
+00:02:03,020 --> 00:02:03,960
+And after it has done
+在完成这一步以后
+
+67
+00:02:04,160 --> 00:02:05,340
+that or if it successfully does
+成功找出有文字的位置以后
+
+68
+00:02:05,570 --> 00:02:06,750
+that it then has to
+接下来要做的就是
+
+69
+00:02:06,980 --> 00:02:09,020
+look at these text regions and
+重点关注这些文字区域
+
+70
+00:02:09,170 --> 00:02:10,530
+actually read the text in
+并且在这些区域中
+
+71
+00:02:10,670 --> 00:02:12,150
+those regions, and hopefully if
+对文字内容进行识别
+
+72
+00:02:12,250 --> 00:02:13,670
+it reads it correctly, it'll come
+如果能正确读出的话
+
+73
+00:02:15,040 --> 00:02:16,440
+up with these transcriptions of
+它会将这些内容进行转录
+
+74
+00:02:16,800 --> 00:02:18,710
+what is the text that appears in the image.
+记录下图片中出现的这些文本
+
+75
+00:02:19,480 --> 00:02:21,160
+Whereas OCR, or optical
+虽然现在OCR
+
+76
+00:02:21,440 --> 00:02:22,850
+character recognition of scanned
+或者说光学文字识别
+
+77
+00:02:23,600 --> 00:02:25,760
+documents is a relatively easier
+对扫描的文档来说已经是一个
+
+78
+00:02:26,180 --> 00:02:27,840
+problem, doing OCR from
+比较简单的问题了
+
+79
+00:02:27,980 --> 00:02:29,480
+photographs today is still a
+但对于数码照片来说
+
+80
+00:02:29,750 --> 00:02:30,970
+very difficult machine learning problem,
+现在还是一个比较困难的机器学习问题
+
+81
+00:02:31,640 --> 00:02:32,730
+and you can do this.
+研究这个目的
+
+82
+00:02:33,000 --> 00:02:34,320
+Not only can this help
+不仅仅是因为
+
+83
+00:02:34,750 --> 00:02:36,390
+our computers to understand the
+这可以让计算机
+
+84
+00:02:36,450 --> 00:02:38,380
+content of our
+通过拍摄的数码相片
+
+85
+00:02:38,500 --> 00:02:40,030
+images better, there are
+更好地理解我们所处的环境
+
+86
+00:02:40,240 --> 00:02:42,240
+also applications like helping blind
+更重要的是 它衍生了很多应用
+
+87
+00:02:42,530 --> 00:02:43,900
+people, for example, if you
+比如在帮助盲人方面
+
+88
+00:02:44,000 --> 00:02:45,010
+could provide to a blind person
+假如你能为盲人提供一种照相机
+
+89
+00:02:45,780 --> 00:02:47,210
+a camera that can look
+这种相机可以“看见”
+
+90
+00:02:47,460 --> 00:02:48,430
+at what's in front of
+他们前面有什么东西
+
+91
+00:02:48,530 --> 00:02:49,700
+them, and just tell them the
+可以告诉他们
+
+92
+00:02:49,910 --> 00:02:52,990
+words that may be on
+面前的路牌上
+
+93
+00:02:53,460 --> 00:02:55,830
+the street sign in front of them.
+写的是什么字
+
+94
+00:02:56,540 --> 00:02:57,780
+With car navigation systems.
+现在也有研究人员将照片OCR技术应用到汽车导航系统中
+
+95
+00:02:58,310 --> 00:02:59,750
+For example, imagine if your
+想象一下你的车
+
+96
+00:02:59,920 --> 00:03:00,900
+car could read the street
+能读出街道的标识
+
+97
+00:03:01,250 --> 00:03:03,480
+signs and help you navigate to your destination.
+并且将你导航至目的地
+
+98
+00:03:04,610 --> 00:03:07,260
+In order to perform photo OCR, here's what we can do.
+要实现照片OCR 我们可以这样做
+
+99
+00:03:07,970 --> 00:03:08,840
+First we can go through the
+首先我们可以通览图像
+
+100
+00:03:09,080 --> 00:03:11,490
+image and find the regions where there's text and image.
+并找出有文字的图像区域
+
+101
+00:03:11,850 --> 00:03:13,380
+So, shown here is
+这里展示的例子就是
+
+102
+00:03:13,580 --> 00:03:15,430
+one example of text and
+照片OCR系统可能会识别到的
+
+103
+00:03:15,730 --> 00:03:17,700
+image that the photo OCR system may find.
+图像中的文字信息
+
+104
+00:03:19,980 --> 00:03:21,850
+Second, given the rectangle around
+第二 通过得到的文字区域的
+
+105
+00:03:22,210 --> 00:03:23,390
+that text region, we can
+矩形轮廓
+
+106
+00:03:23,700 --> 00:03:25,930
+then do character segmentation, where
+我们可以进行字符切分
+
+107
+00:03:26,170 --> 00:03:28,210
+we might take this text box
+比如对这个文字框
+
+108
+00:03:28,490 --> 00:03:30,310
+that says "Antique Mall" and
+我们或许能认出“ANTIQUE MALL”
+
+109
+00:03:30,530 --> 00:03:31,760
+try to segment it out
+然后我们会试着将其
+
+110
+00:03:32,090 --> 00:03:34,150
+into the locations of the individual characters.
+分割成独立的字符
+
+111
+00:03:35,450 --> 00:03:37,280
+And finally, having segmented out
+最后 在成功将字段分割为
+
+112
+00:03:37,450 --> 00:03:39,050
+into individual characters, we can
+独立的字符后
+
+113
+00:03:39,320 --> 00:03:41,040
+then run a classifier, which
+我们可以运行一个分类器
+
+114
+00:03:41,290 --> 00:03:42,950
+looks at the images of the
+输入这些可识别的字符
+
+115
+00:03:43,090 --> 00:03:44,620
+visual characters, and tries to
+然后试着识别出
+
+116
+00:03:44,760 --> 00:03:45,990
+figure out the first character's an
+第一个字母是一个A
+
+117
+00:03:46,150 --> 00:03:47,070
+A, the second character's an
+第二个字母是一个N
+
+118
+00:03:47,230 --> 00:03:48,010
+N, the third character is
+第三个字母是一个T
+
+119
+00:03:48,480 --> 00:03:49,930
+a T, and so on,
+等等
+
+120
+00:03:50,110 --> 00:03:51,130
+so that by doing all
+因此通过完成所有这些工作
+
+121
+00:03:51,190 --> 00:03:52,350
+this, hopefully you can then
+按理说你就能识别出
+
+122
+00:03:52,530 --> 00:03:53,610
+figure out that this phrase
+这个字段写的是
+
+123
+00:03:54,180 --> 00:03:55,670
+is Lula B's Antique Mall
+LULAB'S ANTIQUE MALL
+
+124
+00:03:56,340 --> 00:03:57,760
+and similarly for some of
+然后图片中其他有文字的地方
+
+125
+00:03:57,930 --> 00:04:01,690
+the other words that appear in that image.
+也是类似的方法进行处理
+
+126
+00:04:01,980 --> 00:04:02,390
+I should say that there are some photo OCR systems
+实际上有很多照片OCR系统
+
+127
+00:04:02,910 --> 00:04:04,350
+that do even more complex things,
+会进行更为复杂的处理
+
+128
+00:04:04,680 --> 00:04:06,370
+like a bit of spelling correction at the end.
+比如在最后会进行拼写校正
+
+129
+00:04:06,640 --> 00:04:08,470
+So if, for example, your
+所以比如说
+
+130
+00:04:08,710 --> 00:04:10,730
+character segmentation and character
+假如你的字符分割
+
+131
+00:04:11,110 --> 00:04:12,450
+classification system tells you
+和分类系统告诉你
+
+132
+00:04:12,690 --> 00:04:14,390
+that it sees the
+它识别到的字是
+
+133
+00:04:14,530 --> 00:04:16,050
+word c 1 e a
+“C 1 E A N I N G”
+
+134
+00:04:16,260 --> 00:04:17,930
+n i n g. Then,
+那么
+
+135
+00:04:18,350 --> 00:04:19,570
+you know, a sort of spelling
+很多拼写修正系统
+
+136
+00:04:19,760 --> 00:04:21,910
+correction system might tell
+会告诉你
+
+137
+00:04:22,240 --> 00:04:23,270
+you that this is probably the
+这可能是单词
+
+138
+00:04:23,350 --> 00:04:24,880
+word 'cleaning', and your
+“cleaning”的拼写
+
+139
+00:04:25,340 --> 00:04:27,160
+character classification algorithm had
+你的字符分类算法
+
+140
+00:04:27,310 --> 00:04:29,650
+just mistaken the l for a 1.
+刚才把字母 l 识别成了数字 1
+
+141
+00:04:30,370 --> 00:04:31,320
+But for the purpose of what
+但在这节课中
+
+142
+00:04:31,640 --> 00:04:32,510
+we want to do in
+我们要做的事情
+
+143
+00:04:32,620 --> 00:04:33,980
+this video, let's ignore this last
+会不考虑最后这一步
+
+144
+00:04:34,620 --> 00:04:35,780
+step and just focus on the
+只关注前面三个步骤
+
+145
+00:04:36,110 --> 00:04:37,490
+system that does these three
+也就是文字检测
+
+146
+00:04:37,700 --> 00:04:39,340
+steps of text detection, character
+字符分割
+
+147
+00:04:39,660 --> 00:04:41,040
+segmentation, and character classification.
+以及字符分类
+
+148
+00:04:42,410 --> 00:04:43,790
+A system like this is
+那么像这样的一个系统
+
+149
+00:04:44,080 --> 00:04:46,010
+what we call a machine learning pipeline.
+我们把它称之为机器学习流水线(machine learning pipeline)
+
+150
+00:04:47,550 --> 00:04:49,220
+In particular, here's a picture
+具体来说 这幅图表示的
+
+151
+00:04:49,950 --> 00:04:52,220
+showing the photo OCR pipeline.
+就是照片OCR的流水线
+
+152
+00:04:53,140 --> 00:04:54,200
+We have an image, which is then
+我们有一幅图像
+
+153
+00:04:54,470 --> 00:04:57,590
+fed to the text detection system
+然后传给文字检测系统
+
+154
+00:04:57,970 --> 00:04:58,960
+to find the text regions; we then segment
+识别出文字以后
+
+155
+00:04:59,420 --> 00:05:01,350
+out the characters--the individual characters in
+我们将字段分割为独立的字符
+
+156
+00:05:01,420 --> 00:05:04,360
+the text--and then finally we recognize the individual characters.
+最后我们对单个的字母进行识别
+
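+As a rough sketch of how the three stages described above might be chained together (not part of the original lecture): the detect_text_regions, segment_characters, and classify_character names below are hypothetical stand-ins for whatever each stage's implementation turns out to be.
+
+# Illustrative only: each stage of the photo OCR pipeline feeds the next.
+def photo_ocr(image, detect_text_regions, segment_characters, classify_character):
+    words = []
+    for region in detect_text_regions(image):              # stage 1: text detection
+        chars = [classify_character(patch)                  # stage 3: character classification
+                 for patch in segment_characters(region)]   # stage 2: character segmentation
+        words.append("".join(chars))
+    return words
+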
+157
+00:05:05,730 --> 00:05:07,190
+In many complex machine learning
+在很多复杂的机器学习系统中
+
+158
+00:05:07,800 --> 00:05:09,050
+systems, these sorts of
+这种流水线形式
+
+159
+00:05:09,490 --> 00:05:11,400
+pipelines are common, where you
+都非常普遍
+
+160
+00:05:11,660 --> 00:05:13,450
+can have multiple modules--in this
+在流水线中会有多个不同的模块
+
+161
+00:05:13,680 --> 00:05:14,960
+example, the text detection, character
+比如在本例中
+
+162
+00:05:15,390 --> 00:05:17,820
+segmentation, character recognition modules--each of
+我们有文字检测 字符分割和字母识别
+
+163
+00:05:17,930 --> 00:05:19,170
+which may be a machine learning component,
+其中每个模块都可能是一个机器学习组件
+
+164
+00:05:19,880 --> 00:05:20,740
+or sometimes it may not
+或者有时候
+
+165
+00:05:20,980 --> 00:05:22,660
+be a machine learning component but
+这些模块也不一定是机器学习组件
+
+166
+00:05:22,810 --> 00:05:23,660
+to have a set of modules
+只是一个接一个
+
+167
+00:05:24,290 --> 00:05:26,050
+that act one after another on
+连在一起的
+
+168
+00:05:26,280 --> 00:05:27,780
+some piece of data in order
+一系列数据
+
+169
+00:05:28,100 --> 00:05:29,170
+to produce the output you want,
+最终得出你希望的结果
+
+170
+00:05:29,640 --> 00:05:30,930
+which in the photo OCR example
+在照片OCR例子中
+
+171
+00:05:31,580 --> 00:05:32,690
+is to find the
+就是最终识别到的
+
+172
+00:05:32,800 --> 00:05:35,050
+transcription of the text that appeared in the image.
+图片中的文字信息
+
+173
+00:05:35,730 --> 00:05:37,370
+If you're designing a machine learning
+如果你要设计一个机器学习系统
+
+174
+00:05:37,710 --> 00:05:39,090
+system one of the
+其中你需要作出的
+
+175
+00:05:39,200 --> 00:05:41,010
+most important decisions will often
+最重要的决定
+
+176
+00:05:41,330 --> 00:05:44,350
+be what exactly is the pipeline that you want to put together.
+就是你要怎样组织好这个流水线
+
+177
+00:05:44,970 --> 00:05:46,010
+In other words, given the photo
+换句话说
+
+178
+00:05:46,530 --> 00:05:47,930
+OCR problem, how do you
+在这个照片OCR问题中
+
+179
+00:05:47,990 --> 00:05:49,390
+break this problem down into
+你应该如何将这个问题分成
+
+180
+00:05:49,770 --> 00:05:51,220
+a sequence of different modules.
+一系列不同的模块
+
+181
+00:05:51,690 --> 00:05:53,060
+And how you design the pipeline
+你需要设计这个流程
+
+182
+00:05:53,820 --> 00:05:56,060
+and the performance of each of the modules in your pipeline
+以及你的流水线中的每一个模块
+
+183
+00:05:56,660 --> 00:05:57,610
+will often have a big
+这通常会影响到你
+
+184
+00:05:57,710 --> 00:05:59,880
+impact on the final performance of your algorithm.
+最终的算法的表现
+
+185
+00:06:01,480 --> 00:06:02,330
+If you have a team of
+如果你有一个工程师的团队
+
+186
+00:06:02,550 --> 00:06:03,610
+engineers working on a
+在完成同样类似的任务
+
+187
+00:06:03,800 --> 00:06:05,150
+problem like this, it's also very
+那么通常你可以让
+
+188
+00:06:05,460 --> 00:06:06,900
+common to have different
+不同的人来完成
+
+189
+00:06:07,340 --> 00:06:08,720
+individuals work on different modules.
+不同的模块
+
+190
+00:06:09,500 --> 00:06:11,480
+So I could easily imagine text detection
+所以我可以假设
+
+191
+00:06:12,140 --> 00:06:13,240
+easily being the work of anywhere
+文字检测这个模块
+
+192
+00:06:13,670 --> 00:06:14,610
+from 1 to 5 engineers,
+需要大概1到5个人
+
+193
+00:06:15,460 --> 00:06:16,970
+character segmentation maybe another
+字符分割部分
+
+194
+00:06:17,470 --> 00:06:19,010
+1-5 engineers, and character
+需要另外1到5个人
+
+195
+00:06:19,220 --> 00:06:20,550
+recognition being another 1-5
+字母识别部分
+
+196
+00:06:21,670 --> 00:06:23,100
+engineers, and so having a
+还需要另外1到5个人
+
+197
+00:06:23,340 --> 00:06:24,850
+pipeline like this often offers a
+因此 使用流水线的方式
+
+198
+00:06:25,280 --> 00:06:26,720
+natural way to divide up
+通常提供了一个很好的办法
+
+199
+00:06:27,110 --> 00:06:30,370
+the workload amongst different members of an engineering team, as well.
+来将整个工作分给不同的组员去完成
+
+200
+00:06:31,040 --> 00:06:31,970
+Although, of course, all of
+当然 所有这些工作
+
+201
+00:06:32,090 --> 00:06:33,210
+this work could also be done
+都可以由一个人来完成
+
+202
+00:06:33,450 --> 00:06:35,910
+by just one person if that's how you want to do it.
+如果一个人也能胜任的话
+
+203
+00:06:39,090 --> 00:06:40,370
+In complex machine learning systems
+在复杂的机器学习系统中
+
+204
+00:06:41,340 --> 00:06:42,700
+the idea of a pipeline, of
+流水线的概念
+
+205
+00:06:42,870 --> 00:06:44,770
+a machine learning pipeline, is pretty pervasive.
+已经渗透到各种应用中
+
+206
+00:06:45,820 --> 00:06:47,070
+And what you just saw is
+你刚才看到的
+
+207
+00:06:47,400 --> 00:06:49,180
+a specific example of how
+只是一种照片OCR
+
+208
+00:06:49,440 --> 00:06:51,280
+a Photo OCR pipeline might work.
+流水线的运作过程
+
+209
+00:06:52,230 --> 00:06:53,590
+In the next few videos I'll
+在接下来的几段视频中
+
+210
+00:06:53,740 --> 00:06:54,590
+tell you a little bit more
+我还将继续向你介绍
+
+211
+00:06:54,650 --> 00:06:55,780
+about this pipeline, and we'll continue
+更多的一些关于流水线的内容
+
+212
+00:06:56,290 --> 00:06:57,170
+to use this as an example
+我们还将使用这个例子
+
+213
+00:06:58,120 --> 00:06:59,860
+to illustrate--I think--a few more
+来展示机器学习中其他一些
+
+214
+00:07:00,280 --> 00:07:01,400
+key concepts of machine learning.
+非常重要的概念
+
diff --git a/srt/18 - 2 - Sliding Windows (15 min).srt b/srt/18 - 2 - Sliding Windows (15 min).srt
new file mode 100644
index 00000000..2ba120f5
--- /dev/null
+++ b/srt/18 - 2 - Sliding Windows (15 min).srt
@@ -0,0 +1,2290 @@
+1
+00:00:00,370 --> 00:00:01,590
+In the previous video, we talked
+在上一个视频中,我们讨论了(字幕翻译:中国海洋大学,王玺)
+
+2
+00:00:01,890 --> 00:00:04,570
+about the photo OCR pipeline and how that worked.
+OCR管道及其工作原理
+
+3
+00:00:05,480 --> 00:00:06,370
+In which we would take an image
+在OCR管道中我们可以取一张图
+
+4
+00:00:07,050 --> 00:00:08,070
+and pass the image through a
+并将这张图像传递给
+
+5
+00:00:08,130 --> 00:00:10,010
+sequence of machine learning
+一系列的机器学习
+
+6
+00:00:10,280 --> 00:00:11,680
+components in order to
+组件来
+
+7
+00:00:11,890 --> 00:00:13,820
+try to read the text that appears in an image.
+尝试读取图片上的文字
+
+8
+00:00:14,590 --> 00:00:15,820
+In this video I'd like to tell
+今天的视频里,我打算讲
+
+9
+00:00:16,210 --> 00:00:17,360
+a little bit more about how the
+多一点关于流水线
+
+10
+00:00:17,780 --> 00:00:20,310
+individual components of the pipeline works.
+的每个组件的工作原理
+
+11
+00:00:21,270 --> 00:00:24,070
+In particular, most of this video will center around the discussion
+特别的,本视频将会把重点放在讨论
+
+12
+00:00:24,680 --> 00:00:25,950
+of what's called sliding windows.
+滑动窗口上
+
+13
+00:00:26,750 --> 00:00:31,570
+The first stage
+流水线的
+
+14
+00:00:32,000 --> 00:00:33,390
+of the pipeline was the
+第一步是
+
+15
+00:00:33,730 --> 00:00:35,090
+text detection, where we look
+确定文字位置,例如我们现在
+
+16
+00:00:35,330 --> 00:00:36,640
+at an image like this and try
+看到的这一幅图片,尝试去
+
+17
+00:00:37,020 --> 00:00:39,320
+to find the regions of text that appear in this image.
+找到图片中文字出现的区域
+
+18
+00:00:39,850 --> 00:00:42,490
+Text detection is an unusual problem in computer vision.
+文字识别对于计算机来说,是一个不寻常的问题
+
+19
+00:00:43,220 --> 00:00:44,820
+Because depending on the length
+因为根据你要尝试
+
+20
+00:00:45,140 --> 00:00:46,150
+of the text you're trying to
+找到的文字的长度
+
+21
+00:00:46,290 --> 00:00:47,870
+find, these rectangles that you're
+这些你要寻找的矩形
+
+22
+00:00:47,970 --> 00:00:49,600
+trying to find can have different aspect ratios
+具有不同的长宽比例
+
+23
+00:00:51,100 --> 00:00:52,060
+So in order to talk
+所以为了讲述如何
+
+24
+00:00:52,220 --> 00:00:53,550
+about detecting things in images
+在图片中发现事物
+
+25
+00:00:54,300 --> 00:00:55,860
+let's start with a simpler example
+我们首先从一个简单点的例子开始
+
+26
+00:00:56,550 --> 00:01:00,080
+of pedestrian detection and we'll then later go back to apply
+即行人检测,然后我们讲如何将
+
+27
+00:01:00,460 --> 00:01:02,300
+Ideas that were developed
+行人检测中的思路用到
+
+28
+00:01:02,570 --> 00:01:04,840
+in pedestrian detection and apply them to text detection.
+文字识别中去
+
+29
+00:01:06,280 --> 00:01:08,010
+So in pedestrian detection you want
+在行人检测中
+
+30
+00:01:08,360 --> 00:01:09,440
+to take an image that looks
+你取一张类似这样的图片
+
+31
+00:01:09,600 --> 00:01:11,010
+like this and find the
+目的就是寻找
+
+32
+00:01:11,160 --> 00:01:12,920
+individual pedestrians that appear in the image.
+图片中的行人
+
+33
+00:01:13,260 --> 00:01:14,440
+So there's one pedestrian that we
+我们找到一个人,
+
+34
+00:01:14,520 --> 00:01:15,550
+found, there's a second
+两个,
+
+35
+00:01:15,780 --> 00:01:17,920
+one, a third one a fourth one, a fifth one.
+三个,四个,五个
+
+36
+00:01:18,290 --> 00:01:19,390
+And a sixth one.
+六个
+
+37
+00:01:19,560 --> 00:01:20,990
+This problem is maybe slightly
+这个问题与文字识别相比,
+
+38
+00:01:21,320 --> 00:01:22,770
+simpler than text detection just
+简单的地方在于:
+
+39
+00:01:23,100 --> 00:01:24,200
+for the reason that the aspect
+你要识别的东西
+
+40
+00:01:24,560 --> 00:01:27,490
+ratio of most pedestrians are pretty similar.
+具有相似的长宽比
+
+41
+00:01:28,170 --> 00:01:29,280
+Just using a fixed aspect
+仅仅使用一个固定
+
+42
+00:01:29,630 --> 00:01:31,960
+ratio for these rectangles that we're trying to find.
+的长宽比就基本可以了
+
+43
+00:01:32,420 --> 00:01:33,610
+So by aspect ratio I mean
+aspect ratio意思是
+
+44
+00:01:33,920 --> 00:01:36,420
+the ratio between the height and the width of these rectangles.
+矩形的高度和宽度之比
+
+45
+00:01:37,820 --> 00:01:38,190
+They're all the same
+它们都是一样的
+
+46
+00:01:38,650 --> 00:01:40,120
+for different pedestrians, but for
+对于不同的行人来说,但是
+
+47
+00:01:40,490 --> 00:01:42,650
+text detection the height
+对于文字来说
+
+48
+00:01:43,030 --> 00:01:44,560
+and width ratio is different
+不同行的文字
+
+49
+00:01:44,960 --> 00:01:45,830
+for different lines of text
+具有不同的比例
+
+50
+00:01:46,460 --> 00:01:47,940
+Although for pedestrian detection, the
+对于行人检测,尽管行人距离
+
+51
+00:01:48,020 --> 00:01:49,250
+pedestrians can be different distances
+摄像头的距离可能
+
+52
+00:01:49,810 --> 00:01:51,250
+away from the camera and
+不同,因此
+
+53
+00:01:51,390 --> 00:01:52,730
+so the height of these rectangles
+矩形的高度
+
+54
+00:01:53,380 --> 00:01:55,600
+can be different depending on how far away they are.
+不一致
+
+55
+00:01:55,990 --> 00:01:57,090
+but the aspect ratio is the same.
+但比例还是维持不变的
+
+56
+00:01:57,720 --> 00:01:58,880
+In order to build a pedestrian
+为了建立一个行人检测系统
+
+57
+00:01:59,440 --> 00:02:02,460
+detection system here's how you can go about it.
+你需要这么做
+
+58
+00:02:02,520 --> 00:02:03,650
+Let's say that we decide to
+例如我们决定要
+
+59
+00:02:03,970 --> 00:02:06,100
+standardize on this aspect
+使用82*36的比例
+
+60
+00:02:06,690 --> 00:02:08,010
+ratio of 82 by 36
+来进行标准化
+
+61
+00:02:08,180 --> 00:02:10,040
+and we could
+当然我们也可以
+
+62
+00:02:10,330 --> 00:02:11,510
+have chosen some rounded number
+选择一些近似的数字
+
+63
+00:02:12,020 --> 00:02:14,000
+like 80 by 40 or something, but 82 by 36 seems alright.
+比如80*40,但82*36看上去是可行的
+
+64
+00:02:16,110 --> 00:02:17,280
+What we would do is then go
+我们将要做的是
+
+65
+00:02:17,650 --> 00:02:20,420
+out and collect large training sets of positive and negative examples.
+出去搜集一些正例和反例
+
+66
+00:02:21,240 --> 00:02:22,790
+Here are examples of 82
+这里有一些
+
+67
+00:02:22,900 --> 00:02:24,230
+X 36 image patches that do
+82×36的图像块,其中包含行人
+
+68
+00:02:24,360 --> 00:02:26,230
+contain pedestrians and here are
+以及一些不包含行人
+
+69
+00:02:26,550 --> 00:02:28,360
+examples of images that do not.
+的图片
+
+70
+00:02:29,470 --> 00:02:30,710
+On this slide I show 12
+在这个幻灯片里我展示了12个
+
+71
+00:02:31,050 --> 00:02:33,170
+positive examples of y=1
+正例,用y=1表示
+
+72
+00:02:33,730 --> 00:02:34,990
+and 12 examples of y=0.
+12个反例用y=0表示
+
+73
+00:02:36,410 --> 00:02:37,790
+In a more typical pedestrian detection
+在一个更典型的行人检测应用中,
+
+74
+00:02:38,180 --> 00:02:39,200
+application, we may have
+我们可以会有
+
+75
+00:02:39,500 --> 00:02:40,880
+anywhere from a 1,000 training
+从1000
+
+76
+00:02:41,230 --> 00:02:42,210
+examples up to maybe
+到10000
+
+77
+00:02:42,300 --> 00:02:44,410
+10,000 training examples, or
+个数目的例子,或者
+
+78
+00:02:44,460 --> 00:02:45,360
+even more if you can
+更多,如果你能够
+
+79
+00:02:45,510 --> 00:02:47,180
+get even larger training sets.
+获取到更大的训练集合
+
+80
+00:02:47,460 --> 00:02:48,590
+And what you can do, is then train
+然后,你可以
+
+81
+00:02:48,910 --> 00:02:50,160
+a neural network or some
+训练一个神经网络,或者使用
+
+82
+00:02:50,510 --> 00:02:52,420
+other learning algorithm to
+其他学习算法
+
+83
+00:02:52,610 --> 00:02:54,570
+take this input, an image
+来接收这个输入,一个82*36
+
+84
+00:02:54,970 --> 00:02:56,710
+patch of dimension 82 by
+的小图块
+
+85
+00:02:56,850 --> 00:02:59,180
+36, and to classify 'y'
+来划分y
+
+86
+00:02:59,710 --> 00:03:01,070
+and to classify that image patch
+来划分每个图块是否
+
+87
+00:03:01,510 --> 00:03:03,850
+as either containing a pedestrian or not.
+包含一个行人
+
+88
+00:03:05,250 --> 00:03:06,250
+So this gives you a way
+So 这给了你一个
+
+89
+00:03:06,470 --> 00:03:08,050
+of applying supervised learning in
+应用监督学习的方法
+
+90
+00:03:08,210 --> 00:03:09,290
+order to take an image
+来对一个图块进行处理
+
+91
+00:03:09,530 --> 00:03:12,420
+patch and determine whether or not a pedestrian appears in that image patch.
+判断其是否包含有行人
+
+92
+00:03:14,310 --> 00:03:15,190
+Now, lets say we get
+现在,假设我们得到
+
+93
+00:03:15,400 --> 00:03:16,520
+a new image, a test set
+一个新的图片,一个测试集合
+
+94
+00:03:16,850 --> 00:03:17,920
+image like this and we
+图片(类似这个)
+
+95
+00:03:18,030 --> 00:03:20,240
+want to try to find the pedestrians in this image.
+我们尝试寻找一个行人的图片
+
+96
+00:03:21,520 --> 00:03:22,340
+What we would do is start
+我们首先
+
+97
+00:03:22,670 --> 00:03:25,140
+by taking a rectangular patch of this image.
+在图片中选取一个矩形块
+
+98
+00:03:25,580 --> 00:03:26,800
+Like that shown up here, so
+像这里标注的,
+
+99
+00:03:26,900 --> 00:03:27,930
+that's maybe a 82 X
+例如这是图片中的一个
+
+100
+00:03:28,010 --> 00:03:29,440
+36 patch of this image,
+82*36的图块
+
+101
+00:03:30,270 --> 00:03:31,530
+and run that image patch through
+在我们的分类器里
+
+102
+00:03:31,830 --> 00:03:33,660
+our classifier to determine whether
+运行这个图块,验证
+
+103
+00:03:33,840 --> 00:03:34,900
+or not there is a
+是否图块中
+
+104
+00:03:34,980 --> 00:03:36,310
+pedestrian in that image patch,
+是否有行人
+
+105
+00:03:36,620 --> 00:03:38,100
+and hopefully our classifier will
+期望我们的分类器返回
+
+106
+00:03:38,260 --> 00:03:40,600
+return y equals 0 for that patch, since there is no pedestrian.
+那个图块的y等于0,因为图中没有行人
+
+107
+00:03:42,020 --> 00:03:42,900
+Next, we then take that green
+接下来,我们将
+
+108
+00:03:43,140 --> 00:03:44,380
+rectangle and we slide it
+绿色矩形
+
+109
+00:03:44,490 --> 00:03:45,680
+over a bit and then
+滑动一点
+
+110
+00:03:45,940 --> 00:03:47,180
+run that new image patch
+然后通过
+
+111
+00:03:47,560 --> 00:03:49,700
+through our classifier to decide if there's a pedestrian there.
+我们的分类器来决定是否有行人。
+
+112
+00:03:50,760 --> 00:03:51,740
+And having done that, we then
+完成后,我们
+
+113
+00:03:51,920 --> 00:03:53,070
+slide the window further to the
+滑动窗口向右
+
+114
+00:03:53,160 --> 00:03:54,160
+right and run that patch
+再次
+
+115
+00:03:54,420 --> 00:03:56,690
+through the classifier again.
+运行分类器
+
+116
+00:03:56,970 --> 00:03:57,850
+The amount by which you shift
+每次矩形
+
+117
+00:03:58,280 --> 00:03:59,770
+the rectangle over each time
+移动距离
+
+118
+00:04:00,260 --> 00:04:01,720
+is a parameter, that's sometimes
+是一个参数,有时
+
+119
+00:04:02,190 --> 00:04:04,000
+called the step size of the
+称之为步长
+
+120
+00:04:04,070 --> 00:04:06,020
+parameter, sometimes also called
+有时也被称为
+
+121
+00:04:06,380 --> 00:04:08,970
+the stride parameter, and if
+滑动参数,如果
+
+122
+00:04:09,120 --> 00:04:11,050
+you step this one pixel at a time.
+你一次移动一个像素
+
+123
+00:04:11,210 --> 00:04:12,020
+So you can use the step size
+所以你可以使用步长
+
+124
+00:04:12,360 --> 00:04:14,020
+or stride of 1, that usually
+为1,通常
+
+125
+00:04:14,340 --> 00:04:15,560
+performs best, but is
+表现最好,但
+
+126
+00:04:15,700 --> 00:04:16,960
+more computationally expensive, and
+计算成本较高,如果
+
+127
+00:04:17,430 --> 00:04:18,940
+so using a step size of
+使用步长
+
+128
+00:04:19,090 --> 00:04:20,010
+maybe 4 pixels at a
+为4像素
+
+129
+00:04:20,210 --> 00:04:20,970
+time, or eight pixels at a
+或8像素
+
+130
+00:04:21,250 --> 00:04:22,350
+time or some large number of
+或一些更大的数
+
+131
+00:04:22,550 --> 00:04:23,600
+pixels might be more common,
+可能更常见
+
+132
+00:04:24,010 --> 00:04:25,320
+since you're then moving the
+因为你每次
+
+133
+00:04:25,430 --> 00:04:26,570
+rectangle a little bit
+你移动的距离
+
+134
+00:04:26,700 --> 00:04:28,570
+more each time.
+可以更大
+
+135
+00:04:28,870 --> 00:04:30,090
+So, using this process, you continue
+所以,使用这个程序,你继续
+
+136
+00:04:30,870 --> 00:04:32,310
+stepping the rectangle over to
+向右移动矩形
+
+137
+00:04:32,340 --> 00:04:33,160
+the right a bit at a
+每次一点点距离
+
+138
+00:04:33,370 --> 00:04:34,450
+time and running each of
+然后运行分类器
+
+139
+00:04:34,520 --> 00:04:35,780
+these patches through a classifier,
+对图块进行分类
+
+140
+00:04:36,620 --> 00:04:38,220
+until eventually, as you
+直到最后,随着
+
+141
+00:04:38,900 --> 00:04:42,080
+slide this window over the
+你在图片的不同位置
+
+142
+00:04:42,150 --> 00:04:43,340
+different locations in the image,
+滑动这个矩形
+
+143
+00:04:43,550 --> 00:04:44,680
+first starting with the first
+首先从第一行
+
+144
+00:04:44,850 --> 00:04:46,080
+row and then we
+然后我们
+
+145
+00:04:46,160 --> 00:04:47,580
+go further rows in
+滑动到下一行
+
+146
+00:04:47,710 --> 00:04:49,100
+the image, you would
+你会以某个步长
+
+147
+00:04:49,290 --> 00:04:50,490
+then run all of
+对这些不同的图块
+
+148
+00:04:50,550 --> 00:04:52,070
+these different image patches at
+应用某个步长
+
+149
+00:04:52,240 --> 00:04:53,330
+some step size or some
+通过分类器
+
+150
+00:04:53,430 --> 00:04:54,990
+stride through your classifier.
+进行分类
+
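+A minimal sketch of the sliding-window search just described, in Python with NumPy; the 82x36 patch size and the idea of a step size come from the lecture, while classify_patch is a hypothetical stand-in for whatever trained classifier is used.
+
+import numpy as np
+
+def sliding_window_detections(image, classify_patch, patch_h=82, patch_w=36, step=4):
+    """Slide a fixed-size window over a grey-scale image and keep the positive patches."""
+    detections = []
+    rows, cols = image.shape
+    for top in range(0, rows - patch_h + 1, step):
+        for left in range(0, cols - patch_w + 1, step):
+            patch = image[top:top + patch_h, left:left + patch_w]
+            if classify_patch(patch) == 1:      # y = 1 means the patch contains the object
+                detections.append((top, left))
+    return detections
+
+# To handle larger objects, you would also take bigger patches,
+# resize them down to 82x36, and run them through the same classifier.
+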
+151
+00:04:56,990 --> 00:04:57,870
+Now, that was a pretty
+现在,这是一个相当
+
+152
+00:04:57,970 --> 00:04:59,870
+small rectangle, that would only
+小的矩形,这只会
+
+153
+00:05:00,310 --> 00:05:02,310
+detect pedestrians of one specific size.
+检测一个特定大小的行人。
+
+154
+00:05:02,780 --> 00:05:04,210
+What we do next is
+接下来我们做什么
+
+155
+00:05:04,470 --> 00:05:05,990
+start to look at larger image patches.
+开始使用更大的图块
+
+156
+00:05:06,730 --> 00:05:08,270
+So now let's take larger images
+现在让我们以更大的图片
+
+157
+00:05:08,610 --> 00:05:09,700
+patches, like those shown here
+块,如图
+
+158
+00:05:10,310 --> 00:05:11,960
+and run those through the classifier as well.
+然后也通过分类器运行
+
+159
+00:05:13,540 --> 00:05:14,320
+And by the way when I say
+当我说
+
+160
+00:05:14,600 --> 00:05:15,830
+take a larger image patch, what
+以较大的图块,
+
+161
+00:05:16,080 --> 00:05:17,780
+I really mean is when you
+我的意思是当你
+
+162
+00:05:17,860 --> 00:05:18,850
+take an image patch like this,
+选取这样的图块,
+
+163
+00:05:19,490 --> 00:05:20,720
+what you're really doing is taking
+你真正做的是
+
+164
+00:05:20,880 --> 00:05:22,110
+that image patch, and resizing
+选择图像块,并调整大小
+
+165
+00:05:22,800 --> 00:05:24,750
+it down to 82 X 36, say.
+下降到82×36,
+
+166
+00:05:25,000 --> 00:05:26,260
+So you take this larger
+所以你拿这个更大的
+
+167
+00:05:26,550 --> 00:05:28,180
+patch and re-size it to
+块和调整其大小
+
+168
+00:05:28,300 --> 00:05:29,800
+be a smaller image and then
+成为更小的图,然后
+
+169
+00:05:29,970 --> 00:05:31,260
+the smaller re-sized image
+用这个图块
+
+170
+00:05:31,600 --> 00:05:32,620
+that is what you
+在分类器中
+
+171
+00:05:32,990 --> 00:05:35,340
+would pass through your classifier to try and decide if there is a pedestrian in that patch.
+运行,然后决定是否有行人。
+
+172
+00:05:37,230 --> 00:05:38,310
+And finally you can do
+最后你可以
+
+173
+00:05:38,470 --> 00:05:39,530
+this at an even larger
+在一个更大
+
+174
+00:05:39,930 --> 00:05:41,870
+scales and run
+规模做这一步
+
+175
+00:05:42,080 --> 00:05:43,830
+that sliding window to
+运行滑动窗口直到
+
+176
+00:05:43,980 --> 00:05:45,920
+the end. And after
+结束,经过
+
+177
+00:05:45,980 --> 00:05:47,480
+this whole process hopefully your algorithm
+这个过程,希望你的算法
+
+178
+00:05:48,040 --> 00:05:49,670
+will detect whether a pedestrian
+将检测到是否有行人
+
+179
+00:05:50,140 --> 00:05:52,070
+appears in the image, so
+在图中出现,所以
+
+180
+00:05:52,470 --> 00:05:53,850
+that's how you train
+这就是你如何训练一个
+
+181
+00:05:54,290 --> 00:05:55,630
+the classifier, and then
+分类器,然后
+
+182
+00:05:55,890 --> 00:05:57,360
+use a sliding windows classifier,
+使用滑动窗口分类,
+
+183
+00:05:57,920 --> 00:05:59,820
+or use a sliding windows detector in
+或使用一个滑动窗口检测器
+
+184
+00:05:59,970 --> 00:06:01,740
+order to find pedestrians in the image.
+去寻找图像中的行人。
+
+185
+00:06:03,070 --> 00:06:04,050
+Let's now turn to the
+让我们转向
+
+186
+00:06:04,150 --> 00:06:05,910
+text detection example and talk
+文本检测的例子,讨论
+
+187
+00:06:06,100 --> 00:06:07,490
+about that stage in our
+那个阶段,在我们
+
+188
+00:06:07,790 --> 00:06:09,330
+photo OCR pipeline, where our
+的照片OCR管道,我们
+
+189
+00:06:09,570 --> 00:06:11,340
+goal is to find the text regions in the image.
+的目标是找到一个个的文本区域。
+
+190
+00:06:13,250 --> 00:06:15,010
+Similar to pedestrian detection, you
+与行人检测类似,你
+
+191
+00:06:15,250 --> 00:06:16,730
+can come up with a label
+能拿到具有标签的
+
+192
+00:06:17,030 --> 00:06:18,410
+training set with positive examples
+的正例集合
+
+193
+00:06:19,060 --> 00:06:20,930
+and negative examples with examples
+和负例集合
+
+194
+00:06:21,530 --> 00:06:23,810
+corresponding to regions where text appears.
+对应文字出现的区域
+
+195
+00:06:24,300 --> 00:06:27,290
+So instead of trying to detect pedestrians, we're now trying to detect texts.
+所以不再进行行人检测,我们现在尝试检测文本。
+
+196
+00:06:28,130 --> 00:06:29,670
+And so positive examples are going
+正面的样本是
+
+197
+00:06:29,770 --> 00:06:31,640
+to be patches of images where there is text.
+具有文字的图块
+
+198
+00:06:31,970 --> 00:06:33,330
+And negative examples is going
+负面的样本是
+
+199
+00:06:33,380 --> 00:06:36,000
+to be patches of images where there isn't text.
+没有文字的
+
+200
+00:06:36,330 --> 00:06:37,530
+Having trained this we can
+训练完之后我们
+
+201
+00:06:38,030 --> 00:06:39,450
+now apply it to a
+可以将其应用到
+
+202
+00:06:39,870 --> 00:06:41,190
+new image, into a test
+新的图片上,一个测试集
+
+203
+00:06:42,460 --> 00:06:42,910
+set image.
+图片
+
+204
+00:06:43,310 --> 00:06:44,900
+So here's the image that we've been using as example.
+这是我们作为例子的图片
+
+205
+00:06:46,040 --> 00:06:47,300
+Now, last time we run,
+现在,上次我们运行
+
+206
+00:06:47,440 --> 00:06:48,400
+for this example we are going
+这个样本,我们将
+
+207
+00:06:48,560 --> 00:06:50,300
+to run a sliding windows at
+运行一个滑动窗口
+
+208
+00:06:50,640 --> 00:06:52,030
+just one fixed scale just
+在一个固定的比例
+
+209
+00:06:52,370 --> 00:06:54,360
+for purpose of illustration, meaning that
+只是为了进行说明,意味着
+
+210
+00:06:54,450 --> 00:06:56,000
+I'm going to use just one rectangle size.
+我将使用一个矩形的大小。
+
+211
+00:06:56,790 --> 00:06:58,110
+But lets say I run my little
+但如果我
+
+212
+00:06:58,350 --> 00:07:00,070
+sliding windows classifier on lots
+使用这个小的滑动窗口分类器在
+
+213
+00:07:00,170 --> 00:07:01,570
+of little image patches like
+在大量类似这样的小图块上
+
+214
+00:07:01,630 --> 00:07:04,340
+this if I
+如果我
+
+215
+00:07:04,430 --> 00:07:05,430
+do that, what I'll end
+这样做,我将
+
+216
+00:07:05,530 --> 00:07:06,670
+up with is a result
+得到一个结果
+
+217
+00:07:07,040 --> 00:07:08,530
+like this where the white
+像这样,白色
+
+218
+00:07:08,900 --> 00:07:10,700
+regions show where my
+区域显示我的
+
+219
+00:07:10,940 --> 00:07:12,190
+text detection system has found
+文本检测系统发现了
+
+220
+00:07:12,210 --> 00:07:15,960
+text and so the axis' of these two figures are the same.
+文本,这两个图的坐标是一样的
+
+221
+00:07:16,390 --> 00:07:17,700
+So there is a region
+有一个这样的区域
+
+222
+00:07:18,110 --> 00:07:19,200
+up here, of course also
+在这里,当然也
+
+223
+00:07:19,230 --> 00:07:20,710
+a region up here, so the
+有一个区域在这里,所以
+
+224
+00:07:20,840 --> 00:07:22,040
+fact that this black up here
+黑色区域
+
+225
+00:07:22,850 --> 00:07:24,390
+represents that the classifier
+表示分类器
+
+226
+00:07:24,840 --> 00:07:25,940
+does not think it's found any
+并不认为它发现
+
+227
+00:07:26,170 --> 00:07:28,100
+texts up there, whereas the
+了文字,而
+
+228
+00:07:28,170 --> 00:07:29,630
+fact that there's a lot
+事实上有很多
+
+229
+00:07:29,810 --> 00:07:31,300
+of white stuff here, that reflects that
+白色,这反映出
+
+230
+00:07:31,540 --> 00:07:33,260
+classifier thinks that it's found a bunch of texts.
+分类器认为发现了一串文本。
+
+231
+00:07:33,520 --> 00:07:34,310
+over there on the image.
+在图上的这个位置
+
+232
+00:07:35,040 --> 00:07:35,700
+What I have done on this
+我在图片左下角
+
+233
+00:07:35,780 --> 00:07:36,870
+image on the lower left is
+做的操作是
+
+234
+00:07:37,070 --> 00:07:38,820
+actually use white to
+其实用白色
+
+235
+00:07:38,970 --> 00:07:41,050
+show where the classifier thinks it has found text.
+显示分类器认为它找到了文字
+
+236
+00:07:41,810 --> 00:07:43,280
+And different shades of grey
+深浅不同的灰色
+
+237
+00:07:43,880 --> 00:07:45,560
+correspond to the probability that
+对应分类器输出
+
+238
+00:07:45,670 --> 00:07:46,750
+was output by the classifier,
+的概率
+
+239
+00:07:47,110 --> 00:07:48,000
+so like the shades of grey
+所以类似灰色阴影
+
+240
+00:07:48,520 --> 00:07:49,860
+corresponds to where it
+对应它
+
+241
+00:07:49,930 --> 00:07:50,750
+thinks it might have found text
+认为它可能找到了文本
+
+242
+00:07:51,210 --> 00:07:53,900
+but has lower confidence, whereas the bright
+但是不够确定
+
+243
+00:07:54,260 --> 00:07:55,980
+white corresponds to where the classifier came
+亮的白色对应
+
+244
+00:07:57,440 --> 00:07:58,400
+up with a very high
+一个很高的
+
+245
+00:07:58,660 --> 00:08:00,470
+probability, estimated probability of
+概率,在那个位置有行人存在
+
+246
+00:08:00,630 --> 00:08:03,110
+there being pedestrians in that location.
+的概率估计
+
+247
+00:08:04,110 --> 00:08:05,270
+We aren't quite done yet because
+我们还没完全结束因为
+
+248
+00:08:05,690 --> 00:08:06,580
+what we actually want to do
+我们真正想做的
+
+249
+00:08:06,830 --> 00:08:08,620
+is draw rectangles around all
+是在所有文字周围
+
+250
+00:08:08,850 --> 00:08:09,780
+the regions where there is text
+绘制
+
+251
+00:08:10,490 --> 00:08:12,540
+in the image, so we're
+矩形,所以
+
+252
+00:08:12,650 --> 00:08:13,540
+going to take one more step
+我们进一步
+
+253
+00:08:13,840 --> 00:08:14,990
+which is we take the output
+输出分类器的
+
+254
+00:08:15,230 --> 00:08:16,880
+of the classifier and apply
+输出和应用
+
+255
+00:08:17,290 --> 00:08:19,280
+to it what is called an expansion operator.
+所谓的膨胀算子
+
+256
+00:08:20,750 --> 00:08:22,250
+So what that does is, it
+那么它所做的是,它
+
+257
+00:08:22,430 --> 00:08:24,270
+take the image here,
+取这里的图,
+
+258
+00:08:25,450 --> 00:08:26,700
+and it takes each of
+它把每个白色块
+
+259
+00:08:26,800 --> 00:08:28,200
+the white blobs, it takes each
+它把每一个
+
+260
+00:08:28,270 --> 00:08:30,590
+of the white regions and it expands that white region.
+白色区域扩大。
+
+261
+00:08:31,460 --> 00:08:32,460
+Mathematically, the way you
+从数学上,你实现它的方式
+
+262
+00:08:32,610 --> 00:08:34,110
+implement that is, if you
+是,如果
+
+263
+00:08:34,270 --> 00:08:35,280
+look at the image on the right, what
+看右边的图像,
+
+264
+00:08:35,690 --> 00:08:36,780
+we're doing to create the
+我们正在创造的
+
+265
+00:08:36,930 --> 00:08:38,110
+image on the right is, for every
+右边的这个图像,对于每一个像素
+
+266
+00:08:38,370 --> 00:08:39,510
+pixel we are going
+我们将要
+
+267
+00:08:39,610 --> 00:08:40,790
+to ask, is it within
+问,它是否和
+
+268
+00:08:41,370 --> 00:08:42,960
+some distance of a
+左边图中的白色像素
+
+269
+00:08:43,100 --> 00:08:44,650
+white pixel in the left image.
+有一段距离
+
+270
+00:08:45,430 --> 00:08:46,800
+And so, if a specific pixel
+如果是的话,如果一个特定的像素
+
+271
+00:08:47,220 --> 00:08:48,420
+is within, say, five pixels
+距离左边中的
+
+272
+00:08:48,950 --> 00:08:50,280
+or ten pixels of a white
+白色像素,比如
+
+273
+00:08:50,610 --> 00:08:52,310
+pixel in the leftmost image, then
+5个或者10个像素距离,然后
+
+274
+00:08:52,540 --> 00:08:55,020
+we'll also color that pixel white in the rightmost image.
+我们会在右边的图像中,把这个像素也染成白色
+
+275
+00:08:56,190 --> 00:08:57,010
+And so, the effect of this
+这样做的效果是
+
+276
+00:08:57,300 --> 00:08:58,350
+is, we'll take each of the
+把左边图中
+
+277
+00:08:58,730 --> 00:08:59,630
+white blobs in the leftmost
+每个白色区块
+
+278
+00:09:00,030 --> 00:09:01,370
+image and expand them a
+扩大一点
+
+279
+00:09:01,500 --> 00:09:02,200
+bit, grow them a little
+让其增长一点
+
+280
+00:09:02,670 --> 00:09:04,110
+bit, by seeing whether the
+通过检查
+
+281
+00:09:04,170 --> 00:09:05,420
+nearby pixels, the white pixels,
+邻近像素,白色像素
+
+282
+00:09:05,900 --> 00:09:07,980
+and then coloring those nearby pixels in white as well.
+然后把它们也染成白色
+
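+The expansion step can be sketched as a simple dilation of the classifier's output mask; here the "within some distance" test is approximated with a square neighborhood, and the radius of five pixels is just the example value mentioned above.
+
+import numpy as np
+
+def expand_white_regions(mask, radius=5):
+    """Colour a pixel white if any white pixel of `mask` lies within `radius` pixels of it."""
+    rows, cols = mask.shape
+    expanded = np.zeros_like(mask, dtype=bool)
+    for r in range(rows):
+        for c in range(cols):
+            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
+            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
+            expanded[r, c] = mask[r0:r1, c0:c1].any()
+    return expanded
+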
+283
+00:09:08,430 --> 00:09:09,900
+Finally, we are just about done.
+最后,我们就要完成了
+
+284
+00:09:10,180 --> 00:09:11,210
+We can now look at this
+现在我们可以看看这个
+
+285
+00:09:11,480 --> 00:09:12,900
+right most image and just
+最右边的图像就
+
+286
+00:09:13,210 --> 00:09:14,650
+look at the connecting components
+看连接组件
+
+287
+00:09:15,320 --> 00:09:16,700
+and look at the as white
+看白色
+
+288
+00:09:16,990 --> 00:09:19,350
+regions and draw bounding boxes around them.
+区域,在其周围绘制边框。
+
+289
+00:09:20,260 --> 00:09:20,990
+And in particular, if we look at
+特别的,如果我们看一看
+
+290
+00:09:21,390 --> 00:09:22,850
+all the white regions, like
+所有的白色区域,比如
+
+291
+00:09:23,080 --> 00:09:24,750
+this one, this one, this
+这个,这个,这个
+
+292
+00:09:24,990 --> 00:09:26,670
+one, and so on, and
+,等等
+
+293
+00:09:27,030 --> 00:09:27,810
+if we use a simple heuristic
+如果我们用一个简单的启发式方法
+
+294
+00:09:28,390 --> 00:09:30,240
+to rule out rectangles whose aspect
+来排除那些比例
+
+295
+00:09:30,660 --> 00:09:32,760
+ratios look funny because we
+滑稽的矩形,因为我们
+
+296
+00:09:32,870 --> 00:09:34,460
+know that boxes around text
+知道文本周围的框
+
+297
+00:09:34,730 --> 00:09:36,130
+should be much wider than they are tall.
+应该是宽远大于高
+
+298
+00:09:37,110 --> 00:09:38,310
+And so if we ignore the
+所以如果我们忽略
+
+299
+00:09:38,410 --> 00:09:39,990
+thin, tall blobs like this one
+窄的,高的图块,像这个,
+
+300
+00:09:40,230 --> 00:09:42,120
+and this one, and
+和这一个
+
+301
+00:09:42,190 --> 00:09:43,390
+we discard these ones because
+我们丢弃这些
+
+302
+00:09:43,880 --> 00:09:45,490
+they are too tall and thin, and
+因为它们太高,太窄
+
+303
+00:09:45,660 --> 00:09:46,780
+we then draw the rectangles
+我们在那些
+
+304
+00:09:47,470 --> 00:09:48,440
+around the ones whose aspect
+比例符合文字的
+
+305
+00:09:48,840 --> 00:09:50,420
+ratio, that is, whose height
+图块周围
+
+306
+00:09:50,610 --> 00:09:51,800
+to width ratio looks right for
+绘制矩形框
+
+307
+00:09:51,950 --> 00:09:53,310
+text regions, then we
+然后我们
+
+308
+00:09:53,380 --> 00:09:55,070
+can draw rectangles, the bounding
+可以绘制矩形,包围
+
+309
+00:09:55,450 --> 00:09:56,660
+boxes around this text
+在这些
+
+310
+00:09:56,970 --> 00:09:58,500
+region, this text region, and
+文字区域,
+
+311
+00:09:58,610 --> 00:10:00,550
+that text region, corresponding to
+分别对应
+
+312
+00:10:01,060 --> 00:10:02,180
+the Lula B's antique mall logo,
+Lula Bs 商场标志,
+
+313
+00:10:02,650 --> 00:10:04,690
+the Lula B's, and this little open sign.
+Lula B,以及这个小“Open”的标志
+
+314
+00:10:05,840 --> 00:10:06,000
+Of over there.
+那边
+
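+A sketch of those last two steps, drawing bounding boxes around the connected white regions and discarding the tall, thin blobs; it assumes SciPy is available, and the aspect-ratio threshold of 1.0 is only an illustrative choice.
+
+import numpy as np
+from scipy import ndimage
+
+def text_bounding_boxes(expanded_mask, min_width_over_height=1.0):
+    """Label connected white regions and keep boxes that are wider than they are tall."""
+    labels, _ = ndimage.label(expanded_mask)
+    boxes = []
+    for sl in ndimage.find_objects(labels):
+        height = sl[0].stop - sl[0].start
+        width = sl[1].stop - sl[1].start
+        if width / height >= min_width_over_height:   # rule out funny aspect ratios
+            boxes.append((sl[0].start, sl[1].start, height, width))
+    return boxes
+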
+315
+00:10:07,100 --> 00:10:09,550
+This example, by the way, actually misses one piece of text.
+这个例子中其实遗漏了一段文字
+
+316
+00:10:09,860 --> 00:10:12,550
+This is very hard to read, but there is actually one piece of text there.
+这个不好读取,但事实上那边是有一块文字的
+
+317
+00:10:13,080 --> 00:10:14,710
+That says [xx] are corresponding
+这个[ XX ]对应
+
+318
+00:10:14,950 --> 00:10:16,180
+to this but the aspect ratio
+这个区域,但是纵横比
+
+319
+00:10:16,530 --> 00:10:17,960
+looks wrong so we discarded that one.
+看起来是错误,所以我们丢弃掉
+
+320
+00:10:19,100 --> 00:10:20,240
+So you know it's ok
+所以你知道它是好的
+
+321
+00:10:20,530 --> 00:10:21,460
+on this image, but in
+在这个图中,但在
+
+322
+00:10:21,660 --> 00:10:22,760
+this particular example the classifier
+这个特殊的例子中,分类器
+
+323
+00:10:23,290 --> 00:10:24,400
+actually missed one piece of text.
+遗漏了一段文字。
+
+324
+00:10:24,760 --> 00:10:25,780
+It's very hard to read because
+这是很难读的,因为
+
+325
+00:10:25,960 --> 00:10:26,900
+there's a piece of text
+有一段文字
+
+326
+00:10:27,240 --> 00:10:28,700
+written against a transparent window.
+写在了透明的窗户上
+
+327
+00:10:29,750 --> 00:10:31,200
+So that's text detection
+那么这就是文本检测
+
+328
+00:10:32,430 --> 00:10:33,120
+using sliding windows.
+使用滑动窗口
+
+329
+00:10:33,800 --> 00:10:35,300
+And having found these rectangles
+发现包含文字
+
+330
+00:10:36,100 --> 00:10:37,010
+with the text in it, we
+的矩形,我们
+
+331
+00:10:37,110 --> 00:10:38,240
+can now just cut out
+现在可以切出
+
+332
+00:10:38,450 --> 00:10:39,890
+these image regions and then
+这些图像区域,然后
+
+333
+00:10:40,070 --> 00:10:42,100
+use later stages of the pipeline to try to read the text.
+使用流水线后面的步骤来尝试读出这些文本。
+
+334
+00:10:45,390 --> 00:10:46,820
+Now, you recall that the
+现在,你回忆一下
+
+335
+00:10:46,880 --> 00:10:48,360
+second stage of pipeline was
+管道的第二步是
+
+336
+00:10:48,570 --> 00:10:50,620
+character segmentation, so given an
+字符分割,所以给一个
+
+337
+00:10:50,890 --> 00:10:52,530
+image like that shown on top,
+如上部所示的图片
+
+338
+00:10:52,790 --> 00:10:55,660
+how do we segment out the individual characters in this image?
+我们如何分割出此图像中的单个字符?
+
+339
+00:10:56,580 --> 00:10:57,460
+So what we can do is
+所以我们所能做的就是
+
+340
+00:10:57,910 --> 00:10:59,590
+again use a supervised learning
+再次使用有监督的学习
+
+341
+00:11:00,010 --> 00:11:01,020
+algorithm with some set of
+算法,以及一些
+
+342
+00:11:01,100 --> 00:11:01,990
+positive and some set of
+正例和反例
+
+343
+00:11:02,100 --> 00:11:03,810
+negative examples, what were
+我们将要做的是
+
+344
+00:11:03,880 --> 00:11:04,840
+going to do is look in
+检查这些
+
+345
+00:11:04,900 --> 00:11:06,160
+the image patch and try
+图像块并且尝试
+
+346
+00:11:06,390 --> 00:11:08,110
+to decide if there
+决定在图块的
+
+347
+00:11:08,370 --> 00:11:09,690
+is a split between two characters
+的中间是否存在
+
+348
+00:11:10,700 --> 00:11:12,070
+right in the middle of that image patch.
+两个字符的分割
+
+349
+00:11:13,030 --> 00:11:14,100
+So for initial positive examples.
+所以对于初始的正例
+
+350
+00:11:14,960 --> 00:11:17,040
+This first cross example, this image
+第一跨的样本,这个图块
+
+351
+00:11:17,290 --> 00:11:18,590
+patch looks like the
+看起来
+
+352
+00:11:18,650 --> 00:11:20,050
+middle of it is indeed
+中间的确
+
+353
+00:11:21,320 --> 00:11:22,890
+a split between two
+存在两个字符的分割
+
+354
+00:11:23,110 --> 00:11:24,120
+characters and the second example
+第二例
+
+355
+00:11:24,680 --> 00:11:25,770
+again this looks like a
+仍然看起来像一个
+
+356
+00:11:25,950 --> 00:11:27,370
+positive example, because if I split
+正面的样本,因为如果我
+
+357
+00:11:27,840 --> 00:11:29,020
+two characters by putting a
+在正中间放一条线
+
+358
+00:11:29,160 --> 00:11:31,190
+line right down the middle, that's the right thing to do.
+来分割字符,这样做是正确的。
+
+359
+00:11:31,350 --> 00:11:33,310
+So, these are positive examples, where
+等,这些都是正面的样本,
+
+360
+00:11:33,510 --> 00:11:35,370
+the middle of the image represents
+图像的中间表示
+
+361
+00:11:35,970 --> 00:11:36,930
+a gap or a split
+两个字符之间的
+
+362
+00:11:37,960 --> 00:11:40,320
+between two distinct characters, whereas
+沟壑或分割,而
+
+363
+00:11:40,560 --> 00:11:41,870
+the negative examples, well, you
+负面的样本,很好,你
+
+364
+00:11:42,010 --> 00:11:43,160
+know, you don't want to split
+知道,你不想分裂
+
+365
+00:11:43,690 --> 00:11:44,810
+two characters right in the
+把两个字从中间
+
+366
+00:11:44,900 --> 00:11:46,610
+middle, and so
+分割,所以
+
+367
+00:11:46,820 --> 00:11:48,160
+these are negative examples because
+这些都是负面的样本,因为
+
+368
+00:11:48,460 --> 00:11:50,660
+they don't represent the midpoint between two characters.
+不代表两个字符之间的中点。
+
+369
+00:11:51,760 --> 00:11:52,490
+So what we will do
+所以我们要做的
+
+370
+00:11:52,650 --> 00:11:53,940
+is, we will train a classifier,
+是:我们会训练一个分类器,
+
+371
+00:11:54,500 --> 00:11:55,910
+maybe using a neural network, maybe
+也许利用神经网络,也许
+
+372
+00:11:56,180 --> 00:11:58,000
+using a different learning algorithm, to
+使用不同的学习算法,
+
+373
+00:11:58,120 --> 00:12:01,420
+try to classify between the positive and negative examples.
+尝试将正面和负面的样本进行区分。
+
+374
+00:12:02,770 --> 00:12:03,980
+Having trained such a classifier,
+有这样一个分类器的训练,
+
+375
+00:12:04,320 --> 00:12:06,030
+we can then run this on
+我们可以在
+
+376
+00:12:06,690 --> 00:12:07,830
+this sort of text that our
+我们的文字检测系统中
+
+377
+00:12:07,940 --> 00:12:09,410
+text detection system has pulled out.
+分离出的文字上运行这个分类器
+
+378
+00:12:09,590 --> 00:12:10,970
+As we start by looking at
+我们先来看看
+
+379
+00:12:11,130 --> 00:12:12,080
+that rectangle, and we ask,
+那个矩形,我们问,
+
+380
+00:12:12,230 --> 00:12:13,280
+"Gee, does it look
+“哎呀,它看起来
+
+381
+00:12:13,510 --> 00:12:15,000
+like the middle of
+像那个绿色矩形的
+
+382
+00:12:15,100 --> 00:12:16,600
+that green rectangle, does it
+中间?,它
+
+383
+00:12:16,680 --> 00:12:18,470
+look like the midpoint between two characters?".
+看起来像两个字符之间的中点?”。
+
+384
+00:12:18,980 --> 00:12:20,220
+And hopefully, the classifier will
+希望,分类器
+
+385
+00:12:20,320 --> 00:12:21,760
+say no, then we slide
+说不,然后我们滑动
+
+386
+00:12:22,170 --> 00:12:23,280
+the window over and this
+窗户,这是
+
+387
+00:12:23,410 --> 00:12:24,850
+is a one dimensional sliding
+一维的滑动
+
+388
+00:12:25,200 --> 00:12:26,410
+window classifier, because were
+窗口分类器,因为
+
+389
+00:12:26,500 --> 00:12:27,820
+going to slide the window only
+将滑动窗口
+
+390
+00:12:28,470 --> 00:12:29,560
+in one straight line from
+在一条直线上
+
+391
+00:12:29,780 --> 00:12:32,070
+left to right, there's no different rows here.
+从左到右,这时没有不同的行
+
+392
+00:12:32,270 --> 00:12:34,420
+There's only one row here.
+只有一行
+
+393
+00:12:34,520 --> 00:12:36,160
+But now, with the classifier in
+但是,在这个位置时,
+
+394
+00:12:36,240 --> 00:12:37,250
+this position, we ask, well,
+我们使用分类器,我们问,好,
+
+395
+00:12:37,490 --> 00:12:38,700
+should we split those two characters
+我们应该分开这两个字符
+
+396
+00:12:39,570 --> 00:12:41,580
+or should we put a split right down the middle of this rectangle.
+或者我们应该在这个矩形的中间进行分割
+
+397
+00:12:41,950 --> 00:12:43,040
+And hopefully, the classifier will
+希望,分类器
+
+398
+00:12:43,190 --> 00:12:44,720
+output y equals one, in
+输出Y=1,在
+
+399
+00:12:44,780 --> 00:12:46,460
+which case we will decide to
+这种情况下我们会决定
+
+400
+00:12:46,630 --> 00:12:49,690
+draw a line down there, to try to split two characters.
+画一条线在那里,试图分割这两个字符
+
+401
+00:12:50,710 --> 00:12:51,620
+Then we slide the window over
+然后再次滑动窗口
+
+402
+00:12:51,870 --> 00:12:53,440
+again, classifier says, don't
+分类器说,不
+
+403
+00:12:53,650 --> 00:12:55,020
+close the gap, slide over again,
+关闭间隙,再次滑过,
+
+404
+00:12:55,300 --> 00:12:56,580
+classifier says yes, do split
+分类器说yes,进行分割
+
+405
+00:12:57,230 --> 00:12:58,830
+there and so
+等等
+
+406
+00:12:59,200 --> 00:13:00,410
+on, and we slowly slide the
+然后慢慢地滑动
+
+407
+00:13:00,560 --> 00:13:01,770
+classifier over to the
+分类器向
+
+408
+00:13:01,920 --> 00:13:03,310
+right and hopefully it will
+右边,希望它可以将
+
+409
+00:13:03,380 --> 00:13:05,160
+classify this as another positive example and
+这个作为另一个正面的样本,
+
+410
+00:13:05,770 --> 00:13:07,470
+so on.
+等等
+
+411
+00:13:08,010 --> 00:13:09,180
+And we will slide this window
+我们向右滑动每一步
+
+412
+00:13:09,820 --> 00:13:10,990
+over to the right, running the
+运行分类器
+
+413
+00:13:11,160 --> 00:13:12,670
+classifier at every step, and
+希望它可以
+
+414
+00:13:12,800 --> 00:13:13,800
+hopefully it will tell us,
+告诉我们,
+
+415
+00:13:14,210 --> 00:13:15,070
+you know, what are the right locations
+你知道,应该在哪里
+
+416
+00:13:16,190 --> 00:13:17,820
+to split these characters up into,
+对字符进行分割
+
+417
+00:13:18,290 --> 00:13:20,410
+just split this image up into individual characters.
+将这个图像分成单个字符
+
+418
+00:13:21,090 --> 00:13:22,450
+And so that's 1D sliding
+这就是一维滑动
+
+419
+00:13:22,810 --> 00:13:24,190
+windows for character segmentation.
+窗口字符分割
+
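+A sketch of that 1D sliding window for character segmentation; classify_split is a hypothetical stand-in for the classifier trained on the positive and negative split examples above, and the window width and step are made-up values.
+
+def segment_characters(text_patch, classify_split, window_w=20, step=2):
+    """Slide a window left to right and cut the patch wherever a split is predicted."""
+    cols = text_patch.shape[1]
+    splits = [0]
+    for left in range(0, cols - window_w + 1, step):
+        window = text_patch[:, left:left + window_w]
+        if classify_split(window) == 1:          # y = 1: a split falls in the middle
+            splits.append(left + window_w // 2)
+    splits.append(cols)
+    # A real system would merge near-duplicate split points before cutting.
+    return [text_patch[:, a:b] for a, b in zip(splits, splits[1:]) if b > a]
+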
+420
+00:13:25,520 --> 00:13:28,430
+So, here's the overall photo OCR pipeline again.
+所以,这里是OCR照片管道的全部
+
+421
+00:13:29,120 --> 00:13:30,280
+In this video we've talked about
+在这段视频中我们已经谈到了
+
+422
+00:13:30,780 --> 00:13:32,170
+the text detection step, where
+文本检测步骤,其中
+
+423
+00:13:32,360 --> 00:13:34,570
+we use sliding windows to detect text.
+我们使用滑动窗口检测文本。
+
+424
+00:13:35,200 --> 00:13:36,390
+And we also use a one-dimensional
+我们也使用一个一维的
+
+425
+00:13:37,070 --> 00:13:38,420
+sliding windows to do character
+滑动窗来做
+
+426
+00:13:38,790 --> 00:13:40,160
+segmentation to segment out,
+字符分割
+
+427
+00:13:40,730 --> 00:13:42,860
+you know, this text image into individual characters.
+你知道,这个文本图像的字符分割。
+
+428
+00:13:43,900 --> 00:13:44,770
+The final step through the
+通过管道的最后一步
+
+429
+00:13:44,810 --> 00:13:46,040
+pipeline is the character classification
+是字符分类
+
+430
+00:13:46,720 --> 00:13:48,150
+step and that step you might
+步骤和这一步你或许
+
+431
+00:13:48,370 --> 00:13:49,750
+already be much more familiar
+更加熟悉
+
+432
+00:13:50,020 --> 00:13:51,490
+with from the earlier videos
+与早期的关于
+
+433
+00:13:52,080 --> 00:13:54,470
+on supervised learning
+监督学习的视频
+
+434
+00:13:55,170 --> 00:13:56,440
+where you can apply a standard
+你可以应用一个标准的
+
+435
+00:13:56,940 --> 00:13:58,150
+supervised learning algorithm, maybe
+监督学习也许
+
+436
+00:13:58,360 --> 00:13:59,250
+a neural network or maybe something
+用神经网络,或者一些
+
+437
+00:13:59,570 --> 00:14:00,650
+else in order to
+其他的,为了
+
+438
+00:14:00,860 --> 00:14:02,100
+take as its input an image
+把它的输入,类似
+
+439
+00:14:02,980 --> 00:14:05,030
+like that and classify which alphabet
+这样的图片,将其
+
+440
+00:14:05,480 --> 00:14:07,120
+or which of the 26 characters A
+分类到26个字母
+
+441
+00:14:07,230 --> 00:14:08,320
+to Z, or maybe we should
+或者36个,
+
+442
+00:14:08,570 --> 00:14:09,670
+have 36 characters if you
+如果包含
+
+443
+00:14:09,780 --> 00:14:11,140
+have the numerical digits as
+数字的话
+
+444
+00:14:11,270 --> 00:14:12,650
+well, the multi class
+多
+
+445
+00:14:13,080 --> 00:14:14,410
+classification problem where you
+分类器问题
+
+446
+00:14:14,510 --> 00:14:15,690
+take as its input an image
+它的输入
+
+447
+00:14:16,050 --> 00:14:17,390
+containing a character and decide
+一个包含文字的图片
+
+448
+00:14:18,140 --> 00:14:20,450
+what is the character that appears in that image?
+出现在图像中的字符是什么?
+
+449
+00:14:21,080 --> 00:14:22,460
+So that was the photo OCR
+那么这就是照片OCR
+
+450
+00:14:23,730 --> 00:14:24,750
+pipeline and how you can
+以及你如何
+
+451
+00:14:24,910 --> 00:14:26,140
+use ideas like sliding windows
+使用滑动窗口
+
+452
+00:14:26,520 --> 00:14:27,960
+classifiers in order to
+分类器来
+
+453
+00:14:28,100 --> 00:14:29,790
+put these different components to
+使用这些不同的组件
+
+454
+00:14:30,060 --> 00:14:31,570
+develop a photo OCR system.
+开发照片OCR系统
+
+455
+00:14:32,430 --> 00:14:33,570
+In the next few videos we
+在接下来的几个视频我们
+
+456
+00:14:33,680 --> 00:14:34,930
+keep on using the problem of
+继续使用照片OCR问题
+
+457
+00:14:35,150 --> 00:14:36,550
+photo OCR to explore somewhat
+探索一些
+
+458
+00:14:36,960 --> 00:14:39,070
+interesting issues surrounding building an application like this.
+有趣的问题,建立类似的应用。
+
diff --git a/srt/18 - 3 - Getting Lots of Data and Artificial Data (16 min).srt b/srt/18 - 3 - Getting Lots of Data and Artificial Data (16 min).srt
new file mode 100644
index 00000000..22358272
--- /dev/null
+++ b/srt/18 - 3 - Getting Lots of Data and Artificial Data (16 min).srt
@@ -0,0 +1,2549 @@
+1
+00:00:00,090 --> 00:00:01,270
+I've seen over and over that
+我多次看到(字幕翻译:中国海洋大学,刘竞)
+
+2
+00:00:01,570 --> 00:00:03,160
+one of the most reliable ways to
+一个最可靠的得到
+
+3
+00:00:03,300 --> 00:00:04,800
+get a high performance machine learning
+一个高性能的机器学习
+
+4
+00:00:05,040 --> 00:00:06,170
+system is to take
+系统的方法是
+
+5
+00:00:06,550 --> 00:00:07,860
+a low bias learning algorithm
+采取一个低偏差的机器学习算法
+
+6
+00:00:08,750 --> 00:00:10,220
+and to train it on a massive training set.
+并且在用大量的数据集去训练它。
+
+7
+00:00:11,230 --> 00:00:12,830
+But where did you get so much training data from?
+但是你该从哪里去获得这么多的训练数据呢?
+
+8
+00:00:13,510 --> 00:00:14,440
+It turns out that in machine learning
+事实证明,在机器学习领域
+
+9
+00:00:14,820 --> 00:00:16,520
+there's a fascinating idea called artificial
+有一个非常吸引人的思想叫做
+
+10
+00:00:17,220 --> 00:00:19,000
+data synthesis, this doesn't
+人工数据合成,这种思想并不
+
+11
+00:00:19,370 --> 00:00:20,740
+apply to every single problem, and
+适用于每个单独的问题,当把它
+
+12
+00:00:20,980 --> 00:00:22,120
+to apply to a specific
+运用于一个具体的特定
+
+13
+00:00:22,360 --> 00:00:25,060
+problem, often takes some thought and innovation and insight.
+问题时,需要经过一些思考,改进和洞察力。
+
+14
+00:00:25,780 --> 00:00:27,170
+But if this idea applies
+但是假如这个思想应用在
+
+15
+00:00:27,580 --> 00:00:29,120
+to your machine learning problem, it
+你的机器学习问题上,它
+
+16
+00:00:29,230 --> 00:00:30,270
+can sometimes be an
+有时,为你的学习算法
+
+17
+00:00:30,510 --> 00:00:31,600
+easy way to get a
+获得一个巨大的
+
+18
+00:00:31,680 --> 00:00:33,470
+huge training set to give to your learning algorithm.
+训练集将是很容易的。
+
+19
+00:00:33,900 --> 00:00:35,520
+The idea of artificial
+人工数据合成
+
+20
+00:00:36,230 --> 00:00:38,410
+data synthesis comprises two
+包含两个
+
+21
+00:00:38,590 --> 00:00:40,210
+main variations. The first
+变化形式,第一个最主要的
+
+22
+00:00:40,650 --> 00:00:41,940
+is if we are essentially creating
+是我们是否有必要
+
+23
+00:00:42,520 --> 00:00:44,940
+data from [xx], creating new data from scratch.
+从[xx]中去生成数据,也就是从头开始去创新数据。
+
+24
+00:00:45,380 --> 00:00:46,700
+And the second is if
+第二种是我们
+
+25
+00:00:46,930 --> 00:00:48,350
+we already have a small
+是否已经有了算法的一小
+
+26
+00:00:48,590 --> 00:00:49,970
+labeled training set and we
+部分标签训练集
+
+27
+00:00:50,210 --> 00:00:51,490
+somehow amplify that training
+并且以某种方式扩充了训练集
+
+28
+00:00:51,840 --> 00:00:52,680
+set or use a small training
+或者是把一小部分训练集
+
+29
+00:00:52,980 --> 00:00:54,390
+set to turn that into
+转换成了
+
+30
+00:00:54,660 --> 00:00:56,290
+a larger training set and in
+一个较大的数据集
+
+31
+00:00:56,450 --> 00:00:58,120
+this video we'll go over both those ideas.
+在这一个视频中,我们将仔细学习这两种思路。
+
+32
+00:01:00,350 --> 00:01:02,220
+To talk about the artificial data
+在讲到人工数据
+
+33
+00:01:02,440 --> 00:01:04,030
+synthesis idea, let's use
+合成思想时,让我们
+
+34
+00:01:04,330 --> 00:01:06,930
+the character portion of
+借用一下成像光学字符识别
+
+35
+00:01:07,090 --> 00:01:08,470
+the photo OCR pipeline, we
+管道中的字符部分,
+
+36
+00:01:08,690 --> 00:01:09,710
+want to take it's input image
+我们想采用它的输入图像
+
+37
+00:01:10,060 --> 00:01:11,370
+and recognize what character it is.
+去识别出它是什么字符。
+
+38
+00:01:13,330 --> 00:01:14,690
+If we go out and collect
+假如我们出去收集到
+
+39
+00:01:14,880 --> 00:01:16,270
+a large labeled data set,
+一个大的标签数据集,
+
+40
+00:01:16,890 --> 00:01:17,980
+here's what it is and what it looks like.
+也就是它是什么和它像什么。
+
+41
+00:01:18,580 --> 00:01:21,770
+For this particular example, I've chosen a square aspect ratio.
+在这一个特殊的例子中,我选择了一个正方形的长宽比。
+
+42
+00:01:22,130 --> 00:01:23,250
+So we're taking square image patches.
+所以我们采用正方形的图像块。
+
+43
+00:01:24,180 --> 00:01:25,110
+And the goal is to take
+我们的目标是获得
+
+44
+00:01:25,770 --> 00:01:27,420
+an image patch and recognize the
+一个图像块并且识别出
+
+45
+00:01:27,530 --> 00:01:29,270
+character in the middle of that image patch.
+图像块中央的字符。
+
+46
+00:01:31,090 --> 00:01:31,990
+And for the sake of simplicity,
+为了简单,
+
+47
+00:01:32,660 --> 00:01:33,740
+I'm going to treat these images
+我打算把这些图像当做
+
+48
+00:01:34,240 --> 00:01:36,380
+as grey scale images, rather than color images.
+灰度图像来处理,不是把它们当做彩色图像。
+
+49
+00:01:36,870 --> 00:01:38,550
+It turns out that using color
+事实证明把它们当做彩色图像来处理
+
+50
+00:01:38,930 --> 00:01:41,180
+doesn't seem to help that much for this particular problem.
+对于这个具体的问题而言看起来帮助不大。
+
+51
+00:01:42,190 --> 00:01:43,530
+So given this image patch, we'd
+对于这个给定的图像块,
+
+52
+00:01:43,660 --> 00:01:44,890
+like to recognize that that's a
+我们会把它识别为"T"。
+
+53
+00:01:45,010 --> 00:01:46,230
+T. Given this image patch,
+对于这个图像块,
+
+54
+00:01:46,550 --> 00:01:47,920
+we'd like to recognize that it's an 'S'.
+我们会把它识别为一个"S".
+
+55
+00:01:49,540 --> 00:01:50,740
+Given that image patch we
+对于这个图像块,我们
+
+56
+00:01:50,850 --> 00:01:52,950
+would like to recognize that as an 'I' and so on.
+会把它识别为一个“I”,等等。
+
+57
+00:01:54,110 --> 00:01:55,310
+So all of these are
+因此,对于所有
+
+58
+00:01:55,450 --> 00:01:57,240
+examples of real images; how
+这些真实图像的例子,
+
+59
+00:01:57,380 --> 00:01:59,460
+can we come up with a much larger training set?
+我们该如何得到一个更大的训练集呢?
+
+60
+00:02:00,000 --> 00:02:01,580
+Modern computers often have a
+现代计算机通常有一个
+
+61
+00:02:01,640 --> 00:02:03,700
+huge font library and
+庞大的字体库,
+
+62
+00:02:03,890 --> 00:02:05,330
+if you use a word processing
+并且假如你使用一个字处理
+
+63
+00:02:05,950 --> 00:02:07,090
+software, depending on what word
+软件,主要看你使用的
+
+64
+00:02:07,240 --> 00:02:08,580
+processor you use, you might
+是什么字处理软件,你可能
+
+65
+00:02:08,800 --> 00:02:09,980
+have all of these fonts and
+有所有这些字体,
+
+66
+00:02:10,120 --> 00:02:12,490
+many, many more already stored inside.
+并且还有更多的已经存储在里面了。
+
+67
+00:02:12,950 --> 00:02:14,350
+And, in fact, if you go to different websites, there
+并且,事实上,假如你去不同的网站,
+
+68
+00:02:14,680 --> 00:02:16,280
+are, again, huge, free font
+网上还有其它的大的,
+
+69
+00:02:16,690 --> 00:02:18,200
+libraries on the internet we
+免费的字体库,从那里
+
+70
+00:02:18,370 --> 00:02:19,960
+can download many, many different
+我们能下载许多许多不同
+
+71
+00:02:20,250 --> 00:02:22,580
+types of fonts, hundreds or perhaps thousands of different fonts.
+类型的字体,几百甚至是几千种不同的字体。
+
+72
+00:02:23,960 --> 00:02:25,180
+So if you want more
+所以,假如你想
+
+73
+00:02:25,570 --> 00:02:27,020
+training examples, one thing you
+得到更多训练实例,一件
+
+74
+00:02:27,100 --> 00:02:28,340
+can do is just take
+你可以做的事情正是
+
+75
+00:02:28,870 --> 00:02:30,220
+characters from different fonts
+得到不同的字体的字符,
+
+76
+00:02:31,240 --> 00:02:32,870
+and paste these characters against
+并且把这些字符粘贴到
+
+77
+00:02:33,290 --> 00:02:35,890
+different random backgrounds.
+任意不同的背景下。
+
+78
+00:02:36,730 --> 00:02:39,500
+So you might take this ---- and paste that C against a random background.
+所以,你可以得到这些字符C并且把它粘在任意的背景下。
+
+79
+00:02:40,680 --> 00:02:41,640
+If you do that you now have
+假如你做了这些,那么你现在
+
+80
+00:02:42,060 --> 00:02:43,830
+a training example of an
+就有了一个关于
+
+81
+00:02:44,080 --> 00:02:45,260
+image of the character C.
+字符C的图像的训练样例。
+
+82
+00:02:46,360 --> 00:02:47,500
+So after some amount of
+所以在完成一定数量的
+
+83
+00:02:47,570 --> 00:02:48,920
+work, you know this,
+工作之后,你就会发现
+
+84
+00:02:48,980 --> 00:02:49,710
+and it is a little bit of
+合成这些逼真的数据
+
+85
+00:02:49,830 --> 00:02:51,760
+work to synthesize realistic looking data.
+只有很少的工作要做。
+
+86
+00:02:52,180 --> 00:02:53,080
+But after some amount of work,
+在完成一定量的工作之后,
+
+87
+00:02:53,700 --> 00:02:56,130
+you can get a synthetic training set like that.
+你可以得到像那样的合成的训练集。
+
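+A sketch of the synthesis just described, using the Pillow library; the font file paths and background images are assumed to have been collected already, and every specific value below is only an illustrative choice.
+
+import random
+from PIL import Image, ImageDraw, ImageFont
+
+def synthesize_character(char, font_paths, backgrounds, size=32):
+    """Paste one character, drawn in a random font, onto a random background patch."""
+    patch = random.choice(backgrounds).copy().convert("L").resize((size, size))
+    font = ImageFont.truetype(random.choice(font_paths), size - 8)
+    ImageDraw.Draw(patch).text((4, 2), char, fill=random.randint(0, 80), font=font)
+    return patch   # grey-scale image patch whose label is `char`
+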
+88
+00:02:57,180 --> 00:02:59,910
+Every image shown on the right was actually a synthesized image.
+在右侧显示的图像实际上是一个合成的图像。
+
+89
+00:03:00,360 --> 00:03:02,080
+Where you take a font,
+当你采用一个字体时,
+
+90
+00:03:02,810 --> 00:03:04,240
+maybe a random font downloaded off
+可能是一个从网上下载的字体,
+
+91
+00:03:04,400 --> 00:03:05,680
+the web and you paste
+你把基于这种字体的
+
+92
+00:03:06,160 --> 00:03:07,320
+an image of one character
+一个字符的图像或者
+
+93
+00:03:07,800 --> 00:03:08,870
+or a few characters from that font
+是几个字符的图像
+
+94
+00:03:09,570 --> 00:03:11,440
+against this other random background image.
+粘贴到另一个任意的背景图像下。
+
+95
+00:03:12,140 --> 00:03:12,840
+And then apply maybe a little
+可以应用一点
+
+96
+00:03:13,540 --> 00:03:15,160
+blurring operators, maybe some affine
+模糊算子,也许再加一些仿射
+
+97
+00:03:15,680 --> 00:03:17,380
+distortions, affine distortions
+失真,也就是仿射失真
+
+98
+00:03:17,620 --> 00:03:18,650
+meaning just the shearing
+意思就是剪切、
+
+99
+00:03:19,350 --> 00:03:20,740
+and scaling and little rotation
+缩放和轻微的
+
+100
+00:03:21,000 --> 00:03:22,260
+operations and if you
+旋转操作,假如你
+
+101
+00:03:22,370 --> 00:03:23,330
+do that you get a synthetic
+做了这些,你得到一个
+
+102
+00:03:23,580 --> 00:03:25,520
+training set, on what the one shown here.
+合成的训练集,就是这里显示的这个。
+
+103
+00:03:26,510 --> 00:03:28,050
+And this is work,
+这种工作,
+
+104
+00:03:28,530 --> 00:03:29,640
+grade, it is, it takes
+它也是有好有坏的,
+
+105
+00:03:29,970 --> 00:03:31,460
+thought and work, in order to
+为了使合成的数据更逼真,
+
+106
+00:03:31,700 --> 00:03:33,250
+make the synthetic data look realistic,
+在工作中是需要花费心思的,
+
+107
+00:03:34,020 --> 00:03:34,710
+and if you do a sloppy
+如果在生成合成数据的工作中
+
+108
+00:03:35,120 --> 00:03:36,200
+job in terms of how
+你没有认真去做,
+
+109
+00:03:36,250 --> 00:03:38,910
+you create the synthetic data then it actually won't work well.
+那么所生成的合成数据实际上是不能有效工作的。
+
+110
+00:03:39,620 --> 00:03:40,600
+But if you look at it,
+但是,假如你看到合成的
+
+111
+00:03:40,790 --> 00:03:43,940
+the synthetic data looks remarkably similar to the real data.
+数据与真实数据非常相似,
+
+112
+00:03:45,120 --> 00:03:46,850
+And so by using synthetic data
+那么使用合成的数据,
+
+113
+00:03:47,340 --> 00:03:48,550
+you have essentially an unlimited
+你就必然能为
+
+114
+00:03:48,990 --> 00:03:50,970
+supply of training examples for
+你的人工数据合成提供
+
+115
+00:03:51,310 --> 00:03:53,060
+artificial data synthesis. And
+无限的训练数据样例。
+
+116
+00:03:53,150 --> 00:03:54,110
+so, if you use this
+因此,假如你使用
+
+117
+00:03:54,330 --> 00:03:55,820
+sort of synthetic data, you have
+这种合成的数据,你
+
+118
+00:03:56,150 --> 00:03:58,100
+essentially unlimited supply of
+就可以利用这些无限的
+
+119
+00:03:58,560 --> 00:04:00,000
+labeled data to create
+标签数据为字符识别
+
+120
+00:04:00,140 --> 00:04:01,610
+a supervised learning algorithm
+问题生成一个
+
+121
+00:04:02,300 --> 00:04:03,990
+for the character recognition problem.
+学习算法。
+
+122
+00:04:05,120 --> 00:04:06,540
+So this is an example of
+所以这是一个
+
+123
+00:04:07,000 --> 00:04:08,500
+artificial data synthesis, where you're
+数据合成的例子,
+
+124
+00:04:09,040 --> 00:04:10,870
+basically creating new data from
+你基本是从零开始
+
+125
+00:04:11,080 --> 00:04:13,780
+scratch, you just generating brand new images from scratch.
+产生数据,也就是你从零开始产生新的图像。
+
+126
+00:04:14,880 --> 00:04:16,450
+The other main approach to artificial data
+另一个主要的人工
+
+127
+00:04:16,710 --> 00:04:18,210
+synthesis is where you
+数据合成方式是
+
+128
+00:04:18,370 --> 00:04:19,610
+take examples that you
+你使用一个当前
+
+129
+00:04:19,740 --> 00:04:20,780
+currently have, that we take
+已经有的样例,也就是说
+
+130
+00:04:21,020 --> 00:04:22,430
+a real example, maybe from
+我们有一个真实的样例,
+
+131
+00:04:22,700 --> 00:04:24,130
+real image, and you create
+可能是一个真实的图像,
+
+132
+00:04:24,770 --> 00:04:26,130
+additional data, so as to
+你产生附加的数据,
+
+133
+00:04:26,380 --> 00:04:27,900
+amplify your training set.
+以扩充你的训练集。
+
+134
+00:04:28,070 --> 00:04:28,810
+So here is an image of the character
+这是一个来自真实图像的
+
+135
+00:04:28,910 --> 00:04:30,490
+A from a real image,
+字符A的图像,
+
+136
+00:04:31,410 --> 00:04:32,550
+not a synthesized image, and
+不是一个合成的图像,
+
+137
+00:04:32,630 --> 00:04:33,790
+I have overlaid this with
+我在上面覆盖了网格线
+
+138
+00:04:33,880 --> 00:04:35,750
+the grid lines just for the purpose of illustration.
+只是为了说明问题。
+
+139
+00:04:36,430 --> 00:04:36,880
+Actually have these ----.
+实际上有这许多。
+
+140
+00:04:36,970 --> 00:04:39,030
+So what you
+所以你能做
+
+141
+00:04:39,100 --> 00:04:40,110
+can do is then take this
+的是把字母放在
+
+142
+00:04:40,480 --> 00:04:41,500
+alphabet here, take this image
+这里,向这图像中
+
+143
+00:04:42,240 --> 00:04:43,760
+and introduce artificial warpings
+引入一些人工的拉伸
+
+144
+00:04:44,290 --> 00:04:45,810
+or artificial distortions into the
+或者是一些
+
+145
+00:04:46,040 --> 00:04:47,030
+image so they can
+人工的失真,
+
+146
+00:04:47,220 --> 00:04:48,240
+take the image A and turn
+经过这些操作,可以把
+
+147
+00:04:48,430 --> 00:04:50,060
+that into 16 new examples.
+字母A变成这16个新的样例。
+
+148
+00:04:51,110 --> 00:04:52,000
+So in this way you can
+所以采用这种办法,
+
+149
+00:04:52,450 --> 00:04:53,740
+take a small label training set
+你可以得到一个小的标签训练集
+
+150
+00:04:54,090 --> 00:04:55,360
+and amplify your training set
+并且你扩充你的训练集,
+
+151
+00:04:56,180 --> 00:04:57,190
+to suddenly get a lot
+突然得到
+
+152
+00:04:57,300 --> 00:05:00,020
+more examples, all of it.
+更多样例,所有的这些图像。
+
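+One way to picture the amplification idea described above is a small script that takes a single labeled character image and emits several warped copies. The sketch below (Python with scipy.ndimage; the rotation and shear ranges are illustrative assumptions, not the course's own code) is one possible version:
+
+    import numpy as np
+    from scipy import ndimage
+
+    def amplify(char_img, n_copies=16, seed=0):
+        """Turn one labeled character image (a 2-D array) into n_copies warped variants."""
+        rng = np.random.default_rng(seed)
+        variants = []
+        for _ in range(n_copies):
+            angle = rng.uniform(-10, 10)        # small rotation, in degrees
+            shear = rng.uniform(-0.2, 0.2)      # small shear factor
+            warped = ndimage.rotate(char_img, angle, reshape=False, mode="nearest")
+            warped = ndimage.affine_transform(
+                warped, np.array([[1.0, shear], [0.0, 1.0]]), mode="nearest")
+            variants.append(warped)
+        return variants
+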
+153
+00:05:01,210 --> 00:05:02,360
+Again, in order to do
+此外,在这一应用中
+
+154
+00:05:02,560 --> 00:05:03,940
+this for application, it does
+所做的这些,
+
+155
+00:05:04,120 --> 00:05:05,060
+take thought and it does
+需要花费心思,
+
+156
+00:05:05,140 --> 00:05:06,270
+take insight to figure out
+需要洞察力去
+
+157
+00:05:06,430 --> 00:05:07,840
+what are reasonable sets of
+判断出合理的失真
+
+158
+00:05:08,420 --> 00:05:09,460
+distortions, or whether these
+操作集,或者是这些操作
+
+159
+00:05:09,720 --> 00:05:11,000
+are ways that amplify and multiply
+是否是扩充和增加
+
+160
+00:05:11,470 --> 00:05:12,760
+your training set, and for
+训练集的方法,
+
+161
+00:05:13,070 --> 00:05:15,130
+the specific example of
+对于字符识别这一
+
+162
+00:05:15,260 --> 00:05:17,170
+character recognition, introducing these
+具体的例子,引入这些
+
+163
+00:05:17,480 --> 00:05:18,310
+warping seems like a natural
+拉伸看起来是一个
+
+164
+00:05:18,780 --> 00:05:19,910
+choice, but for a
+很自然的选择,但是对于不同
+
+165
+00:05:20,090 --> 00:05:21,970
+different machine learning application, there may
+机器学习应用来说,可能
+
+166
+00:05:22,080 --> 00:05:24,180
+be different distortions that might make more sense.
+另外一些不同的失真将会更合理。
+
+167
+00:05:24,860 --> 00:05:25,600
+Let me just show one example
+让我给大家展示一个
+
+168
+00:05:26,180 --> 00:05:28,750
+from the totally different domain of speech recognition.
+完全不同的语音识别领域的问题。
+
+169
+00:05:30,230 --> 00:05:31,480
+So the speech recognition, let's say
+对于语音识别,假如说
+
+170
+00:05:31,580 --> 00:05:33,450
+you have audio clips and you
+你有音频片段,
+
+171
+00:05:33,600 --> 00:05:35,010
+want to learn from the audio
+你想从中
+
+172
+00:05:35,350 --> 00:05:37,240
+clip to recognize what were
+识别出哪些单词出现在了
+
+173
+00:05:37,460 --> 00:05:38,780
+the words spoken in that clip.
+语音片段中。
+
+174
+00:05:39,510 --> 00:05:41,340
+So let's look at one labeled training example.
+那么让我们来看一个带标签的训练样例。
+
+175
+00:05:42,290 --> 00:05:43,190
+So let's say you have one
+那么,让我们说假定你已经有了一个
+
+176
+00:05:43,400 --> 00:05:45,000
+labeled training example, of someone
+加了标签的训练样例,就是
+
+177
+00:05:45,330 --> 00:05:46,660
+saying a few specific words.
+某个人在说一些特定的单词。
+
+178
+00:05:46,860 --> 00:05:48,720
+So let's play that audio clip here.
+因此,让我们播放一下这个语音片段。
+
+179
+00:05:49,150 --> 00:05:51,230
+0 -1-2-3-4-5.
+0,1,2,3,4,5.
+
+180
+00:05:51,570 --> 00:05:53,810
+Alright, so someone
+好吧,有人在
+
+181
+00:05:54,220 --> 00:05:55,110
+counting from 0 to 5,
+从0数到5.
+
+182
+00:05:55,450 --> 00:05:57,180
+and so you want to
+然后你想要应用
+
+183
+00:05:57,290 --> 00:05:58,460
+try to apply a learning algorithm
+一个学习算法去试图
+
+184
+00:05:59,380 --> 00:06:01,320
+to try to recognize the words said in that.
+识别出那个人说了哪些单词。
+
+185
+00:06:02,040 --> 00:06:04,030
+So, how can we amplify the data set?
+那么,我们该如何扩充数据集呢?
+
+186
+00:06:04,390 --> 00:06:05,340
+Well, one thing we do is
+很好,我们可以做的一件事情就是
+
+187
+00:06:06,020 --> 00:06:09,180
+introduce additional audio distortions into the data set.
+引入附加的语音失真到数据集中。
+
+188
+00:06:09,970 --> 00:06:10,960
+So here I'm going to
+所以,我将加入
+
+189
+00:06:11,640 --> 00:06:14,700
+add background sounds to simulate a bad cell phone connection.
+加入一些背景声音去模拟一个较差的手机通话连接。
+
+190
+00:06:15,360 --> 00:06:16,800
+When you hear beeping sounds, that's
+当你听到蜂鸣声,实际上
+
+191
+00:06:16,980 --> 00:06:17,710
+actually part of the audio
+这是音频记录的一部分,
+
+192
+00:06:17,740 --> 00:06:20,350
+track, that's nothing wrong with the speakers, I'm going to play this now.
+不是说话者的错误,现在我开始播放了。
+
+193
+00:06:20,580 --> 00:06:21,379
+0-1-2-3-4-5.
+0,1,2,3,4,5.
+
+194
+00:06:21,380 --> 00:06:22,260
+Right, so you can listen
+好了,你只可以听到
+
+195
+00:06:22,640 --> 00:06:24,890
+to that sort of audio clip and
+那种音频片段, 并且
+
+196
+00:06:25,720 --> 00:06:28,600
+recognize the sounds,
+识别出声音,
+
+197
+00:06:28,960 --> 00:06:30,800
+that seems like another useful training
+这看起来像是另外一种
+
+198
+00:06:31,370 --> 00:06:33,230
+example to have, here's another example, noisy background.
+值得拥有的训练样例。这是另外一种例子,吵杂的背景。
+
+199
+00:06:34,890 --> 00:06:36,870
+Zero, one, two, three
+0,1,2,3
+
+200
+00:06:37,560 --> 00:06:39,060
+four five you know
+4,5,在背景中还有
+
+201
+00:06:39,090 --> 00:06:40,280
+of cars driving past, people walking
+行驶的汽车经过,人在走路。
+
+202
+00:06:40,580 --> 00:06:42,200
+in the background, here's another
+这是另外
+
+203
+00:06:42,450 --> 00:06:43,880
+one, so taking the original
+一个,使用原来的干净的音频片段,
+
+204
+00:06:44,430 --> 00:06:45,980
+clean audio clip so
+原来那个音频片段
+
+205
+00:06:46,090 --> 00:06:47,810
+taking the clean audio of
+有人在说
+
+206
+00:06:47,990 --> 00:06:48,960
+someone saying 0 1 2 3
+0,1,2,3
+
+207
+00:06:49,090 --> 00:06:50,490
+4 5 we can then automatically
+4,5,然后,我们能自动
+
+208
+00:06:51,790 --> 00:06:54,090
+synthesize these additional training
+合成这些附加的
+
+209
+00:06:54,470 --> 00:06:55,850
+examples and thus amplify
+训练集,把单个的训练集
+
+210
+00:06:56,410 --> 00:06:57,860
+one training example into maybe four different training examples.
+扩展到可能四种不同的训练样例。
+
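+For the audio case just described, the amplification amounts to overlaying a background-noise clip on the clean recording at some signal-to-noise ratio. A minimal Python sketch (the function and its default SNR are illustrative assumptions, not part of the course):
+
+    import numpy as np
+
+    def mix_in_background(clean, noise, snr_db=10.0):
+        """Overlay a background-noise clip onto a clean speech clip at a target SNR in dB."""
+        noise = np.resize(noise, clean.shape)      # loop or trim the noise to match the clip length
+        p_clean = np.mean(clean ** 2)
+        p_noise = np.mean(noise ** 2) + 1e-12
+        scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
+        return clean + scale * noise
+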
+211
+00:07:00,110 --> 00:07:00,940
+So let me play this final
+所以让我也播放一下最后
+
+212
+00:07:01,300 --> 00:07:03,180
+example, as well.
+这个样例。
+
+213
+00:07:03,340 --> 00:07:07,180
+0-1 3-4-5 So by
+0,1,3,4,5. 因此,通过
+
+214
+00:07:07,530 --> 00:07:08,510
+taking just one labelled example,
+使用一个已经加了标签的样例,
+
+215
+00:07:09,000 --> 00:07:10,260
+we have to go through the effort
+我们不得不通过努力
+
+216
+00:07:10,360 --> 00:07:11,760
+to collect just one labelled example
+去以收集另一个带
+
+217
+00:07:11,950 --> 00:07:13,270
+of all of the 0-1-2-3-4-5, and by
+标签的样例,即0到5,然后通过
+
+218
+00:07:14,140 --> 00:07:16,520
+synthesizing additional distortions,
+合成附加的失真,
+
+219
+00:07:17,290 --> 00:07:18,560
+by introducing different background sounds,
+通过引入不同的背景声音,
+
+220
+00:07:19,000 --> 00:07:20,240
+we've now multiplied this one
+我们把一个样例扩充
+
+221
+00:07:20,370 --> 00:07:21,810
+example into many more examples.
+成许多更多的样例。
+
+222
+00:07:23,420 --> 00:07:24,480
+Much work by just automatically
+做的最多的工作就是自动
+
+223
+00:07:25,270 --> 00:07:27,090
+adding these different background sounds
+添加许多不同的背景声音
+
+224
+00:07:27,680 --> 00:07:30,510
+to the clean audio. Just
+到干净的音频中。对于引入失真
+
+225
+00:07:30,740 --> 00:07:31,980
+one word of warning about synthesizing
+去合成数据,
+
+226
+00:07:33,190 --> 00:07:35,220
+data by introducing distortions: if
+只有一点要提醒的是
+
+227
+00:07:35,310 --> 00:07:36,630
+you try to do this
+假如你尝试
+
+228
+00:07:36,810 --> 00:07:38,580
+yourself, the distortions you
+自己去完成,你引入的失真
+
+229
+00:07:39,020 --> 00:07:40,300
+introduce should be representative of the source
+应该对于背景声音
+
+230
+00:07:40,660 --> 00:07:42,010
+of noises, or distortions, that
+或者失真而言具有代表性,
+
+231
+00:07:42,110 --> 00:07:43,680
+you might see in the test set.
+即你可能会在测试集中遇到的那些。
+
+232
+00:07:44,010 --> 00:07:45,350
+So, for the character recognition example,
+所以,对于字符识别的例子,
+
+233
+00:07:45,930 --> 00:07:47,230
+you know, the warpings being
+你知道,所引入的
+
+234
+00:07:47,440 --> 00:07:48,620
+introduced are actually kind
+这些拉伸变形实际上是
+
+235
+00:07:48,770 --> 00:07:49,980
+of reasonable, because an image
+某种合理的事物,因为
+
+236
+00:07:50,340 --> 00:07:51,510
+A that looks like that, that's,
+一个A的图像就是像那个样子,
+
+237
+00:07:52,000 --> 00:07:53,020
+could be an image that
+也就是说,是一个我们实际上可以在
+
+238
+00:07:53,210 --> 00:07:55,170
+we could actually see in a test set.
+一个测试集中看到的那样的图像。
+
+239
+00:07:55,370 --> 00:07:57,180
+And, you know, that
+你知道,
+
+240
+00:07:57,380 --> 00:08:00,200
+image on the upper-right, that
+右上边的那个图像是一个
+
+241
+00:08:00,350 --> 00:08:01,800
+could be an image that we could imagine seeing.
+在我们想象中能看到的图像。
+
+242
+00:08:03,280 --> 00:08:04,570
+And for audio, well, we do
+对于音频而言,
+
+243
+00:08:04,740 --> 00:08:06,560
+wanna recognize speech, even against
+我们想要识别出的讲话,
+
+244
+00:08:06,970 --> 00:08:07,990
+a bad cell phone connection, against
+即使是在较差的手机通话连接下,
+
+245
+00:08:08,480 --> 00:08:09,440
+different types of background noise, and
+在各种不同类型的背景噪音下,
+
+246
+00:08:09,590 --> 00:08:10,920
+so for the audio, we're again
+因此,对于音频,我们
+
+247
+00:08:11,230 --> 00:08:12,800
+synthesizing examples are actually
+再次合成样例,它们能够
+
+248
+00:08:13,530 --> 00:08:14,770
+representative of the sorts of
+代表不同类型的
+
+249
+00:08:14,850 --> 00:08:15,830
+examples that we want to
+我们想要区分开的样例,
+
+250
+00:08:15,990 --> 00:08:17,360
+classify, that we want to recognize correctly.
+而且我们也想要能够正确的识别。
+
+251
+00:08:18,770 --> 00:08:20,660
+In contrast, usually it does
+相反,通常当你
+
+252
+00:08:20,770 --> 00:08:21,940
+not help to add purely
+把纯粹无意义的噪音
+
+253
+00:08:22,170 --> 00:08:23,760
+meaningless noise to your data.
+加入到你的数据中,这通常没有多大帮助。
+
+254
+00:08:24,420 --> 00:08:25,170
+I'm not sure you can see
+我不确定你能明白这个,
+
+255
+00:08:25,440 --> 00:08:26,400
+this, but what we've done
+但是我们这里已经做的
+
+256
+00:08:26,620 --> 00:08:28,050
+here is taken the image, and
+就是拍摄了照片,
+
+257
+00:08:28,210 --> 00:08:29,540
+for each pixel, in each
+并且对于4幅
+
+258
+00:08:29,720 --> 00:08:30,710
+of these 4 images, we've just
+图中的每一个像素,都加入
+
+259
+00:08:30,990 --> 00:08:32,970
+added some random Gaussian noise to each pixel.
+一些随机的高斯噪音。
+
+260
+00:08:33,240 --> 00:08:34,690
+To each pixel, is the
+对于每一个像素,
+
+261
+00:08:35,060 --> 00:08:36,370
+pixel brightness, it would
+也就是像素亮度,
+
+262
+00:08:36,500 --> 00:08:38,880
+just add some, you know, maybe Gaussian random noise to each pixel.
+加入一些可能是高斯随机噪音。
+
+263
+00:08:39,360 --> 00:08:40,940
+So it's just a totally meaningless noise, right?
+所以,它只是一个完全没有意义的噪音,对吧?
+
+264
+00:08:41,650 --> 00:08:43,280
+And so, unless you're expecting
+因此,除非你期望在你
+
+265
+00:08:43,800 --> 00:08:45,510
+to see these sorts of pixel
+的测试集中看到这种
+
+266
+00:08:45,910 --> 00:08:46,830
+wise noise in your test
+像素级的噪音,
+
+267
+00:08:46,910 --> 00:08:48,190
+set, this sort of
+那么这种纯随机的无意义
+
+268
+00:08:48,660 --> 00:08:51,540
+purely random meaningless noise is less likely to be useful.
+的噪音将很可能就是无用的。
+
+269
+00:08:52,880 --> 00:08:53,750
+But the process of artificial
+但是这种人工
+
+270
+00:08:54,250 --> 00:08:55,570
+data synthesis is, you
+数据合成的过程,你知道,
+
+271
+00:08:55,640 --> 00:08:56,660
+know, a little bit of
+多少也是一门
+
+272
+00:08:56,710 --> 00:08:57,850
+an art as well, and sometimes
+艺术,并且有时候你
+
+273
+00:08:58,140 --> 00:09:00,250
+you just have to try it and see if it works.
+仅仅是尝试一下,看它是否起作用。
+
+274
+00:09:01,280 --> 00:09:02,060
+But if you're trying to
+但是当你在决定
+
+275
+00:09:02,140 --> 00:09:03,170
+decide what sorts of distortions
+该添加哪种
+
+276
+00:09:03,870 --> 00:09:04,720
+to add, you know, do
+失真时,
+
+277
+00:09:04,820 --> 00:09:06,260
+think about what other meaningful
+考虑其他你
+
+278
+00:09:06,670 --> 00:09:08,180
+distortions you might add that
+可能添加的有意义的失真,
+
+279
+00:09:08,660 --> 00:09:09,720
+will cause you to generate additional
+可能会导致你生成一些附加
+
+280
+00:09:10,110 --> 00:09:11,370
+training examples that are at
+训练样例,它们至少在某种程度上
+
+281
+00:09:11,880 --> 00:09:13,410
+least somewhat representative of the
+能够代表你希望在
+
+282
+00:09:13,480 --> 00:09:15,830
+sorts of images you expect to see in your test sets.
+测试集中见到的那些图像。
+
+283
+00:09:18,100 --> 00:09:19,000
+Finally, to wrap up this
+最后,在完成这一视频时,
+
+284
+00:09:19,150 --> 00:09:19,920
+video, I just wanna say
+我想说
+
+285
+00:09:20,140 --> 00:09:21,420
+a couple of words, more about
+几句话,更多的是关于
+
+286
+00:09:21,790 --> 00:09:23,360
+this idea of getting lots
+通过人工数据合成
+
+287
+00:09:23,600 --> 00:09:25,610
+of data via artificial data synthesis.
+获得大量数据的想法。
+
+288
+00:09:26,920 --> 00:09:28,780
+As always, before expending a lot
+像以前一样,在作出
+
+289
+00:09:29,170 --> 00:09:30,280
+of effort, you know, figuring out
+努力之前,应该想到
+
+290
+00:09:30,450 --> 00:09:32,020
+how to create artificial training
+如何生成人工训练
+
+291
+00:09:33,060 --> 00:09:34,140
+examples, it's often a good
+样例,这是一个很好的
+
+292
+00:09:34,220 --> 00:09:35,310
+practice to make sure
+做法,它可以确保
+
+293
+00:09:35,650 --> 00:09:36,540
+that you really have a low bias
+你真正有一个非常低偏差的
+
+294
+00:09:36,920 --> 00:09:38,350
+classifier, and having a
+分类器,并且有更多的
+
+295
+00:09:38,460 --> 00:09:40,320
+lot more training data will be of help.
+训练数据将会更有帮助。
+
+296
+00:09:41,010 --> 00:09:41,840
+And a standard way to do
+一个标准的方法是
+
+297
+00:09:41,970 --> 00:09:42,810
+this is to plot the learning
+绘制一个学习
+
+298
+00:09:43,030 --> 00:09:43,970
+curves, and make sure that
+曲线,确定你只有
+
+299
+00:09:44,130 --> 00:09:44,920
+you only have a low
+一个低偏差
+
+300
+00:09:45,000 --> 00:09:47,470
+bias as well as high variance classifier.
+同时也是高方差的分类器。
+
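+A minimal sketch of the learning-curve check mentioned here: train on increasing subsets of the data and compare training error with cross-validation error. The helper below is an illustrative assumption (train_fn and error_fn stand in for whatever model and error measure you are using), not the course's code:
+
+    import numpy as np
+
+    def learning_curve(train_fn, error_fn, X, y, Xval, yval, sizes):
+        """For each training-set size m, fit on the first m examples and record
+        training error and cross-validation error."""
+        train_err, val_err = [], []
+        for m in sizes:
+            model = train_fn(X[:m], y[:m])
+            train_err.append(error_fn(model, X[:m], y[:m]))
+            val_err.append(error_fn(model, Xval, yval))
+        return np.array(train_err), np.array(val_err)
+
+    # A persistent gap between the two curves (low training error, much higher
+    # validation error) indicates high variance, which is the case where
+    # collecting or synthesizing more data is likely to help.
+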
+301
+00:09:47,760 --> 00:09:48,650
+Or if you don't have a low
+或者如果你没有一个低
+
+302
+00:09:48,720 --> 00:09:50,090
+bias classifier, you know,
+偏差的分类器,那么,
+
+303
+00:09:50,160 --> 00:09:51,040
+one other thing that's worth trying
+另外一件值得尝试的事情
+
+304
+00:09:51,450 --> 00:09:53,270
+is to keep increasing the number
+就是持续增加
+
+305
+00:09:53,540 --> 00:09:54,440
+of features that your classifier
+你的分类器的特征数量,
+
+306
+00:09:54,600 --> 00:09:55,650
+has, increasing the number of
+增加你的
+
+307
+00:09:55,740 --> 00:09:56,710
+hidden units in your network,
+网络的隐藏单元的数量,
+
+308
+00:09:57,180 --> 00:09:58,470
+saying, until you actually have a
+也就是说,直到你实际上有了
+
+309
+00:09:58,540 --> 00:10:00,000
+low bias classifier, and only
+一个低偏差的分类器,
+
+310
+00:10:00,310 --> 00:10:01,820
+then, should you put
+并且是直到那时,你才应该
+
+311
+00:10:02,040 --> 00:10:04,020
+the effort into creating a
+花费力气去生成
+
+312
+00:10:04,260 --> 00:10:05,760
+large, artificial training set, so
+大量的,人工的训练集,所以
+
+313
+00:10:05,860 --> 00:10:06,660
+what you really want to avoid
+你真正想要避免的
+
+314
+00:10:06,870 --> 00:10:07,930
+is to, you know, spend
+就是,花费一
+
+315
+00:10:08,110 --> 00:10:08,890
+a whole week or spend a few
+整个星期或者是花费
+
+316
+00:10:09,090 --> 00:10:10,370
+months figuring out how
+几个月去想该
+
+317
+00:10:10,540 --> 00:10:11,720
+to get a great artificially
+如何去得到一个大的
+
+318
+00:10:12,450 --> 00:10:13,260
+synthesized data set.
+人工合成的数据集。
+
+319
+00:10:13,820 --> 00:10:15,520
+Only to realize afterward, that,
+结果到后来才意识到,
+
+320
+00:10:15,760 --> 00:10:17,410
+you know, your learning algorithm, performance
+你知道的,即使当你有大量的训练集,
+
+321
+00:10:18,030 --> 00:10:20,730
+doesn't improve that much, even when you're given a huge training set.
+你的学习算法的性能也不可能改进太多。
+
+322
+00:10:22,190 --> 00:10:23,060
+So that's about my usual advice
+所以,这也是我通常的
+
+323
+00:10:23,420 --> 00:10:24,690
+about first checking that
+关于测试的一个建议,
+
+324
+00:10:25,030 --> 00:10:26,290
+you really can make use
+在你花费很大的力气去
+
+325
+00:10:26,530 --> 00:10:27,760
+of a large training set before
+寻找大的训练集以前,
+
+326
+00:10:28,080 --> 00:10:30,530
+spending a lot of effort going out to get that large training set.
+你可以真正利用这样一个大的训练集。
+
+327
+00:10:31,960 --> 00:10:33,280
+Second is, when i'm working
+第二,就是当我
+
+328
+00:10:33,590 --> 00:10:35,250
+on machine learning problems, one question
+在研究机器学习问题时,我经常问
+
+329
+00:10:35,690 --> 00:10:37,520
+I often ask the team
+和我一起工作的小组,
+
+330
+00:10:37,880 --> 00:10:39,210
+I'm working with, often ask my
+我也经常问我的学生
+
+331
+00:10:39,430 --> 00:10:40,550
+students, which is, how much work
+的一个问题,就是获得我当前拥有的数据
+
+332
+00:10:40,620 --> 00:10:42,810
+would it be to get 10 times as much data as we currently have.
+的10倍的数据将会花费多少工作。
+
+333
+00:10:46,720 --> 00:10:47,850
+When I face a new machine
+当我面临一个新的
+
+334
+00:10:48,200 --> 00:10:49,760
+learning application very often I
+机器学习应用时,通常
+
+335
+00:10:49,980 --> 00:10:50,940
+will sit down with a team
+我会和整个小组一起坐下来,
+
+336
+00:10:51,210 --> 00:10:52,440
+and ask exactly this question,
+仔细询问这个问题,
+
+337
+00:10:52,920 --> 00:10:53,870
+I've asked this question over and
+我已经反复的问过这个问题,
+
+338
+00:10:53,970 --> 00:10:55,870
+over and over and I've
+并且,我一直也很
+
+339
+00:10:56,000 --> 00:10:57,540
+been very surprised how often
+惊讶问题的答案
+
+340
+00:10:58,390 --> 00:10:59,660
+this answer has been that.
+通常都是那样。
+
+341
+00:11:00,010 --> 00:11:01,070
+You know, it's really not that hard,
+你知道,它其实没有那么难,
+
+342
+00:11:01,680 --> 00:11:02,670
+maybe a few days of work
+可能至多是
+
+343
+00:11:02,930 --> 00:11:03,930
+at most, to get ten times
+几天的工作,就可以获得
+
+344
+00:11:04,250 --> 00:11:05,300
+as much data as we currently
+10倍于当前
+
+345
+00:11:05,450 --> 00:11:06,650
+have for a machine
+我们的机器
+
+346
+00:11:06,810 --> 00:11:08,820
+learning application, and very
+学习应用的数据,并且
+
+347
+00:11:09,080 --> 00:11:09,830
+often if you can get
+通常是假如你有当前
+
+348
+00:11:09,950 --> 00:11:11,030
+ten times as much data there
+的十倍的数据,那么
+
+349
+00:11:11,270 --> 00:11:13,680
+will be a way to make your algorithm do much better.
+将会有办法使你的算法工作得更好。
+
+350
+00:11:14,060 --> 00:11:15,040
+So, you know, if you
+因此,假如
+
+351
+00:11:15,260 --> 00:11:16,510
+ever join the product team
+你曾经参加了
+
+352
+00:11:17,820 --> 00:11:18,880
+working on some machine learning
+过专门从事研发一些
+
+353
+00:11:19,110 --> 00:11:20,430
+application product this is
+机器学习应用产品的小组,
+
+354
+00:11:20,550 --> 00:11:21,710
+a very good questions ask yourself
+这是一个很好的你应该问自己、
+
+355
+00:11:22,290 --> 00:11:23,500
+ask the team don't be
+问小组的问题,不要太惊讶
+
+356
+00:11:23,650 --> 00:11:25,120
+too surprised if after a
+假如经过几分钟
+
+357
+00:11:25,240 --> 00:11:26,530
+few minutes of brainstorming if your
+的讨论后,你的小组就想到
+
+358
+00:11:26,650 --> 00:11:27,520
+team comes up with a
+了可以得到
+
+359
+00:11:27,660 --> 00:11:28,950
+way to get literally ten
+毫不夸张的十倍于
+
+360
+00:11:29,200 --> 00:11:30,250
+times this much data, in
+现在拥有的数据的办法,
+
+361
+00:11:30,380 --> 00:11:31,320
+which case, I think you would
+在这种情况下,
+
+362
+00:11:31,430 --> 00:11:32,330
+be a hero to that team,
+你将成为小组的英雄,
+
+363
+00:11:32,940 --> 00:11:34,000
+because with 10 times as
+因为如果有了十倍多的数据,
+
+364
+00:11:34,240 --> 00:11:35,360
+much data, I think you'll really
+并且从这么多的数据中学习,
+
+365
+00:11:35,450 --> 00:11:38,460
+get much better performance, just from learning from so much data.
+我认为你的算法将会有更好的性能。
+
+366
+00:11:39,650 --> 00:11:44,500
+So there are several ways, and
+因此,有几个方法
+
+367
+00:11:47,450 --> 00:11:48,510
+that comprised both the ideas
+可以包括利用
+
+368
+00:11:48,970 --> 00:11:50,440
+of generating data from
+随机字体
+
+369
+00:11:50,640 --> 00:11:53,050
+scratch using random fonts and so on.
+来从头产生数据等思想。
+
+370
+00:11:53,570 --> 00:11:54,430
+As well as the second idea
+同样的,第二个思想
+
+371
+00:11:54,840 --> 00:11:56,600
+of taking an existing example and
+是利用现有的例子,
+
+372
+00:11:56,670 --> 00:11:58,100
+introducing distortions to amplify
+引入一些失真
+
+373
+00:11:58,280 --> 00:12:00,910
+and enlarge the training set. A
+来增加、扩大训练集。
+
+374
+00:12:01,090 --> 00:12:02,150
+couple of other examples of
+一些其它获得得更多
+
+375
+00:12:02,280 --> 00:12:03,130
+ways to get a lot more
+数据的方法
+
+376
+00:12:03,270 --> 00:12:04,610
+data are to collect the
+的例子就是收集数据
+
+377
+00:12:04,670 --> 00:12:06,600
+data or to label them yourself.
+或者自己给他们加标签。
+
+378
+00:12:07,600 --> 00:12:09,090
+So one useful calculation that
+因此一个我经常
+
+379
+00:12:09,210 --> 00:12:11,580
+I often do is, you know,
+做的有用的计算就是
+
+380
+00:12:11,780 --> 00:12:13,320
+how many minutes, how many
+获得一定数量
+
+381
+00:12:13,520 --> 00:12:15,140
+hours does it take to
+的样例需要
+
+382
+00:12:15,350 --> 00:12:16,420
+get a certain number of
+多少分钟、多少小时,
+
+383
+00:12:16,610 --> 00:12:17,780
+examples, so actually sit down and
+所以实际上是
+
+384
+00:12:17,900 --> 00:12:19,410
+figure out, you know, suppose it
+坐下来估计一下,假如
+
+385
+00:12:19,550 --> 00:12:21,830
+takes me ten seconds to
+给一个样例加标签
+
+386
+00:12:22,060 --> 00:12:23,990
+label one example then
+需要十秒钟,然后,再假设
+
+387
+00:12:24,120 --> 00:12:25,820
+and, suppose that, for
+在我们的应用中,
+
+388
+00:12:26,190 --> 00:12:29,050
+our application, currently we
+我们当前
+
+389
+00:12:29,190 --> 00:12:31,500
+have 1,000 labeled examples,
+有1000个标记过的样例。
+
+390
+00:12:31,620 --> 00:12:32,730
+so ten times as
+所以得到
+
+391
+00:12:32,860 --> 00:12:34,090
+much of that would be
+十倍数据的时间
+
+392
+00:12:34,200 --> 00:12:35,940
+if m were equal to ten thousand.
+也就是m等于一万。
+
+393
+00:12:37,440 --> 00:12:40,260
+A second way to
+第二个获得
+
+394
+00:12:40,400 --> 00:12:41,530
+get a lot of data is
+更多数据的办法就是
+
+395
+00:12:41,800 --> 00:12:43,540
+to just collect the data and you label it yourself.
+收集数据,然后自己标记它。
+
+396
+00:12:44,510 --> 00:12:45,380
+So what I mean by this is
+所以,我这么说的意思是
+
+397
+00:12:45,690 --> 00:12:46,970
+I will often sit down and
+我经常会坐下来
+
+398
+00:12:47,240 --> 00:12:48,570
+do a calculation to figure
+并且计算出
+
+399
+00:12:48,950 --> 00:12:50,190
+out how much time, you
+需要多少时间,
+
+400
+00:12:50,350 --> 00:12:51,140
+know just like how many hours
+比如你应该知道
+
+401
+00:12:52,640 --> 00:12:54,000
+will it take, how many
+这需要多少小时,也就是
+
+402
+00:12:54,200 --> 00:12:55,130
+hours or how many days will
+你或者其它人
+
+403
+00:12:55,230 --> 00:12:56,890
+it take for me or
+坐下来并且
+
+404
+00:12:57,020 --> 00:12:58,400
+for someone else to just sit
+收集十倍于
+
+405
+00:12:58,640 --> 00:12:59,870
+down and collect ten times
+当前所拥有的
+
+406
+00:13:00,190 --> 00:13:01,490
+as much data, as we have
+数据,并给他们
+
+407
+00:13:01,800 --> 00:13:03,560
+currently, by collecting the data ourselves and labeling them ourselves.
+加标签需要多少小时,多少天。
+
+408
+00:13:05,260 --> 00:13:06,550
+So, for example, that, for
+例如,
+
+409
+00:13:06,630 --> 00:13:08,200
+our machine learning application, currently
+我们的机器学习应用,当前
+
+410
+00:13:08,690 --> 00:13:10,180
+we have 1,000 examples, so m is 1,000.
+我们有1000个样例,所以M等于1000.
+
+411
+00:13:12,010 --> 00:13:12,750
+Then what we do is sit
+所以我们做的就是
+
+412
+00:13:12,870 --> 00:13:14,500
+down and ask, how long does
+坐下来,问一下,实际收集
+
+413
+00:13:14,720 --> 00:13:16,930
+it take me really to collect and label one example.
+并给一个样例加标签需要多少时间。
+
+414
+00:13:17,340 --> 00:13:18,480
+And sometimes maybe it will
+有时它可能会
+
+415
+00:13:18,600 --> 00:13:19,510
+take you, you know ten
+花费你十秒
+
+416
+00:13:19,790 --> 00:13:22,100
+seconds to label
+去给一个新样例
+
+417
+00:13:23,310 --> 00:13:25,120
+one new example, and so
+加标签,因此,如果我需要
+
+418
+00:13:25,520 --> 00:13:27,720
+if I want 10 X as many examples, I'd do a calculation.
+十倍多的样例,我就会做一个计算。
+
+419
+00:13:28,360 --> 00:13:30,400
+If it takes me 10 seconds to get one training example.
+假如我需要10秒去得到一个训练样例,
+
+420
+00:13:31,370 --> 00:13:32,340
+If I wanted to get 10
+如果我需要10倍多数据,
+
+421
+00:13:32,580 --> 00:13:35,320
+times as much data, then I need 10,000 examples.
+那么我需要10000个样例。
+
+422
+00:13:35,830 --> 00:13:38,470
+So I do the calculation, how long
+所以我做一下计算,看一下
+
+423
+00:13:38,770 --> 00:13:40,380
+is it gonna take to label,
+加标签需要多少时间,也就是手工
+
+424
+00:13:40,840 --> 00:13:42,640
+to manually label 10,000 examples,
+给10000个样例加标签需要多少时间。
+
+425
+00:13:43,340 --> 00:13:45,280
+if it takes me 10 seconds to label 1 example.
+假如给一个样例加标签需要10秒。
+
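+For reference, the arithmetic being set up here: 10,000 examples at 10 seconds per label is 100,000 seconds, which is roughly 28 hours, i.e. on the order of a few days of work.
+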
+426
+00:13:47,070 --> 00:13:47,940
+So when you do this calculation,
+所以当你计算的时候,
+
+427
+00:13:48,840 --> 00:13:49,920
+often I've seen that many people
+我经常看到许多人
+
+428
+00:13:50,390 --> 00:13:51,780
+would be surprised, you know,
+会很吃惊,时间多么短,
+
+429
+00:13:51,870 --> 00:13:53,140
+how little, or sometimes a
+有时只是
+
+430
+00:13:53,240 --> 00:13:54,730
+few days of work, sometimes a
+几天的工作,
+
+431
+00:13:54,880 --> 00:13:55,560
+small number of days of work,
+有时只是少数的几天,
+
+432
+00:13:55,780 --> 00:13:57,180
+well I've seen many teams be very
+我已经看过许多小组
+
+433
+00:13:57,500 --> 00:13:59,160
+surprised that sometimes how
+感到很吃惊,
+
+434
+00:13:59,340 --> 00:14:00,280
+little work it could be,
+获得更多的数据
+
+435
+00:14:00,410 --> 00:14:01,200
+to just get a lot more
+需要的工作是多么少,
+
+436
+00:14:01,370 --> 00:14:02,510
+data, and let that be
+把这个
+
+437
+00:14:02,580 --> 00:14:03,470
+a way to give your learning
+做为大力提高
+
+438
+00:14:03,580 --> 00:14:04,310
+algorithm a huge boost
+你的机器学习算法的
+
+439
+00:14:04,640 --> 00:14:06,350
+in performance, and necessarily, you
+性能的方法,你知道这是必然的,
+
+440
+00:14:06,450 --> 00:14:07,550
+know, sometimes when you've just
+有时当你实际
+
+441
+00:14:07,790 --> 00:14:08,900
+managed to do this, you
+做到这一点了,
+
+442
+00:14:09,190 --> 00:14:10,780
+will be a hero and whatever product
+你将是一个英雄,无论是
+
+443
+00:14:11,360 --> 00:14:12,520
+development, whatever team you're working
+什么产品开发,
+
+444
+00:14:12,910 --> 00:14:14,150
+on, because this can
+无论你在哪个小组工作,
+
+445
+00:14:14,320 --> 00:14:15,760
+be a great way to get much better performance.
+因为这是一个非常好提高性能的方法。
+
+446
+00:14:17,650 --> 00:14:19,490
+Third and finally, one sometimes
+第三个也是最后一个,
+
+447
+00:14:20,020 --> 00:14:21,230
+good way to get a
+有时一个获得更多
+
+448
+00:14:21,450 --> 00:14:22,650
+lot of data is to use
+数据的方法
+
+449
+00:14:23,080 --> 00:14:24,350
+what's now called crowd sourcing.
+叫做众包。
+
+450
+00:14:25,280 --> 00:14:26,350
+So today, there are a
+现在,有一些
+
+451
+00:14:26,520 --> 00:14:27,270
+few websites or a few
+网站,或者是一些服务,
+
+452
+00:14:27,460 --> 00:14:29,520
+services that allow you
+允许你
+
+453
+00:14:29,920 --> 00:14:32,210
+to hire people on
+在网上雇一些人,
+
+454
+00:14:32,350 --> 00:14:33,410
+the web to, you know, fairly
+你知道的,可以非常便宜的
+
+455
+00:14:33,730 --> 00:14:36,140
+inexpensively label large training sets for you.
+帮你标记大量的训练集。
+
+456
+00:14:36,810 --> 00:14:37,870
+So this idea of crowd
+所以众包的思想,
+
+457
+00:14:38,190 --> 00:14:39,460
+sourcing, or crowd sourced
+或者叫众包
+
+458
+00:14:39,950 --> 00:14:41,390
+data labeling, is something
+标记数据,
+
+459
+00:14:41,810 --> 00:14:43,180
+that has, is obviously, like
+很明显有点像
+
+460
+00:14:43,340 --> 00:14:45,200
+an entire academic literature,
+一个完整的学术文献,
+
+461
+00:14:45,660 --> 00:14:47,040
+has some of it's own complications and
+有它自己的诸多问题,
+
+462
+00:14:47,210 --> 00:14:49,390
+so on, pertaining to labeler reliability.
+比如有关标注人员可靠性的问题。
+
+463
+00:14:50,440 --> 00:14:51,470
+Maybe, you know, hundreds of thousands
+可能,你知道,世界上
+
+464
+00:14:51,860 --> 00:14:53,420
+of labelers, around the
+有成千上万的
+
+465
+00:14:53,580 --> 00:14:55,530
+world, working fairly inexpensively to
+标记人员在廉价地工作,
+
+466
+00:14:55,630 --> 00:14:56,810
+help label data for you,
+帮你给数据加标签,
+
+467
+00:14:57,030 --> 00:14:58,580
+and that I've just had mentioned,
+正如我刚才所说的,
+
+468
+00:14:58,930 --> 00:15:00,120
+there's this one alternative as well.
+这也是一个可供选择的办法。
+
+469
+00:15:00,390 --> 00:15:02,170
+And probably Amazon Mechanical Turk
+亚马逊土耳其机器人
+
+470
+00:15:02,510 --> 00:15:03,750
+systems is probably the most
+系统可能是当前
+
+471
+00:15:03,900 --> 00:15:05,860
+popular crowd sourcing option right now.
+最流行的众包选择。
+
+472
+00:15:06,860 --> 00:15:08,070
+This is often quite a
+假如你想
+
+473
+00:15:08,220 --> 00:15:10,040
+bit of work to
+得到高质量的标签,
+
+474
+00:15:10,190 --> 00:15:10,940
+get to work, if you want
+这通常需要
+
+475
+00:15:11,150 --> 00:15:12,520
+to get very high quality labels,
+不少的工作才能做好,
+
+476
+00:15:12,780 --> 00:15:14,160
+but is sometimes an
+但是这有时也是一个
+
+477
+00:15:14,240 --> 00:15:15,760
+option worth considering as well.
+很值得考虑的选择。
+
+478
+00:15:17,330 --> 00:15:18,870
+If you want to try to
+如果你想
+
+479
+00:15:19,320 --> 00:15:21,000
+hire many people, fairly inexpensively
+在网上花费不多
+
+480
+00:15:21,810 --> 00:15:24,220
+on the web, to label large amounts of data for you.
+雇佣很多人,来帮你标注大量的数据。
+
+481
+00:15:26,320 --> 00:15:27,570
+So this video, we
+所以,这一个视频,
+
+482
+00:15:27,660 --> 00:15:28,840
+talked about the idea of
+我们讨论了
+
+483
+00:15:29,100 --> 00:15:30,870
+artificial data synthesis of
+利用人工数据合成的方法
+
+484
+00:15:31,120 --> 00:15:32,440
+either creating new data
+从头开始
+
+485
+00:15:32,750 --> 00:15:34,400
+from scratch, using
+生成新的数据,比如使用
+
+486
+00:15:34,640 --> 00:15:35,400
+the random fonts as an example,
+随机字体作为例子,
+
+487
+00:15:35,830 --> 00:15:37,710
+or by amplifying an
+或者是使用
+
+488
+00:15:37,790 --> 00:15:38,980
+existing training set, by taking
+现有的训练集,引入
+
+489
+00:15:39,420 --> 00:15:41,340
+existing labeled examples and
+现有的带标签样例,并
+490
+00:15:41,560 --> 00:15:42,980
+introducing distortions to it,
+一些失真,以生成另外
+
+491
+00:15:43,240 --> 00:15:44,880
+to sort of create extra label examples.
+以生成额外的标签样例。
+
+492
+00:15:46,010 --> 00:15:47,450
+And finally, one thing that
+最后一件
+
+493
+00:15:47,630 --> 00:15:48,810
+I hope you remember from this
+我希望你通过这个视频
+
+494
+00:15:49,120 --> 00:15:49,970
+video this idea of if
+能记住的事情就是
+
+495
+00:15:50,540 --> 00:15:51,540
+you are facing a machine learning
+假如你正面对
+
+496
+00:15:51,830 --> 00:15:54,350
+problem, it is often worth doing two things.
+一个机器学习问题,有两件事情是值得做的。
+
+497
+00:15:54,660 --> 00:15:55,830
+One just a sanity check,
+一个就是通过学习曲线做一个完整的检查,
+
+498
+00:15:56,160 --> 00:15:58,600
+with learning curves, that having more data would help.
+有更多的数据将会更有力。
+
+499
+00:15:59,520 --> 00:16:00,340
+And second, assuming that that's the case,
+第二个就是,假如情况是这样,
+
+500
+00:16:00,730 --> 00:16:01,780
+I will often sit down and
+我通常会坐下来,
+
+501
+00:16:01,850 --> 00:16:03,670
+ask myself seriously: what would
+认真的问自己,
+
+502
+00:16:04,050 --> 00:16:05,150
+it take to get ten times as
+获取到十倍于
+
+503
+00:16:05,260 --> 00:16:06,510
+much data as you
+当前所拥有
+
+504
+00:16:06,630 --> 00:16:08,450
+currently have, and not always,
+的数据,并非总是如此,
+
+505
+00:16:08,960 --> 00:16:10,440
+but sometimes, you may be
+但是有时,你会
+
+506
+00:16:10,640 --> 00:16:12,310
+surprised by how easy that
+吃惊的发现,
+
+507
+00:16:12,580 --> 00:16:13,990
+turns out to be, maybe
+事实上这很简单,可能需要几天,
+
+508
+00:16:14,060 --> 00:16:15,020
+a few days, a few weeks at
+可能是几个周的工作,
+
+509
+00:16:15,150 --> 00:16:16,160
+work, and that can be
+并且这是一个很好的方法,
+
+510
+00:16:16,260 --> 00:16:18,700
+a great way to give your learning algorithm a huge boost in performance
+它可以大幅提高你的机器学习方法的性能。
+
diff --git a/srt/18 - 4 - Ceiling Analysis_ What Part of the Pipeline to Work on Next (14 min).srt b/srt/18 - 4 - Ceiling Analysis_ What Part of the Pipeline to Work on Next (14 min).srt
new file mode 100644
index 00000000..d34856f7
--- /dev/null
+++ b/srt/18 - 4 - Ceiling Analysis_ What Part of the Pipeline to Work on Next (14 min).srt
@@ -0,0 +1,2145 @@
+1
+00:00:00,090 --> 00:00:01,140
+In earlier videos, I have
+在前面的视频中 (字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,260 --> 00:00:02,510
+said over and over that, when
+我不止一次地说过
+
+3
+00:00:02,650 --> 00:00:03,980
+you are developing machine learning system,
+在你开发机器学习系统时
+
+4
+00:00:04,770 --> 00:00:06,630
+one of the most valuable resources is
+你最宝贵的资源
+
+5
+00:00:06,810 --> 00:00:08,050
+your time as the developer
+就是你的时间
+
+6
+00:00:08,490 --> 00:00:09,820
+in terms of picking what
+作为一个开发者
+
+7
+00:00:09,950 --> 00:00:11,520
+to work on next.
+你需要正确选择下一步的工作
+
+8
+00:00:11,950 --> 00:00:12,710
+Or, you have a team of developers
+或者也许你有一个开发团队
+
+9
+00:00:13,300 --> 00:00:14,610
+or a team of engineers working together
+或者一个工程师小组
+
+10
+00:00:15,090 --> 00:00:16,620
+on a machine learning system, again
+共同开发一个机器学习系统
+
+11
+00:00:16,930 --> 00:00:18,420
+one of the most valuable resources is
+同样 最宝贵的还是
+
+12
+00:00:18,990 --> 00:00:20,790
+the time of the engineers or the developers working on the system.
+开发系统所花费的时间
+
+13
+00:00:22,420 --> 00:00:23,340
+And what you really want to
+你需要尽量避免的
+
+14
+00:00:23,430 --> 00:00:25,340
+avoid is that you or
+情况是你或者
+
+15
+00:00:25,360 --> 00:00:26,410
+your colleagues or your friends spend
+你的同事 你的朋友
+
+16
+00:00:26,680 --> 00:00:27,560
+a lot of time working on
+花费了大量时间
+
+17
+00:00:27,970 --> 00:00:29,510
+some component, only to realize
+在某一个模块上
+
+18
+00:00:30,470 --> 00:00:31,540
+after weeks or months of
+在几周甚至几个月的努力以后
+
+19
+00:00:31,620 --> 00:00:33,070
+time spent, that all that
+才意识到所有这些付出的劳动
+
+20
+00:00:33,310 --> 00:00:35,090
+work, you know, just doesn't
+都对你最终系统的表现
+
+21
+00:00:35,380 --> 00:00:38,120
+make a huge difference on the performance of the final system.
+并没有太大的帮助
+
+22
+00:00:39,350 --> 00:00:40,430
+In this video, what I'd
+在这段视频中
+
+23
+00:00:40,550 --> 00:00:42,960
+like to to is, to talk about something called ceiling analysis.
+我将介绍一下关于上限分析(ceiling analysis)的内容
+
+24
+00:00:44,510 --> 00:00:45,760
+When you or your team
+当你自己或你跟
+
+25
+00:00:46,280 --> 00:00:47,270
+are working on a pipeline
+你的团队在设计某个
+
+26
+00:00:47,520 --> 00:00:48,860
+machine learning system, this can
+机器学习系统的流水线时
+
+27
+00:00:49,020 --> 00:00:50,380
+sometimes give you a very
+这种方式通常能
+
+28
+00:00:50,630 --> 00:00:51,650
+strong signal, a very strong
+提供一种很有价值的信号
+
+29
+00:00:52,340 --> 00:00:53,730
+guidance, on what parts
+或者说很有用的导向
+
+30
+00:00:54,150 --> 00:00:56,550
+of the pipeline might be the best use of your time to work on.
+告诉你流水线中的哪个部分最值得你花时间去完成
+
+31
+00:00:59,740 --> 00:01:01,700
+To talk about ceiling analysis, I'm
+为了介绍上限分析
+
+32
+00:01:01,860 --> 00:01:03,140
+going to keep on using the
+我将继续使用之前用过的
+
+33
+00:01:03,690 --> 00:01:04,910
+example of the photo
+照片OCR流水线的例子
+
+34
+00:01:05,640 --> 00:01:06,870
+OCR pipeline and I said
+在之前的课程中
+
+35
+00:01:07,170 --> 00:01:08,270
+earlier each of these
+我讲过这些方框
+
+36
+00:01:08,480 --> 00:01:09,900
+boxes text detection, character
+文字检测、字符分割
+
+37
+00:01:10,200 --> 00:01:12,140
+segmentation, character recognition, each
+字符识别
+
+38
+00:01:12,310 --> 00:01:13,730
+of these boxes can have even
+这每一个方框都可能
+
+39
+00:01:14,100 --> 00:01:15,550
+a small engineering team working
+需要一个小团队来完成
+
+40
+00:01:15,920 --> 00:01:17,370
+on it, or maybe the
+当然也可能
+
+41
+00:01:17,690 --> 00:01:18,640
+entire system is just built
+你一个人来构建整个系统
+
+42
+00:01:18,800 --> 00:01:19,700
+by you, either way, but
+不管怎样
+
+43
+00:01:19,960 --> 00:01:22,340
+the question is, where should you allocate resources?
+问题是 你应该怎样分配资源呢?
+
+44
+00:01:22,730 --> 00:01:24,250
+Which of these boxes is
+哪一个方框最值得
+
+45
+00:01:24,430 --> 00:01:26,630
+most worth your efforts, trying
+你投入精力去做
+
+46
+00:01:26,920 --> 00:01:28,260
+to improve the performance of.
+投入时间去改善效果
+
+47
+00:01:29,070 --> 00:01:30,350
+In order to explain the idea
+(以下这段同前重复,译者注)
+
+48
+00:01:30,840 --> 00:01:32,560
+of ceiling analysis, I'm going
+为了解释上限分析的原理
+
+49
+00:01:32,730 --> 00:01:35,690
+to keep using the example of our photo OCR pipeline.
+我将继续使用照片OCR流水线的例子
+
+50
+00:01:37,000 --> 00:01:38,320
+As I mentioned earlier, each of
+在之前的视频中我讲过
+
+51
+00:01:38,430 --> 00:01:39,630
+these boxes here, each of
+这里的每个方框
+
+52
+00:01:39,850 --> 00:01:41,860
+these machine learning components could be
+都表示一个机器学习的组成部分
+
+53
+00:01:42,170 --> 00:01:43,270
+the work of even a
+需要一个小团队来完成
+
+54
+00:01:43,470 --> 00:01:44,720
+small team of engineers, or
+当然也可能
+
+55
+00:01:45,280 --> 00:01:48,110
+maybe the whole system could be built by just one person.
+整个系统都由一个人来完成
+
+56
+00:01:48,780 --> 00:01:49,920
+But the question is, where should
+但问题是
+
+57
+00:01:50,100 --> 00:01:51,990
+you allocate scarce resources?
+你应该如何分配资源呢?
+
+58
+00:01:52,130 --> 00:01:53,200
+Now this, which of these
+也就是说
+
+59
+00:01:53,690 --> 00:01:54,860
+components, or which one or
+这些模块中
+
+60
+00:01:54,950 --> 00:01:56,250
+two or maybe all three of these components
+哪一个 或者哪两个、三个
+
+61
+00:01:57,080 --> 00:01:58,540
+is most worth your time
+是最值得你花更多的
+
+62
+00:01:59,200 --> 00:02:01,060
+to try to improve the performance of.
+精力去改善它的效果的?
+
+63
+00:02:01,660 --> 00:02:02,810
+So here's the idea of ceiling analysis.
+这便是上限分析要做的事
+
+64
+00:02:04,140 --> 00:02:05,520
+As in the development process for
+跟其他机器学习系统的
+
+65
+00:02:05,890 --> 00:02:07,170
+other machine learning systems as
+开发过程一样
+
+66
+00:02:07,340 --> 00:02:08,490
+well, in order to make
+为了决定
+
+67
+00:02:08,670 --> 00:02:09,740
+decisions on what to do
+要开发这个系统应该
+
+68
+00:02:09,970 --> 00:02:11,150
+for developing the system
+采取什么样的行动
+
+69
+00:02:11,710 --> 00:02:12,770
+is going to be
+一个有效的方法是
+
+70
+00:02:12,900 --> 00:02:14,070
+very helpful to have a
+对学习系统使用一个
+
+71
+00:02:14,580 --> 00:02:17,650
+single real number evaluation metric for this learning system.
+数值评价量度
+
+72
+00:02:18,450 --> 00:02:19,390
+So let's say we pick character level accuracy.
+所以假如我们用字符准确度作为这个量度
+
+73
+00:02:19,530 --> 00:02:21,140
+So if, you know, given a
+因此 给定一个
+
+74
+00:02:21,570 --> 00:02:22,840
+test set image, what is
+测试样本图像
+
+75
+00:02:22,860 --> 00:02:24,710
+the fraction of alphabets or
+那么这个数值就表示
+
+76
+00:02:25,060 --> 00:02:26,570
+characters in the test image that
+我们对测试图像中的文字
+
+77
+00:02:28,980 --> 00:02:29,390
+we recognize correctly.
+识别正确的比例
+
+78
+00:02:29,550 --> 00:02:30,830
+Or you can pick some other single world
+或者你也可以选择
+
+79
+00:02:31,030 --> 00:02:32,270
+number evaluation metric, if you
+其他的某个数值评价度量值
+
+80
+00:02:32,370 --> 00:02:33,740
+want, but let's say that
+随你选择
+
+81
+00:02:34,040 --> 00:02:35,820
+whatever evaluation metric we
+但不管选择什么评价量度值
+
+82
+00:02:35,920 --> 00:02:37,680
+pick, we get that, we
+我们只是假设
+
+83
+00:02:37,880 --> 00:02:40,090
+find that the overall system currently has 72% accuracy.
+整个系统的估计准确率为72%
+
+84
+00:02:40,350 --> 00:02:42,210
+So, in other
+所以换句话说
+
+85
+00:02:42,350 --> 00:02:43,380
+words, we have some set
+我们有一些测试集图像
+
+86
+00:02:43,520 --> 00:02:44,960
+of test set images and for
+并且对测试集中的
+
+87
+00:02:45,180 --> 00:02:46,460
+each test set images, we
+每一幅图像
+
+88
+00:02:46,640 --> 00:02:47,850
+run it through text detection, then
+我们都对其分别运行
+
+89
+00:02:47,980 --> 00:02:49,280
+character segmentation, then character
+文字检测、字符分割
+
+90
+00:02:49,560 --> 00:02:50,680
+recognition, and we find
+然后字符识别
+
+91
+00:02:51,010 --> 00:02:52,240
+that on our test set, the
+然后我们发现
+
+92
+00:02:52,370 --> 00:02:53,570
+overall accuracy of the
+整个测试集的准确率是72%
+
+93
+00:02:53,800 --> 00:02:56,220
+entire system was 72%, on whatever metric you chose.
+不管你用什么度量值来度量
+
+94
+00:02:58,120 --> 00:02:59,700
+Now just the idea behind
+下面是上限分析的
+
+95
+00:03:00,070 --> 00:03:01,610
+ceiling analysis, which is that
+主要思想
+
+96
+00:03:01,910 --> 00:03:03,530
+we're going to go look at
+首先我们关注
+
+97
+00:03:03,670 --> 00:03:05,100
+the first module of our
+这个机器学习流程中的
+
+98
+00:03:05,400 --> 00:03:06,810
+machine learning pipeline, text detection.
+第一个模块 文字检测
+
+99
+00:03:07,270 --> 00:03:08,400
+And what we are going
+而我们要做的
+
+100
+00:03:08,420 --> 00:03:09,170
+to do is we are going to
+实际上是在
+
+101
+00:03:09,270 --> 00:03:11,310
+monkey around with the test set.
+给测试集样本捣点儿乱
+
+102
+00:03:11,980 --> 00:03:12,920
+We are going to go to the
+我们要对
+
+103
+00:03:12,990 --> 00:03:14,270
+test set and for every test example
+每一个测试集样本
+
+104
+00:03:14,830 --> 00:03:16,170
+we are just going to provide it
+都给它提供一个
+
+105
+00:03:16,380 --> 00:03:18,230
+the correct text detection outputs.
+正确的文字检测结果
+
+106
+00:03:19,210 --> 00:03:20,300
+In other words, we are going
+换句话说
+
+107
+00:03:20,560 --> 00:03:21,760
+to the test set and just
+我们要遍历每个测试集样本
+
+108
+00:03:21,960 --> 00:03:23,340
+manually tell the algorithm
+然后人为地告诉算法
+
+109
+00:03:24,350 --> 00:03:26,210
+where the text is
+每一个测试样本中
+
+110
+00:03:26,780 --> 00:03:27,940
+in each of the test examples.
+什么地方出现了文字
+
+111
+00:03:28,950 --> 00:03:29,960
+So in other words, we
+因此换句话说
+
+112
+00:03:30,030 --> 00:03:31,510
+are going to simulate what happens
+我们是要仿真出
+
+113
+00:03:32,030 --> 00:03:33,640
+if we have a text detection
+如果是100%
+
+114
+00:03:33,890 --> 00:03:35,350
+system with a 100%
+正确地检测出
+
+115
+00:03:35,610 --> 00:03:37,180
+accuracy, for the purpose
+图片中的文字信息
+
+116
+00:03:38,300 --> 00:03:40,410
+of detecting text in an image.
+应该是什么样的
+
+117
+00:03:42,050 --> 00:03:43,070
+And really the way you
+当然 要做到这个
+
+118
+00:03:43,110 --> 00:03:44,210
+do that is very simple right, instead
+是很容易的
+
+119
+00:03:44,620 --> 00:03:45,840
+of letting your learning algorithm
+现在不用你的学习算法
+
+120
+00:03:46,340 --> 00:03:47,630
+detect the text in the images.
+来检测图像中的文字了
+
+121
+00:03:48,180 --> 00:03:49,110
+You would instead go to the
+你只需要找到对应的图像
+
+122
+00:03:49,340 --> 00:03:51,230
+images and just manually label what
+然后人为地识别出
+
+123
+00:03:51,540 --> 00:03:53,620
+is the location of the text in my test set image.
+测试集图像中出现文字的区域
+
+124
+00:03:54,200 --> 00:03:55,040
+And you would then let these
+然后你要做的就是让这些
+
+125
+00:03:55,530 --> 00:03:56,620
+correct, so let these ground
+绝对正确的结果
+
+126
+00:03:56,990 --> 00:03:58,370
+truth labels of where
+这些绝对为真的标签
+
+127
+00:03:58,560 --> 00:04:00,010
+the text is be part of
+也就是告诉你
+
+128
+00:04:00,090 --> 00:04:01,330
+your test set, and use these
+图像中哪些位置
+
+129
+00:04:01,580 --> 00:04:02,990
+ground truth labels as what you
+有文字信息的标签
+
+130
+00:04:03,110 --> 00:04:04,200
+feed in to the next
+把它们传给下一个模块
+
+131
+00:04:04,470 --> 00:04:07,550
+stage of the pipeline, to the character segmentation pipeline.
+也就是传给字符分割模块
+
+132
+00:04:07,710 --> 00:04:09,250
+So just said it again, by
+我再说一遍
+
+133
+00:04:09,680 --> 00:04:10,790
+putting a checkmark over here,
+这里打钩的地方
+
+134
+00:04:11,500 --> 00:04:12,590
+what I mean is Im going
+我想做的是
+
+135
+00:04:12,750 --> 00:04:13,750
+to go to my test set and
+遍历我的测试集
+
+136
+00:04:13,860 --> 00:04:14,970
+just give it the correct answers,
+直接向它公布“标准答案”
+
+137
+00:04:15,480 --> 00:04:16,520
+give it the correct labels, for
+为这个流程中的文字检测部分
+
+138
+00:04:16,650 --> 00:04:18,250
+the text detection part of the pipeline.
+直接提供正确的标签
+
+139
+00:04:19,240 --> 00:04:20,280
+So that, as it, I have
+这样好像我就会
+
+140
+00:04:20,410 --> 00:04:21,700
+a perfect text detection system
+有一个非常棒的文字检测系统
+
+141
+00:04:22,370 --> 00:04:24,270
+on my test set. And to
+能很好地检测我的测试样本
+
+142
+00:04:24,460 --> 00:04:26,570
+do that, run this data
+然后我们要做的是
+
+143
+00:04:27,190 --> 00:04:28,150
+through the rest of the pipeline,
+继续运行完接下来的几个模块
+
+144
+00:04:28,530 --> 00:04:29,860
+character segmentation and character recognition.
+也就是字符分割和字符识别
+
+145
+00:04:30,680 --> 00:04:31,930
+And then, use the same
+然后使用跟之前一样的
+
+146
+00:04:32,300 --> 00:04:33,310
+evaluation metric as before,
+评价量度指标
+
+147
+00:04:34,000 --> 00:04:35,240
+to measure what is the
+来测量整个系统的
+
+148
+00:04:35,450 --> 00:04:36,900
+overall accuracy of the entire system.
+总体准确度
+
+149
+00:04:37,790 --> 00:04:39,890
+And with perfect text detection, hopefully the performance goes up.
+这样用准确的文字检测结果 系统的表现应该会有提升
+
+150
+00:04:40,330 --> 00:04:41,870
+Let 's say it
+假如说 准确率
+
+151
+00:04:41,930 --> 00:04:44,550
+goes up to 89% and then
+提高到89%
+
+152
+00:04:44,680 --> 00:04:45,830
+we're going to keep going. Next, let's
+然后我们继续进行
+
+153
+00:04:46,090 --> 00:04:47,120
+go to the next stage of the
+接着执行流水线中的下一模块 字符分割
+
+154
+00:04:47,330 --> 00:04:50,230
+pipeline, to character segmentation, and again we're going to go to my test set.
+同前面一样 我还是去找出我的测试集
+
+155
+00:04:50,540 --> 00:04:52,300
+And now going to
+然后现在我不仅用
+
+156
+00:04:52,390 --> 00:04:54,140
+give the correct text detection
+标准的文字检测结果
+
+157
+00:04:54,900 --> 00:04:55,970
+output and give the correct
+我还同时用标准的
+
+158
+00:04:56,490 --> 00:04:58,220
+character segmentation outputs and
+字符分割结果
+
+159
+00:04:59,400 --> 00:05:00,780
+manually label the correct
+所以还是遍历测试样本
+
+160
+00:05:01,330 --> 00:05:03,710
+segmentations of text into individual characters.
+人工地给出正确的字符分割结果
+
+161
+00:05:04,730 --> 00:05:05,560
+And see how much that helps.
+然后看看这样做以后 效果怎样变化
+
+162
+00:05:05,810 --> 00:05:06,670
+And let's say it goes up to
+假如我们这样做以后
+
+163
+00:05:06,800 --> 00:05:09,140
+90% accuracy for the overall system.
+整个系统准确率提高到90%
+
+164
+00:05:10,070 --> 00:05:11,060
+Alright, so as always, the accuracy is
+注意跟前面一样 这里说的准确率
+
+165
+00:05:11,340 --> 00:05:13,420
+the accuracy of the overall system.
+是指整个系统的准确率
+
+166
+00:05:14,120 --> 00:05:15,460
+So whatever the final output
+所以无论最后一个模块
+
+167
+00:05:15,830 --> 00:05:17,450
+of the character recognition system is.
+字符识别模块给出的最终输出是什么
+
+168
+00:05:17,560 --> 00:05:18,870
+Whatever the final output of
+无论整个流水线的
+
+169
+00:05:19,040 --> 00:05:19,660
+the overall pipeline is, it's going
+最后输出结果是什么
+
+170
+00:05:19,930 --> 00:05:22,400
+to measure the accuracy of that.
+我们都是测出的整个系统的准确率
+
+171
+00:05:22,520 --> 00:05:23,720
+And then finally, the character recognition
+最后我们还是执行最后一个模块 字符识别
+
+172
+00:05:24,170 --> 00:05:26,170
+system and give that the correct label as well.
+同样也是人工给出这一模块的正确标签
+
+173
+00:05:26,780 --> 00:05:29,270
+And if I do that too then, no surprise that I should get a 100% accuracy.
+这样做以后 我应该理所当然得到100%准确率
+
+174
+00:05:31,270 --> 00:05:32,530
+Now, the nice thing about having
+进行上限分析的
+
+175
+00:05:32,850 --> 00:05:34,340
+done this ceiling analysis is we
+一个好处是
+
+176
+00:05:34,450 --> 00:05:36,080
+can now understand what is
+我们现在就知道了
+
+177
+00:05:36,700 --> 00:05:40,250
+the upside potential for improving each of these components.
+如果对每一个模块进行改善 它们各自的上升空间是多大
+
+178
+00:05:41,390 --> 00:05:44,180
+So we see that if we get perfect text detection.
+所以 我们可以看到 如果我们拥有完美的文字检测模块
+
+179
+00:05:44,950 --> 00:05:46,360
+Our performance went up from
+那么整个系统的表现将会从
+
+180
+00:05:46,710 --> 00:05:48,080
+72 to 89 percent, so
+准确率72%上升到89%
+
+181
+00:05:48,420 --> 00:05:50,670
+that's a 17 percent performance gain.
+因此效果的增益是17%
+
+182
+00:05:51,640 --> 00:05:52,680
+So this means that if you
+这就意味着
+
+183
+00:05:52,890 --> 00:05:54,030
+take your current system and you
+如果你在现有系统的基础上
+
+184
+00:05:54,160 --> 00:05:56,130
+spend a lot of time improving text detection.
+花费时间和精力改善文字检测模块的效果
+
+185
+00:05:57,330 --> 00:05:58,750
+That means that we could potentially improve
+那么系统的表现
+
+186
+00:05:59,200 --> 00:06:00,640
+our system's performance by 17 percent.
+可能会提高17%
+
+187
+00:06:01,020 --> 00:06:02,850
+This seems like it's well worth our while.
+看起来这还挺值得
+
+188
+00:06:03,770 --> 00:06:05,840
+Whereas in contrast, when going
+而相对来讲
+
+189
+00:06:06,200 --> 00:06:08,360
+from text detection, when we
+如果我们取得完美的字符分割模块
+
+190
+00:06:08,640 --> 00:06:12,450
+gave it perfect character segmentation, performance went up only by one percent.
+那么最终系统表现只提升了1%
+
+191
+00:06:13,020 --> 00:06:14,820
+So, that's a more sobering message.
+这便提供了一个很重要的信息
+
+192
+00:06:15,250 --> 00:06:16,880
+It means that no matter how
+这就告诉我们
+
+193
+00:06:17,090 --> 00:06:18,510
+much time you spend on character segmentation,
+不管我们投入多大精力在字符分割上
+
+194
+00:06:19,800 --> 00:06:20,990
+maybe the upside potential is
+系统效果的潜在上升空间
+
+195
+00:06:21,080 --> 00:06:22,280
+going to be pretty small, and maybe
+也都是很小很小
+
+196
+00:06:22,460 --> 00:06:23,420
+you do not want to
+所以你就不会让一个
+
+197
+00:06:23,580 --> 00:06:24,340
+have a large team of engineers
+比较大的工程师团队
+
+198
+00:06:24,860 --> 00:06:26,860
+working on character segmentation that
+花时间忙于字符分割模块
+
+199
+00:06:26,990 --> 00:06:28,860
+this sort of analysis shows that
+因为通过上限分析我们知道了
+
+200
+00:06:29,150 --> 00:06:30,180
+even when you give it the
+即使你把字符分割模块做得再好
+
+201
+00:06:30,260 --> 00:06:32,480
+perfect character segmentation, your
+再怎么完美 你的系统表现
+
+202
+00:06:32,620 --> 00:06:34,180
+performance goes up by only one percent.
+最多也只能提升1%
+
+203
+00:06:34,620 --> 00:06:36,090
+So right there, this really estimates
+所以这就估计出
+
+204
+00:06:36,890 --> 00:06:38,080
+what is the ceiling, or what's
+通过改善各个模块的质量
+
+205
+00:06:38,300 --> 00:06:39,360
+an upper bound on how much
+你的系统表现
+
+206
+00:06:39,550 --> 00:06:40,690
+you can improve the performance of your
+所能提升的上限值
+
+207
+00:06:40,740 --> 00:06:42,710
+system by working on one of these components?
+或者说最大值 是多少
+
+208
+00:06:44,330 --> 00:06:45,600
+And finally, going to character recognition,
+最后
+
+209
+00:06:46,320 --> 00:06:47,700
+when we get better
+如果我们取得完美的字符识别模块
+
+210
+00:06:47,900 --> 00:06:50,080
+character recognition, the performance went up by ten percent.
+那么整个系统的表现将提高10%
+
+211
+00:06:50,530 --> 00:06:51,640
+So you know, again you
+所以 同样
+
+212
+00:06:51,750 --> 00:06:52,570
+can decide, is a ten
+你也可以分析
+
+213
+00:06:52,860 --> 00:06:55,630
+percent improvement worth the work it would take?
+10%的效果提升值得投入多少工作量
+
+214
+00:06:55,830 --> 00:06:57,200
+It tells you that maybe
+也许这也告诉你
+
+215
+00:06:57,400 --> 00:06:58,670
+with more efforts spent on the
+如果把精力投入在
+
+216
+00:06:58,730 --> 00:06:59,690
+last stage of the pipeline,
+流水线的最后这个模块
+
+217
+00:07:00,360 --> 00:07:02,840
+you can improve the performance
+那么系统的性能
+
+218
+00:07:03,760 --> 00:07:04,500
+of the systems as well.
+还是能得到较大的提高
+
+219
+00:07:05,610 --> 00:07:06,580
+Another way of thinking about this
+另一种认识这种分析方法的角度是
+
+220
+00:07:06,870 --> 00:07:08,090
+is that, by going through this
+通过这样的分析
+
+221
+00:07:08,290 --> 00:07:09,470
+sort of analysis you're trying to
+你就能总结出
+
+222
+00:07:09,570 --> 00:07:10,640
+figure out, you know, what is
+改善每个模块的性能
+
+223
+00:07:10,740 --> 00:07:12,700
+the upside potential, of improving
+系统的上升空间是多少
+
+224
+00:07:13,480 --> 00:07:14,980
+each of these components or how
+或者说如果其中的某个模块
+
+225
+00:07:15,080 --> 00:07:16,730
+much could you possibly gain if
+变得绝对完美时
+
+226
+00:07:17,260 --> 00:07:18,910
+one of these components became absolutely
+你能得到什么收获
+
+227
+00:07:19,380 --> 00:07:20,780
+perfect and just really
+这就像是给系统表现
+
+228
+00:07:21,060 --> 00:07:23,230
+places an upper bound on the performance of that system.
+加上了一个提升的上限值
+
+229
+00:07:24,220 --> 00:07:26,290
+So, the idea of ceiling analysis is pretty important.
+所以 上限分析的概念是很重要的
+
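+Gathering the figures quoted above into one place, the ceiling analysis for this OCR pipeline looks like:
+
+    Component given ground-truth output        Overall accuracy    Gain
+    (current system, no module fixed)          72%                 n/a
+    Text detection                             89%                 +17%
+    Character segmentation                     90%                 +1%
+    Character recognition                      100%                +10%
+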
+230
+00:07:26,900 --> 00:07:29,840
+Let me just illustrate this idea again, but with a different example but a more complex one.
+下面我再用一个稍微复杂一点的例子来演绎一下上限分析的原理
+
+231
+00:07:31,860 --> 00:07:32,990
+Let's say that you want to
+假如说你想对这张图像
+
+232
+00:07:33,260 --> 00:07:34,830
+do face recognition from images,
+进行人脸识别
+
+233
+00:07:35,280 --> 00:07:35,960
+so unless you want to look at
+也就是说看着这张照片
+
+234
+00:07:35,990 --> 00:07:37,650
+the picture and recognize whether or
+你希望识别出
+
+235
+00:07:37,820 --> 00:07:38,770
+not the person in this picture
+照片里这个人
+
+236
+00:07:39,470 --> 00:07:40,640
+is a particular friend of yours,
+是不是你的朋友
+
+237
+00:07:40,670 --> 00:07:43,880
+trying to recognize the person shown in this image.
+希望辨识出图像中的人
+
+238
+00:07:44,180 --> 00:07:46,260
+This is a slightly artificial example.
+这是一个稍微人为设计的例子
+
+239
+00:07:47,130 --> 00:07:51,080
+This isn't actually how face
+当然这并不是现实中的
+
+240
+00:07:51,320 --> 00:07:52,790
+recognition is done in
+人脸识别技术
+
+241
+00:07:52,800 --> 00:07:53,660
+practice, but I want to step through an example of what a
+但我想通过这个例子
+
+242
+00:07:53,870 --> 00:07:54,800
+pipeline might look like to
+来向你展示一个流水线
+
+243
+00:07:54,940 --> 00:07:56,220
+give you another example of how
+并且给你另一个关于
+
+244
+00:07:56,450 --> 00:07:57,820
+a ceiling analysis process might look.
+上限分析的实例
+
+245
+00:07:58,710 --> 00:07:59,980
+So, we have a
+假如我们有张照片
+
+246
+00:08:00,160 --> 00:08:03,830
+camera image and let's say that we design a pipeline as follows.
+我们设计了如下的流水线
+
+247
+00:08:04,420 --> 00:08:05,120
+Let's say the first thing you want
+假如我们第一步要做的
+
+248
+00:08:05,380 --> 00:08:07,480
+to do is do pre-processing of
+是图像预处理
+
+249
+00:08:07,560 --> 00:08:08,770
+the image, so let's take those
+假如我们就用
+
+250
+00:08:08,910 --> 00:08:10,310
+images like I have shown on
+右上角这张照片
+
+251
+00:08:10,390 --> 00:08:11,040
+the upper right, and let's say we
+现在假如我们想要
+
+252
+00:08:11,140 --> 00:08:12,510
+want to remove the background, so
+把背景去掉
+
+253
+00:08:13,030 --> 00:08:14,790
+through pre-processing the background disappears.
+那么经过预处理 背景就被去掉了
+
+254
+00:08:16,070 --> 00:08:18,820
+Next we want to say detect the face of the person.
+下一步我们希望检测出人脸的位置
+
+255
+00:08:19,370 --> 00:08:20,550
+That's usually done with a learning algorithm.
+这通常通过一个学习算法来实现
+
+256
+00:08:20,930 --> 00:08:21,960
+So we'll run a sliding
+我们会运行一个滑动窗分类器
+
+257
+00:08:22,180 --> 00:08:24,900
+windows classifier to draw a box around the person's face.
+在人脸上画一个框
+
+258
+00:08:25,680 --> 00:08:26,720
+Having detected the face it
+在检测到脸部以后
+
+259
+00:08:26,790 --> 00:08:27,650
+turns out that if you
+如果你想要
+
+260
+00:08:27,770 --> 00:08:29,320
+want to recognize people it turns
+识别出这个人
+
+261
+00:08:29,530 --> 00:08:31,630
+out that the eyes are a highly useful cue.
+那么眼睛是一个很重要的线索
+
+262
+00:08:32,000 --> 00:08:33,860
+We actually, in terms
+事实上
+
+263
+00:08:34,130 --> 00:08:35,420
+of recognizing your friends, the
+要辨认出你的朋友
+
+264
+00:08:35,700 --> 00:08:36,870
+appearance of their eyes is actually
+你通常会看眼睛
+
+265
+00:08:37,330 --> 00:08:38,680
+one of the most important cues that you use.
+这是个比较重要的线索
+
+266
+00:08:39,470 --> 00:08:41,610
+So let's run another classifier to detect the eyes of the person.
+所以 我们需要运行另一个分类器来检测人的眼睛
+
+267
+00:08:42,500 --> 00:08:43,660
+So, segment out the eyes,
+分割出眼睛
+
+268
+00:08:44,410 --> 00:08:45,650
+and then and since this
+这样就提供了
+
+269
+00:08:45,900 --> 00:08:47,290
+will give us useful features to
+识别出一个人的
+
+270
+00:08:47,380 --> 00:08:48,840
+recognize a person, and then
+很重要的特征
+
+271
+00:08:49,100 --> 00:08:50,400
+other parts of the face of physical interest.
+然后继续识别脸上其他重要的部位
+
+272
+00:08:50,990 --> 00:08:52,330
+Maybe segment out the nose,
+比如分割出鼻子
+
+273
+00:08:52,830 --> 00:08:54,750
+segment out the mouth, and
+分割出嘴巴
+
+274
+00:08:54,980 --> 00:08:56,230
+then, having found the
+这样找出了
+
+275
+00:08:56,370 --> 00:08:57,060
+eyes, the nose and the mouth,
+眼睛、鼻子、嘴巴
+
+276
+00:08:57,340 --> 00:08:58,420
+all of these give us useful
+所有这些都是非常有用的特征
+
+277
+00:08:58,740 --> 00:08:59,920
+features to maybe feed into
+然后这些特征可以被输入给某个
+
+278
+00:09:00,580 --> 00:09:01,540
+a logistic regression classifier.
+逻辑回归的分类器
+
+279
+00:09:02,500 --> 00:09:03,200
+And it's the job of the
+然后这个分类器的任务
+
+280
+00:09:03,480 --> 00:09:04,420
+classifier to then give us the
+就是给出最终的标签
+
+281
+00:09:04,710 --> 00:09:05,850
+overall label to find the
+找出我们认为能
+
+282
+00:09:05,970 --> 00:09:06,930
+label for who we think
+辨别出这个人是谁的
+
+283
+00:09:07,190 --> 00:09:08,450
+is the identity of this person.
+最终的标签
+
+284
+00:09:10,110 --> 00:09:11,730
+So this is a kind of complicated pipeline.
+这是一个稍微复杂一些的流水线
+
+285
+00:09:12,160 --> 00:09:13,300
+It's actually probably more complicated
+如果你真的想识别出人的话
+
+286
+00:09:13,950 --> 00:09:16,810
+than you should be using, if you actually want to recognize people.
+可能实际的流程比这个还要复杂
+
+287
+00:09:17,620 --> 00:09:20,330
+But it's an illustrative example that's useful to think about for ceiling analysis.
+但这给出了很好的一个上限分析的例子
+
+288
+00:09:22,150 --> 00:09:24,510
+So how do you go through ceiling analysis for this pipeline?
+对这个流水线怎么进行上限分析呢?
+
+289
+00:09:25,000 --> 00:09:26,790
+Well, we'll step through these pieces one at a time.
+我们还是每次关注一个步骤
+
+290
+00:09:27,470 --> 00:09:28,900
+Let's say your overall system has
+假如说你整个系统的准确率
+
+291
+00:09:29,150 --> 00:09:30,560
+85 percent accuracy, the first
+达到了85%
+
+292
+00:09:30,720 --> 00:09:31,670
+thing I do is go to
+那么我要做的第一件事情
+
+293
+00:09:31,750 --> 00:09:32,890
+my test set and manually
+还是找到我的测试集
+
+294
+00:09:33,860 --> 00:09:36,200
+give it the ground truth foreground, background,
+然后对前景和背景
+
+295
+00:09:36,740 --> 00:09:38,090
+segmentations, and then manually go to
+进行分割
+
+296
+00:09:38,150 --> 00:09:39,670
+the test set, and use Photoshop
+然后使用Photoshop
+
+297
+00:09:40,290 --> 00:09:41,750
+or something, to just tell it
+或者别的什么软件
+
+298
+00:09:41,950 --> 00:09:43,130
+where's the background, and just
+识别出哪些区域是背景
+
+299
+00:09:43,360 --> 00:09:45,230
+manually remove the background, so
+然后手动把背景删掉
+
+300
+00:09:45,470 --> 00:09:48,050
+ground truth background, and see how much the accuracy changes.
+然后观察准确率提高多少
+
+301
+00:09:48,990 --> 00:09:50,320
+In this example, the accuracy
+假设在这个例子中
+
+302
+00:09:50,800 --> 00:09:53,700
+goes up by 0.1% so
+准确率提高了0.1%
+
+303
+00:09:53,860 --> 00:09:54,900
+this is a strong sign that
+这是个很明显的信号
+
+304
+00:09:55,100 --> 00:09:56,240
+even if you had perfect background
+它告诉你即便你
+
+305
+00:09:56,630 --> 00:09:59,680
+segmentation, even
+把背景分割做得很好
+
+306
+00:09:59,840 --> 00:10:01,650
+if you had perfect background removal, the
+完全去除了背景图案
+
+307
+00:10:01,730 --> 00:10:03,740
+performance of your system isn't going to go up that much.
+但整个系统的表现也并不会提高多少
+
+308
+00:10:03,880 --> 00:10:05,000
+So this is maybe not worth a
+所以似乎并不值得
+
+309
+00:10:05,190 --> 00:10:07,720
+huge effort to work on pre-processing, on background removal.
+花太多精力在预处理或者背景移除上
+
+310
+00:10:09,270 --> 00:10:10,170
+Then, I'm going to go to the
+接下来 再遍历测试集
+
+311
+00:10:10,230 --> 00:10:11,290
+test set and give it the correct
+给出正确的脸部识别图案
+
+312
+00:10:11,780 --> 00:10:13,650
+face detection images, then again
+接下来还是依次运行
+
+313
+00:10:14,140 --> 00:10:16,690
+step through the eyes, nose, mouth segmentations in some order.
+眼睛、鼻子和嘴巴的分割
+
+314
+00:10:17,100 --> 00:10:17,470
+Pick one order.
+选择一种顺序就行了
+
+315
+00:10:17,700 --> 00:10:18,890
+Let's give the correct location
+给出眼睛的正确位置
+
+316
+00:10:19,340 --> 00:10:20,520
+of the eyes, correct location of
+鼻子的正确位置
+
+317
+00:10:20,750 --> 00:10:22,510
+the nose, correct location of
+嘴巴的正确位置
+
+318
+00:10:22,520 --> 00:10:23,740
+the mouth, and then finally
+最后 再给出最终的正确标签
+
+319
+00:10:24,130 --> 00:10:26,200
+if I just give it the correct overall label, I get 100% accuracy.
+准确率提高到100%
+
+320
+00:10:27,900 --> 00:10:29,390
+And so, you know, as
+注意看
+
+321
+00:10:29,500 --> 00:10:30,430
+I go through the system
+在我每次通过这个系统的时候
+
+322
+00:10:31,040 --> 00:10:32,080
+and just give more and more
+我给测试集提供的
+
+323
+00:10:32,210 --> 00:10:33,900
+components the correct labels
+正确的模块越来越多
+
+324
+00:10:34,370 --> 00:10:35,350
+in the test set, the performance
+因此整个系统的表现
+
+325
+00:10:35,830 --> 00:10:37,550
+of the overall system goes up,
+逐步上升
+
+326
+00:10:37,730 --> 00:10:38,640
+and you can look at how much
+这样你就能很清楚地看到
+
+327
+00:10:38,890 --> 00:10:39,860
+the performance went up on
+通过不同的步骤
+
+328
+00:10:40,240 --> 00:10:41,660
+different steps, so, you know, from
+系统的表现增加了多少
+
+329
+00:10:42,550 --> 00:10:43,830
+giving it the perfect face detection,
+比如 有了完美的脸部识别
+
+330
+00:10:44,440 --> 00:10:45,270
+and it looks like the overall
+整个系统的表现似乎
+
+331
+00:10:45,570 --> 00:10:48,290
+performance of this system went up by 5.9 percent.
+提高了5.9%
+
+332
+00:10:49,710 --> 00:10:50,670
+So that's a pretty big jump,
+这算是比较大的提高了
+
+333
+00:10:50,980 --> 00:10:52,100
+means that maybe it's worth quite
+这告诉你也许在脸部检测上
+
+334
+00:10:52,370 --> 00:10:53,660
+a bit of effort on better face detection.
+多做点努力是有意义的
+
+335
+00:10:54,670 --> 00:10:56,290
+Went four percent there, went
+这里提高4%
+
+336
+00:10:56,710 --> 00:10:58,680
+one percent there, one percent
+这两步都是提高1%
+
+337
+00:10:59,160 --> 00:11:00,600
+there and three percent there.
+这一步提高3%
+
+338
+00:11:01,520 --> 00:11:02,840
+So it looks like the
+所以从整体上看
+
+339
+00:11:02,980 --> 00:11:04,250
+components that are most worth
+最值得我付出努力的模块
+
+340
+00:11:04,730 --> 00:11:06,520
+our while are, when
+按顺序排列一下
+
+341
+00:11:06,680 --> 00:11:08,540
+I gave it perfect face detection,
+排在最前的是脸部检测
+
+342
+00:11:09,680 --> 00:11:10,190
+the system went up
+系统表现提高了
+
+343
+00:11:10,490 --> 00:11:11,990
+by 5.9 percent. When I gave
+5.9%
+
+344
+00:11:12,170 --> 00:11:14,170
+it perfect eye segmentation, went up
+给它完美的眼部分割
+
+345
+00:11:14,380 --> 00:11:15,540
+by 4%, and then my final logistic
+系统表现提高4%
+
+346
+00:11:16,000 --> 00:11:19,220
+regression classifier, well there's another 3 percent gap there maybe.
+最终是我的逻辑回归分类器 提高大约3%
+
+347
+00:11:19,570 --> 00:11:20,580
+And so, this tells us
+因此 这很清楚地指出了
+
+348
+00:11:20,810 --> 00:11:23,400
+maybe which of the components are most worth our while working on.
+哪一个模块是最值得花精力去完善的
+
+349
+00:11:24,610 --> 00:11:25,690
+And by the way, I
+顺便一提
+
+350
+00:11:25,830 --> 00:11:28,110
+want to tell you, it's a true cautionary story.
+我还想讲一个真实的故事
+
+351
+00:11:28,680 --> 00:11:29,620
+The reason I put in this
+我在预处理这里
+
+352
+00:11:29,850 --> 00:11:32,350
+pre-processing background removal is
+放入背景移除这个部分的原因是
+
+353
+00:11:32,600 --> 00:11:34,050
+because I actually know
+我知道一件真实的事情
+
+354
+00:11:34,340 --> 00:11:35,530
+of a true story where there
+原来有一个研究小组
+
+355
+00:11:35,770 --> 00:11:37,140
+was a research team that actually
+大概有两个人
+
+356
+00:11:37,480 --> 00:11:38,990
+literally had two people spend
+不夸张地说
+
+357
+00:11:39,580 --> 00:11:40,250
+about a year and a half,
+花了一年半的时间
+
+358
+00:11:40,530 --> 00:11:42,410
+spend 18 months, working on
+整整18个月
+
+359
+00:11:42,770 --> 00:11:44,050
+better background removal.
+都在完善背景移除的效果
+
+360
+00:11:44,480 --> 00:11:45,680
+We are rushing here... I am
+我不太清楚
+
+361
+00:11:46,120 --> 00:11:47,490
+obscuring the details for obvious
+具体的细节和原因是什么
+
+362
+00:11:47,970 --> 00:11:48,770
+reasons, but there was a
+但确实是有两个工程师
+
+363
+00:11:48,820 --> 00:11:50,610
+computer vision application where there
+为了开发某个
+
+364
+00:11:50,720 --> 00:11:51,660
+was a team of two engineers
+计算机视觉的应用系统
+
+365
+00:11:51,770 --> 00:11:52,850
+who literally spent I think
+大概花了一年半的时间
+
+366
+00:11:52,990 --> 00:11:54,210
+about a year and a half, working
+就为了得到一个
+
+367
+00:11:54,550 --> 00:11:56,050
+on better background removal.
+更好的背景移除效果
+
+368
+00:11:56,550 --> 00:11:57,720
+Actually they worked out
+事实上他们确实研究出了非常复杂的算法
+
+369
+00:11:57,820 --> 00:12:00,270
+really complicated algorithms, and ended up publishing, I think, one research paper.
+貌似最后还发表了一篇文章
+
+370
+00:12:01,080 --> 00:12:02,000
+But after all that work they
+但最终他们发现
+
+371
+00:12:02,110 --> 00:12:03,020
+found that, it just did
+所有付出的这些劳动
+
+372
+00:12:03,260 --> 00:12:04,910
+not make a huge difference to
+都不能给他们研发系统
+
+373
+00:12:05,200 --> 00:12:06,490
+the overall performance of the
+的整体表现带来
+
+374
+00:12:06,710 --> 00:12:09,120
+actual application they were working on.
+比较大的提升
+
+375
+00:12:09,450 --> 00:12:10,770
+And if only, you know if
+而如果要是之前
+
+376
+00:12:10,770 --> 00:12:13,170
+only someone were to do a ceiling analysis
+他们组某个人做一下上限分析
+
+377
+00:12:13,700 --> 00:12:15,790
+beforehand, maybe they could have realized this.
+他们就会提前意识到这个问题
+
+378
+00:12:17,240 --> 00:12:18,360
+And one of them said to me
+后来 他们中有一个人
+
+379
+00:12:18,480 --> 00:12:19,510
+afterward, you know, if only they
+跟我说 如果他们之前
+
+380
+00:12:19,640 --> 00:12:20,580
+had done the sort of analysis
+也做了某种这样的分析
+
+381
+00:12:20,850 --> 00:12:21,710
+like this, maybe they could
+他们就会在长达
+
+382
+00:12:21,990 --> 00:12:23,190
+have realized before that 18 months
+18个月的辛苦劳动以前
+
+383
+00:12:23,440 --> 00:12:25,180
+of work, that they
+意识到这个问题
+
+384
+00:12:25,240 --> 00:12:26,300
+should have spent their effort focusing
+他们就可以把精力花在
+
+385
+00:12:26,680 --> 00:12:28,920
+on some different component than literally
+其他更重要的模块上
+
+386
+00:12:29,380 --> 00:12:31,230
+spending 18 months working on background removal.
+而不是把18个月花在背景移除上
+
+387
+00:12:33,910 --> 00:12:36,140
+So to summarize, pipelines are
+总结一下
+
+388
+00:12:36,390 --> 00:12:38,630
+pretty pervasive in complex machine learning applications.
+流水线是非常常用却又很复杂的机器学习应用
+
+389
+00:12:39,890 --> 00:12:40,950
+And when you are working on
+当你在开发某个
+
+390
+00:12:41,200 --> 00:12:42,780
+a big machine learning application, I
+机器学习应用的时候
+
+391
+00:12:42,830 --> 00:12:45,450
+mean I think your time as a developer is so valuable.
+作为一个开发者 你的时间是相当宝贵的
+
+392
+00:12:46,090 --> 00:12:47,360
+So just don't waste your
+所以真的不要花时间
+
+393
+00:12:47,460 --> 00:12:50,120
+time working on something that ultimately isn't going to matter.
+去做一些到头来没意义的事情
+
+394
+00:12:51,350 --> 00:12:52,370
+And in this video, we talked
+因此在这节课中
+
+395
+00:12:52,490 --> 00:12:53,570
+about this idea of ceiling analysis,
+我给大家介绍了上限分析的概念
+
+396
+00:12:54,340 --> 00:12:55,750
+which I've often found to
+我经常觉得上限分析
+
+397
+00:12:55,850 --> 00:12:57,000
+be a very good tool for
+是个很有用的工具
+
+398
+00:12:57,130 --> 00:12:58,660
+identifying the component, and if
+当你想花精力到某个模块上时
+
+399
+00:12:58,760 --> 00:12:59,830
+you actually put a focused effort
+你可以用上限分析的方法
+
+400
+00:13:00,050 --> 00:13:01,010
+on that component, and make a
+来确定你的努力
+
+401
+00:13:01,250 --> 00:13:02,420
+big difference, it would actually
+会不会产生什么意义
+
+402
+00:13:03,050 --> 00:13:04,360
+have a huge effect on the
+整个系统的表现
+
+403
+00:13:04,620 --> 00:13:06,040
+overall performance of your final system.
+会不会产生明显的提高
+
+404
+00:13:07,070 --> 00:13:08,010
+So, over the years, working
+所以 经过这么多年
+
+405
+00:13:08,340 --> 00:13:09,520
+with machine learning, I've actually learned
+在机器学习中的摸爬滚打
+
+406
+00:13:09,710 --> 00:13:10,900
+to not trust my own gut
+我已经学会了不要凭自己的直觉
+
+407
+00:13:11,100 --> 00:13:13,200
+feeling about what component to work on.
+来判断应该干什么
+
+408
+00:13:13,280 --> 00:13:14,310
+So, very often, when you have
+虽然我从事
+
+409
+00:13:14,540 --> 00:13:15,440
+worked with machine learning for a
+机器学习的工作
+
+410
+00:13:15,570 --> 00:13:17,160
+long time, but often, I'll look at a
+已经很多年了
+
+411
+00:13:17,360 --> 00:13:18,770
+machine learning problem, and I
+但经常遇到某个机器学习问题时
+
+412
+00:13:18,950 --> 00:13:20,130
+may have some gut feeling about,
+总有一些直觉告诉我
+
+413
+00:13:20,450 --> 00:13:22,970
+oh, let's, you know, jump on that component, and just spend more time on that.
+我们应该跳到那一个模块 应该把时间花在那儿
+
+414
+00:13:24,120 --> 00:13:25,050
+But over the years, I
+但经过这么多年
+
+415
+00:13:25,160 --> 00:13:26,600
+have come to not even trust my
+我现在也开始慢慢意识到
+
+416
+00:13:26,740 --> 00:13:27,800
+own gut feelings, and know not
+还是不能太相信
+
+417
+00:13:28,130 --> 00:13:29,310
+to trust gut feelings that much
+自己的感觉
+
+418
+00:13:29,980 --> 00:13:31,450
+and instead really have a
+相反地 如果要解决某个机器学习问题
+
+419
+00:13:31,520 --> 00:13:33,060
+solid machine learning problem, where it's
+最好能把问题
+
+420
+00:13:33,180 --> 00:13:34,750
+possible to structure things.
+分成多个模块
+
+421
+00:13:34,960 --> 00:13:36,340
+To do a ceiling analysis often
+然后做一下上限分析
+
+422
+00:13:36,660 --> 00:13:37,720
+gives a much better and much
+这通常给你更可靠
+
+423
+00:13:37,910 --> 00:13:39,110
+more reliable way for deciding
+更好的方法
+
+424
+00:13:39,670 --> 00:13:40,900
+where to put a focused effort
+来为你决定
+
+425
+00:13:40,940 --> 00:13:42,270
+into, to really improve this,
+该把劲儿往哪儿使
+
+426
+00:13:42,690 --> 00:13:44,570
+the performance of some component and
+该提高哪个模块的效果
+
+427
+00:13:44,680 --> 00:13:45,900
+we can kind of be sure that when
+这样我们就会非常确认
+
+428
+00:13:46,180 --> 00:13:46,960
+we do that, it will actually have
+把这个模块做好就能提高
+
+429
+00:13:47,200 --> 00:13:49,460
+a huge effect on the final performance of your process system.
+系统的最终表现【果壳教育无边界字幕组】翻译:所罗门捷列夫
+
diff --git a/srt/19 - 1 - Summary and Thank You (5 min).srt b/srt/19 - 1 - Summary and Thank You (5 min).srt
new file mode 100644
index 00000000..66597f18
--- /dev/null
+++ b/srt/19 - 1 - Summary and Thank You (5 min).srt
@@ -0,0 +1,495 @@
+1
+00:00:00,004 --> 00:00:03,840
+Welcome to the final video of this Machine Learning class.
+欢迎来到《机器学习》课的最后一段视频
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:03,840 --> 00:00:06,473
+We've been through a lot of different videos together.
+我们已经一起学习很长一段时间了
+
+3
+00:00:06,473 --> 00:00:08,774
+In this video I would like to just quickly
+在最后这段视频中
+
+4
+00:00:08,774 --> 00:00:11,003
+summarize the main topics of this course
+我想快速地回顾一下这门课的主要内容
+
+5
+00:00:11,003 --> 00:00:13,089
+and then say a few words at the end and that
+然后简单说几句想说的话
+
+6
+00:00:13,089 --> 00:00:14,729
+will wrap up the class.
+作为这门课的结束
+
+7
+00:00:16,390 --> 00:00:18,020
+So what have we done?
+那么我们学到了些什么呢?
+
+8
+00:00:18,020 --> 00:00:21,957
+In this class we spent a lot of time talking about supervised learning algorithms
+在这门课中 我们花了大量的时间
+
+9
+00:00:21,957 --> 00:00:25,436
+like linear regression, logistic regression, neural networks, SVMs.
+介绍了诸如线性回归 逻辑回归 神经网络 支持向量机
+
+10
+00:00:25,436 --> 00:00:29,435
+for problems where you have labelled data and labelled examples
+等等一些监督学习算法 这类算法具有带标签的数据和样本
+
+11
+00:00:29,435 --> 00:00:31,300
+like x(i), y(i)
+比如 x(i) y(i)
+
+12
+00:00:31,300 --> 00:00:35,715
+And we also spent quite a lot of time talking about unsupervised learning
+然后我们也花了很多时间介绍无监督学习
+
+13
+00:00:35,715 --> 00:00:37,344
+like K-means clustering,
+例如 K-均值聚类
+
+14
+00:00:37,344 --> 00:00:40,316
+Principal Components Analysis for dimensionality reduction
+用于降维的主成分分析
+
+15
+00:00:40,316 --> 00:00:43,847
+and Anomaly Detection algorithms for when you have only
+以及当你只有一系列无标签数据 x(i) 时的
+
+16
+00:00:43,847 --> 00:00:46,363
+unlabelled data x(i)
+异常检测算法
+
+17
+00:00:46,363 --> 00:00:49,378
+Although Anomaly Detection can also use some labelled data
+当然 有时带标签的数据
+
+18
+00:00:49,378 --> 00:00:51,189
+to evaluate the algorithm.
+也可以用于异常检测算法的评估
+
+19
+00:00:51,451 --> 00:00:54,725
+We also spent some time talking about special applications
+此外 我们也花时间讨论了一些特别的应用
+
+20
+00:00:54,725 --> 00:00:56,407
+or special topics like Recommender Systems
+或者特别的话题 比如说推荐系统
+
+21
+00:00:56,407 --> 00:00:58,895
+and large scale machine learning systems
+以及大规模机器学习系统
+
+22
+00:00:58,895 --> 00:01:01,477
+including parallelized and map-reduce systems
+包括并行系统和映射化简方法
+
+23
+00:01:01,477 --> 00:01:03,925
+as well as some special applications like
+还有其他一些特别的应用比如
+
+24
+00:01:03,925 --> 00:01:07,609
+sliding windows object classification for computer vision.
+用于计算机视觉技术的滑动窗口分类算法
+
+25
+00:01:07,609 --> 00:01:11,549
+And finally we also spent a lot of time talking about different aspects
+最后 我们还提到了很多关于构建
+
+26
+00:01:11,549 --> 00:01:15,198
+of, sort of, advice on building a machine learning system.
+机器学习系统的实用建议
+
+27
+00:01:15,198 --> 00:01:17,264
+And this involved both trying to understand
+这包括了怎样理解
+
+28
+00:01:17,264 --> 00:01:19,233
+what is it that makes a machine learning algorithm
+某个机器学习算法
+
+29
+00:01:19,233 --> 00:01:20,561
+work or not work.
+是否正常工作的原因
+
+30
+00:01:20,561 --> 00:01:22,012
+So we talked about things like bias and variance,
+所以我们谈到了偏差和方差的问题
+
+31
+00:01:22,012 --> 00:01:25,479
+and how regularization can help with some variance problems.
+也谈到了解决方差问题的正则化
+
+32
+00:01:25,479 --> 00:01:28,445
+And we also spent a little bit of time talking about
+同时我们也讨论了
+
+33
+00:01:28,445 --> 00:01:32,313
+this question of how to decide what to work on next.
+怎样决定接下来怎么做的问题
+
+34
+00:01:32,313 --> 00:01:35,019
+So, how to prioritize how you spend your time
+也就是说当你在开发一个机器学习系统时
+
+35
+00:01:35,019 --> 00:01:37,513
+when you're developing a machine learning system.
+什么工作才是接下来应该优先考虑的问题
+
+36
+00:01:38,021 --> 00:01:41,044
+So we talked about evaluation of learning algorithms,
+因此我们讨论了学习算法的评价法
+
+37
+00:01:41,044 --> 00:01:44,221
+evaluation metrics like precision, recall, and F1 score
+介绍了评价指标 比如 查准率 召回率以及F1分数
+
+38
+00:01:44,221 --> 00:01:47,072
+as well as practical aspects of evaluation
+还有评价学习算法比较实用的
+
+39
+00:01:47,072 --> 00:01:49,898
+like the training, cross-validation and test sets.
+训练集 交叉验证集和测试集
+
+40
+00:01:49,898 --> 00:01:52,319
+And we also spent a lot of time talking about
+我们也介绍了学习算法的调试
+
+41
+00:01:52,319 --> 00:01:55,741
+debugging learning algorithms and making sure
+以及如何确保
+
+42
+00:01:55,741 --> 00:01:57,212
+the learning algorithm is working.
+学习算法的正常运行
+
+43
+00:01:57,212 --> 00:01:59,075
+So we talked about diagnostics
+于是我们介绍了一些诊断法
+
+44
+00:01:59,075 --> 00:02:01,999
+like learning curves and also talked about things like
+比如学习曲线 同时也讨论了
+
+45
+00:02:01,999 --> 00:02:04,394
+error analysis and ceiling analysis.
+误差分析 上限分析等等内容
+
+46
+00:02:04,394 --> 00:02:08,187
+And so all of these were different tools for helping you to decide
+所有这些工具都能有效地指引你
+
+47
+00:02:08,187 --> 00:02:10,349
+what to do next and how to spend your valuable
+决定接下来应该怎样做
+
+48
+00:02:10,349 --> 00:02:12,585
+time when you're developing a machine learning system.
+让你把宝贵的时间用在刀刃上
+
+49
+00:02:12,585 --> 00:02:17,665
+And in addition to having the tools of machine learning at your disposal
+现在你已经掌握了很多机器学习的工具
+
+50
+00:02:17,665 --> 00:02:20,228
+so knowing the tools of machine learning like
+包括监督学习算法和无监督学习算法等等
+
+51
+00:02:20,228 --> 00:02:22,127
+supervised learning and unsupervised learning and so on,
+但除了这些以外
+
+52
+00:02:22,127 --> 00:02:26,015
+I hope that you now not only have the tools,
+我更希望你现在不仅仅只是认识这些工具
+
+53
+00:02:26,015 --> 00:02:29,457
+but that you know how to apply these tools really well
+更重要的是掌握怎样有效地利用这些工具
+
+54
+00:02:29,457 --> 00:02:32,658
+to build powerful machine learning systems.
+来建立强大的机器学习系统
+
+55
+00:02:33,658 --> 00:02:35,556
+So, that's it.
+所以 就是这样
+
+56
+00:02:35,556 --> 00:02:37,645
+Those were the topics of this class
+以上就是这门课的全部内容
+
+57
+00:02:37,645 --> 00:02:39,614
+and if you worked all the way through this course
+如果你跟着我们的课程一路走来
+
+58
+00:02:39,614 --> 00:02:41,308
+you should now consider yourself
+到现在 你应该已经感觉到
+
+59
+00:02:41,308 --> 00:02:43,511
+an expert in machine learning.
+自己已经成为机器学习方面的专家了吧
+
+60
+00:02:43,511 --> 00:02:46,879
+As you know, machine learning is a technology
+我们都知道 机器学习是一门对科技 工业
+
+61
+00:02:46,879 --> 00:02:49,916
+that's having huge impact on science, technology and industry.
+产生深远影响的重要学科
+
+62
+00:02:49,916 --> 00:02:53,360
+And you're now well qualified to use these tools
+而现在 你已经完全具备了应用这些
+
+63
+00:02:53,360 --> 00:02:55,351
+of machine learning to great effect.
+机器学习工具来创造伟大成就的能力
+
+64
+00:02:55,351 --> 00:02:57,910
+I hope that many of you in this class
+我希望你们中的很多人
+
+65
+00:02:57,910 --> 00:02:59,765
+will find ways to use machine learning
+都能在相应的领域 应用所学的机器学习工具
+
+66
+00:02:59,765 --> 00:03:02,324
+to build cool systems and cool applications
+构建出完美的机器学习系统
+
+67
+00:03:02,324 --> 00:03:03,946
+and cool products.
+开发出无与伦比的产品和应用
+
+68
+00:03:03,946 --> 00:03:06,084
+And I hope that you find ways
+并且我也希望你们
+
+69
+00:03:06,084 --> 00:03:07,930
+to use machine learning not only
+通过应用机器学习
+
+70
+00:03:07,930 --> 00:03:09,762
+to make your life better but maybe someday
+不仅仅改变自己的生活
+
+71
+00:03:09,762 --> 00:03:14,749
+to use it to make many other people's life better as well.
+有朝一日 还要让更多的人生活得更加美好!
+
+72
+00:03:14,780 --> 00:03:19,699
+I also wanted to let you know that this class has been great fun for me to teach.
+我也想告诉大家 教这门课对我来讲是一种享受
+
+73
+00:03:19,699 --> 00:03:21,788
+So, thank you for that.
+所以 谢谢大家
+
+74
+00:03:21,788 --> 00:03:23,807
+And before wrapping up,
+最后 在结束之前
+
+75
+00:03:23,807 --> 00:03:25,282
+there's just one last thing I wanted to say.
+我还想再多说一点
+
+76
+00:03:25,282 --> 00:03:28,956
+Which is that: It was maybe not so long ago,
+那就是 也许不久以前
+
+77
+00:03:28,956 --> 00:03:31,306
+that I was a student myself.
+我也是一个学生
+
+78
+00:03:31,306 --> 00:03:34,711
+And even today, you know, I still try to take different courses
+即使是现在 我也尽可能挤出时间
+
+79
+00:03:34,711 --> 00:03:36,902
+when I have time to try to learn new things.
+听一些课 学一些新的东西
+
+80
+00:03:36,902 --> 00:03:39,989
+And so I know how time-consuming it is
+所以 我深知要坚持学完这门课
+
+81
+00:03:39,989 --> 00:03:42,273
+to learn this stuff.
+是很需要花一些时间的
+
+82
+00:03:42,273 --> 00:03:44,663
+I know that you're probably a busy person
+我知道 也许你是一个很忙的人
+
+83
+00:03:44,663 --> 00:03:47,302
+with many, many other things going on in your life.
+生活中有很多很多事情要处理
+
+84
+00:03:47,302 --> 00:03:49,838
+And so the fact that you still found
+正因如此 你依然挤出时间
+
+85
+00:03:49,838 --> 00:03:52,431
+the time or took the time to watch these videos
+来观看这些课程视频
+
+86
+00:03:52,431 --> 00:03:55,799
+and, you know, many of these videos just went on
+我知道 很多视频的时间
+
+87
+00:03:55,799 --> 00:03:57,598
+for hours, right?
+都长达数小时 是吧
+
+88
+00:03:57,598 --> 00:04:00,068
+And the fact many of you took the time
+你依然花了好多时间
+
+89
+00:04:00,068 --> 00:04:01,826
+to go through the review questions
+来做这些复习题
+
+90
+00:04:01,826 --> 00:04:03,731
+and that many of you took the time
+你们中好多人 还愿意花时间
+
+91
+00:04:03,731 --> 00:04:06,250
+to work through the programming exercises.
+来研究那些编程练习
+
+92
+00:04:06,250 --> 00:04:09,483
+And these were long and complicate programming exercises.
+那些又长又复杂的编程练习
+
+93
+00:04:09,483 --> 00:04:12,840
+I wanted to say thank you for that.
+我对你们表示衷心的感谢
+
+94
+00:04:12,840 --> 00:04:17,920
+And I know that many of you have worked hard on this class
+我知道你们很多人在这门课中都非常努力
+
+95
+00:04:17,920 --> 00:04:21,880
+and that many of you have put a lot of time into this class,
+很多人都在这门课上花了很多时间
+
+96
+00:04:21,880 --> 00:04:25,396
+that many of you have put a lot of yourselves into this class.
+很多人都为这门课贡献了自己的很多精力
+
+97
+00:04:25,396 --> 00:04:29,292
+So I hope that you also got a lot of out this class.
+所以 我衷心地希望你们能从这门课中有所收获
+
+98
+00:04:29,292 --> 00:04:31,347
+And I wanted to say:
+最后我想说 再次感谢你们选修这门课程!
+
+99
+00:04:31,347 --> 00:04:36,423
+Thank you very much for having been a student in this class.
+
diff --git a/srt/2 - 1 - Model Representation (8 min).srt b/srt/2 - 1 - Model Representation (8 min).srt
new file mode 100644
index 00000000..c2d5c0ed
--- /dev/null
+++ b/srt/2 - 1 - Model Representation (8 min).srt
@@ -0,0 +1,431 @@
+1
+00:00:00,338 --> 00:00:04,677
+Our first learning algorithm will be linear regression. In this video, you'll see
+我们的第一个学习算法是线性回归。在这段视频中 你会看到
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,678 --> 00:00:09,234
+what the model looks like and more importantly you'll see what the overall
+什么样的模型看起来像多 重要的是,你会看到什么整体
+
+3
+00:00:09,234 --> 00:00:14,801
+process of supervised learning looks like. Let's use some motivating example of predicting
+监督学习过程中的模样。让我们 使用一些激励的例子预测
+
+4
+00:00:14,801 --> 00:00:20,036
+housing prices. We're going to use a data set of housing prices from the city of
+住房价格上涨。我们将使用数据 从城市的住房价格设置
+
+5
+00:00:20,036 --> 00:00:25,205
+Portland, Oregon. And here I'm gonna plot my data set of a number of houses
+俄勒冈州波特兰市。在这里,我要去 绘制数据集的一些房屋
+
+6
+00:00:25,205 --> 00:00:30,833
+that were different sizes that were sold for a range of different prices. Let's say
+不同尺寸已售出 对于不同的价格范围内。比方说,
+
+7
+00:00:30,833 --> 00:00:35,872
+that given this data set, you have a friend that's trying to sell a house and
+这组数据,你有一个 朋友,试图把房子卖了,
+
+8
+00:00:35,872 --> 00:00:41,238
+let's see if friend's house is size of 1250 square feet and you want to tell them
+让我们来看看,如果朋友的房子是大小 1250平方英尺,你要告诉他们
+
+9
+00:00:41,238 --> 00:00:46,459
+how much they might be able to sell the house for. Well one thing you could do is
+他们也许能卖多少 房子。好了,你可以做的一件事是
+
+10
+00:00:46,648 --> 00:00:53,039
+fit a model. Maybe fit a straight line to this data. Looks something like that and based
+拟合模型。也许适合直线 此数据。看起来是这样的,根据
+
+11
+00:00:53,039 --> 00:00:59,168
+on that, maybe you could tell your friend that let's say maybe he can sell the
+,也许你可以告诉你的朋友 比方说,也许他可以卖
+
+12
+00:00:59,168 --> 00:01:03,575
+house for around $220,000. So this is an example of a
+房子周围220,000元。 因此,这是一个例子的
+
+13
+00:01:03,575 --> 00:01:08,834
+supervised learning algorithm. And it's supervised learning because we're given
+监督的学习算法。和它的 因为我们的监督学习
+
+14
+00:01:08,834 --> 00:01:14,287
+the, quotes, "right answer" for each of our examples. Namely we're told what was
+,报价,“正确的答案”为每个 我们的例子。即告诉我们是什么
+
+15
+00:01:14,287 --> 00:01:19,351
+the actual house, what was the actual price of each of the houses in our data
+实际的房子,什么是实际 每个房子的价格在我们的数据
+
+16
+00:01:19,351 --> 00:01:24,441
+set were sold for and moreover, this is an example of a regression problem where
+集已售出,而且,这是 的一个例子的回归问题中
+
+17
+00:01:24,441 --> 00:01:29,545
+the term regression refers to the fact that we are predicting a real-valued output
+回归一词是指这样的事实 我们预测一个真正的值输出
+
+18
+00:01:29,545 --> 00:01:34,586
+namely the price. And just to remind you the other most common type of supervised
+即价格。只是提醒你 其他监督的最常见的类型
+
+19
+00:01:34,586 --> 00:01:39,006
+learning problem is called the classification problem where we predict
+学习问题被称为 分类问题,我们预测
+
+20
+00:01:39,006 --> 00:01:45,202
+discrete-valued outputs such as if we are looking at cancer tumors and trying to
+比如,如果我们的离散值输出 看癌症肿瘤,并试图
+
+21
+00:01:45,202 --> 00:01:52,032
+decide if a tumor is malignant or benign. So that's a zero-one valued discrete output. More
+决定如果肿瘤是良性或恶性。 所以这是一个零一值离散输出。更多
+
+22
+00:01:52,032 --> 00:01:57,087
+formally, in supervised learning, we have a data set and this data set is called a
+正式监督学习,我们有 数据集并在这样的数据组被称为一个
+
+23
+00:01:57,087 --> 00:02:02,018
+training set. So for housing prices example, we have a training set of
+训练集。因此,住房价格 例如,我们有一个训练集
+
+24
+00:02:02,018 --> 00:02:07,386
+different housing prices and our job is to learn from this data how to predict prices
+不同的房价和我们的工作是 从这个数据中学习如何预测价格
+
+25
+00:02:07,386 --> 00:02:11,907
+of the houses. Let's define some notation that we're using throughout this course.
+的房子。让我们来定义一些符号 我们正在使用的整个过程。
+
+26
+00:02:11,907 --> 00:02:16,100
+We're going to define quite a lot of symbols. It's okay if you don't remember
+我们要定义颇多 符号。没关系,如果你不记得
+
+27
+00:02:16,100 --> 00:02:20,075
+all the symbols right now but as the course progresses it will be useful
+所有的符号,但现在作为 课程的进展,将是有益的
+
+28
+00:02:20,075 --> 00:02:24,267
+to have a convenient notation. So I'm gonna use lower case m throughout this course to
+[听不清]方便的符号。所以,我会使用 整个本课程小写米
+
+29
+00:02:24,267 --> 00:02:28,897
+denote the number of training examples. So in this data set, if I have, you know,
+培训的例子的数字表示。所以 在这组数据中,如果我有,你知道,
+
+30
+00:02:28,897 --> 00:02:34,366
+let's say 47 rows in this table. Then I have 47 training examples and m equals 47.
+让我们说,在此表中的47列。然后我 有47个训练实例和m等于47。
+
+31
+00:02:34,366 --> 00:02:39,497
+Let me use lowercase x to denote the input variables often also called the
+让我用小写字母x表示 输入变量经常也被称为
+
+32
+00:02:39,497 --> 00:02:44,290
+features. That would be the x is here, it would the input features. And I'm gonna
+功能。这将是x是在这里,它会输入功能。我要去
+
+33
+00:02:44,290 --> 00:02:49,556
+use y to denote my output variables or the target variable which I'm going to
+用y来表示我的输出变量或 目标变量,我要去
+
+34
+00:02:49,556 --> 00:02:54,552
+predict and so that's the second column here. [inaudible] notation, I'm
+预测,所以这是第二 列在这里。 [听不清]符号,我
+
+35
+00:02:54,552 --> 00:03:05,749
+going to use (x, y) to denote a single training example. So, a single row in this
+要使用(X,Y)来表示一个单 培训的例子。所以,在此单排
+
+36
+00:03:05,749 --> 00:03:12,068
+table corresponds to a single training example and to refer to a specific
+表对应一个单一的培训 的例子,并参照特定的
+
+37
+00:03:12,068 --> 00:03:19,708
+training example, I'm going to use this notation (x(i), y(i)). And, we're
+培训例子中,我将使用这个 符号X(I)逗号给了我Y(I),我们
+
+38
+00:03:25,322 --> 00:03:30,935
+going to use this to refer to the ith training example. So this superscript i
+要利用这个参阅第i个 培训的例子。所以这个下标i
+
+39
+00:03:30,935 --> 00:03:37,864
+over here, this is not exponentiation right? This (x(i), y(i)), the superscript i in
+在这里,这是不求幂 对不对? (X(I),Y(I)),上标i
+
+40
+00:03:37,864 --> 00:03:44,873
+parentheses that's just an index into my training set and refers to the ith row in
+那只是一个索引我的括号 培训是指第i行
+
+41
+00:03:44,873 --> 00:03:51,629
+this table, okay? So this is not x to the power of i, y to the power of i. Instead
+此表,好吗?所以这不是X到 电源I,Y i的功率。代替
+
+42
+00:03:51,629 --> 00:03:58,216
+(x(i), y(i)) just refers to the ith row of this table. So for example, x(1) refers to the
+(×(i)中,Y(I)),在此指的是第i行 表中。因此,例如,x(1)指的是
+
+43
+00:03:58,216 --> 00:04:04,972
+input value for the first training example so that's 2104. That's this x in the first
+输入值的第一次训练的例子,所以 那是2104。这是这个x在第一
+
+44
+00:04:04,972 --> 00:04:11,685
+row. x(2) will be equal to 1416 right? That's the second x
+一行。 ×(2)将等于 1416吧?这是第二个X
+
+45
+00:04:11,685 --> 00:04:17,385
+and y(1) will be equal to 460. The first, the y value for my first
+和y(1)将等于460。 第一,我的第一个y值
+
+46
+00:04:17,385 --> 00:04:24,526
+training example, that's what that (1) refers to. So as mentioned, occasionally I'll ask you a
+培训的例子,这就是:(1) 指。所以提到的,偶尔我会问你一个
+
+47
+00:04:24,526 --> 00:04:28,345
+question to let you check your understanding and a few seconds in this
+质疑让你检查你的 理解,在这几秒钟
+
+48
+00:04:28,345 --> 00:04:34,044
+video a multiple-choice question will pop up in the video. When it does,
+视频选择题 会弹出视频。当它,
+
+49
+00:04:34,044 --> 00:04:40,362
+please use your mouse to select what you think is the right answer. What defined by
+请使用鼠标来选择你 我认为是正确的答案。定义的
+
+50
+00:04:40,362 --> 00:04:45,124
+the training set is. So here's how this supervised learning algorithm works.
+训练集。因此,这里是如何 监督学习算法的工作原理。
+
+51
+00:04:45,124 --> 00:04:50,513
+We saw that with the training set like our training set of housing prices and we feed
+我们看到,像我们的训练集 训练集的住房价格,我们养活
+
+52
+00:04:50,513 --> 00:04:55,715
+that to our learning algorithm. Is the job of a learning algorithm to then output a
+就我们的学习算法。是对工作 学习算法,然后输出
+
+53
+00:04:55,715 --> 00:05:00,101
+function which by convention is usually denoted lowercase h and h
+按照约定的功能 通常表示小写h和h
+
+54
+00:05:00,101 --> 00:05:06,574
+stands for hypothesis. And what the job of the hypothesis is, is a function that
+看台假说和什么样的工作, 的假设是,是,是一个函数,
+
+55
+00:05:06,574 --> 00:05:12,471
+takes as input the size of a house like maybe the size of the new house your friend's
+作为输入那样的房子的大小 也许你的朋友的新房子的大小
+
+56
+00:05:12,471 --> 00:05:18,368
+trying to sell so it takes in the value of x and it tries to output the estimated
+挂羊头卖狗肉,所以它需要的价值 x和它试图输出估计
+
+57
+00:05:18,368 --> 00:05:31,630
+value of y for the corresponding house. So h is a function that maps from x's
+y值对应的房子。 因此,h是从x的一个??函数,映射
+
+58
+00:05:31,630 --> 00:05:37,729
+to y's. People often ask me, you know, why is this function called
+y的。人们经常问我,你 知道了,这是为什么函数调用
+
+59
+00:05:37,729 --> 00:05:42,121
+hypothesis. Some of you may know the meaning of the term hypothesis, from the
+假说。你们有些人可能知道 这意味着,长期假设,从
+
+60
+00:05:42,121 --> 00:05:46,744
+dictionary or from science or whatever. It turns out that in machine learning, this
+字典或从科学或什么的。它 原来,在机器学习,这
+
+61
+00:05:46,744 --> 00:05:51,310
+is a name that was used in the early days of machine learning and it kinda stuck. 'Cause
+在初期使用的名称 机器学习和它有点卡住。因为
+
+62
+00:05:51,310 --> 00:05:55,239
+maybe not a great name for this sort of function, for mapping from sizes of
+也许不是一个伟大的名字为这种 功能,从尺寸的映射
+
+63
+00:05:55,239 --> 00:05:59,978
+houses to the predictions, that you know.... I think the term hypothesis, maybe isn't
+房子的预言,你知道.... 我认为长期假设,也许是不
+
+64
+00:05:59,978 --> 00:06:04,543
+the best possible name for this, but this is the standard terminology that people use in
+此最好的可能的名称,但是这是 标准术语的人使用
+
+65
+00:06:04,543 --> 00:06:09,362
+machine learning. So don't worry too much about why people call it that. When
+机器学习。所以不要太担心 人们为什么称呼它。何时
+
+66
+00:06:09,362 --> 00:06:14,332
+designing a learning algorithm, the next thing we need to decide is how do we
+设计一个学习算法,下一个 需要决定的事情,我们是怎么做的,我们
+
+67
+00:06:14,332 --> 00:06:20,540
+represent this hypothesis h. For this and the next few videos, I'm going to choose
+代表这假设h。为了这个以及 接下来的几个视频,我要选择
+
+68
+00:06:20,540 --> 00:06:26,978
+our initial choice , for representing the hypothesis, will be the following. We're going to
+我们最初的选择,代表 假设,会出现下面的。我们要
+
+69
+00:06:26,978 --> 00:06:33,009
+represent h as follows. And we will write this as htheta(x) equals theta0
+表示h如下 我们把它写成 hθ(x) 等于 θ0
+
+70
+00:06:33,009 --> 00:06:39,254
+plus theta one times x. And as a shorthand, sometimes instead of writing, you
+加上θ1乘以x。而作为一个简写 有时我们不写成
+
+71
+00:06:39,254 --> 00:06:45,441
+know, h subscript theta of x, sometimes there's a shorthand, I'll just write as a h of x.
+知道标西塔,H,X,有时 有一个速记,我就写一个h的x。
+
+72
+00:06:45,441 --> 00:06:51,627
+But more often I'll write it as a subscript theta over there. And plotting
+但更多的时候我会写它作为一个 标西塔那边。和绘图
+
+73
+00:06:51,627 --> 00:06:58,210
+this in the pictures, all this means is that, we are going to predict that y is a linear
+这在图片中,所有这一切意味着, 我们将要预测,y是一个线性
+
+74
+00:06:58,210 --> 00:07:04,634
+function of x. Right, so that's the data set and what this function is doing,
+x的函数。没错,所以这是 数据集,这个功能是做什么,
+
+75
+00:07:04,634 --> 00:07:11,698
+is predicting that y is some straight line function of x. That's h of x equals theta 0
+预测y是x的某个直线函数。也就是h(x)等于θ0
+
+76
+00:07:11,698 --> 00:07:18,450
+plus theta 1 x, okay? And why a linear function? Well, sometimes we'll want to
+加THETA 1个,好吗?为什么线性 功能?嗯,有时候我们会想
+
+77
+00:07:18,450 --> 00:07:23,405
+fit more complicated, perhaps non-linear functions as well. But since this linear
+适应更加复杂,或许非线性 功能。但是,由于这种线性
+
+78
+00:07:23,405 --> 00:07:28,298
+case is the simple building block, we will start with this example first of fitting
+案件是简单的积木,我们将 从这个例子,首先拟合
+
+79
+00:07:28,298 --> 00:07:32,943
+linear functions, and we will build on this to eventually have more complex
+线性函数,我们将建立 这最终有更复杂的
+
+80
+00:07:32,943 --> 00:07:37,403
+models, and more complex learning algorithms. Let me also give this
+模型,以及更复杂的学习 算法。让我也给这个
+
+81
+00:07:37,403 --> 00:07:42,628
+particular model a name. This model is called linear regression or this, for
+特定型号的名称。这种模式是 称为线性回归,
+
+82
+00:07:42,628 --> 00:07:48,271
+example, is actually linear regression with one variable, with the variable being
+例如,实际上是线性回归 一个变量,该变量是
+
+83
+00:07:48,271 --> 00:07:53,914
+x. Predicting all the prices as functions of one variable X. And another name for
+x的所有的价格预测功能 一个变量X的另一个名字
+
+84
+00:07:53,914 --> 00:07:58,852
+this model is univariate linear regression. And univariate is just a
+这种模式是单变量线性 回归。单因素仅仅是一个
+
+85
+00:07:58,852 --> 00:08:04,400
+fancy way of saying one variable. So, that's linear regression. In the next
+奇特的方式说一个变量。因此, 这就是线性回归。在接下来的
+
+86
+00:08:04,400 --> 00:08:09,760
+video we'll start to talk about just how we go about implementing this model.
+我们将开始谈论多么视频 我们去实施这一模式。
+
diff --git a/srt/2 - 2 - Cost Function (8 min).srt b/srt/2 - 2 - Cost Function (8 min).srt
new file mode 100644
index 00000000..d09af2fa
--- /dev/null
+++ b/srt/2 - 2 - Cost Function (8 min).srt
@@ -0,0 +1,523 @@
+1
+00:00:00,000 --> 00:00:05,399
+In this video we'll define something
+called the cost function. This will let us
+在这段视频中我们将定义代价函数的概念 这有助于我们
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:05,399 --> 00:00:10,688
+figure out how to fit the best possible
+straight line to our data. In linear
+弄清楚如何把最有可能的直线与我们的数据相拟合
+
+3
+00:00:10,688 --> 00:00:16,758
+regression we have a training set like
+that shown here. Remember our notation M
+在线性回归中我们有一个像这样的训练集 记住
+
+4
+00:00:16,758 --> 00:00:21,972
+was the number of training examples. So
+maybe M=47. And the form of the
+M代表了训练样本的数量 所以 比如 M = 47
+
+5
+00:00:21,972 --> 00:00:27,731
+hypothesis, which we use to make
+predictions, is this linear function. To
+而我们的假设函数 也就是用来进行预测的函数 是这样的线性函数形式
+
+6
+00:00:27,731 --> 00:00:33,723
+introduce a little bit more terminology,
+these theta zero and theta one, right,
+接下来我们会引入一些术语 这些θ0和θ1
+
+7
+00:00:33,723 --> 00:00:39,759
+these theta i's are what I call the
+parameters of the model. What we're
+这些θi我把它们称为模型参数 在这个视频中
+
+8
+00:00:39,759 --> 00:00:44,578
+going to do in this video is talk about
+how to go about choosing these two
+我们要做的就是谈谈如何选择这两个参数值θ0和θ1
+
+9
+00:00:44,578 --> 00:00:49,654
+parameter values, theta zero and theta
+one. With different choices of parameters
+选择不同的参数θ0和θ1
+
+10
+00:00:49,654 --> 00:00:54,408
+theta zero and theta one we get different
+hypotheses, different hypothesis
+我们会得到不同的假设 不同的假设函数
+
+11
+00:00:54,408 --> 00:00:59,355
+functions. I know some of you will
+probably be already familiar with what I'm
+我知道你们中的有些人可能已经知道我在这张幻灯片上要讲的
+
+12
+00:00:59,355 --> 00:01:04,367
+going to do on this slide, but just to
+review here are a few examples. If theta
+但我们还是用这几个例子来复习回顾一下
+
+13
+00:01:04,367 --> 00:01:09,378
+zero is 1.5 and theta one is 0, then
+the hypothesis function will look like
+如果θ0是1.5 θ1是0 那么假设函数会看起来是这样
+
+14
+00:01:09,378 --> 00:01:15,701
+this. Right, because your hypothesis
+function will be h( x) equals 1.5 plus
+是吧 因为你的假设函数是h(x)=1.5+0*x
+
+15
+00:01:15,701 --> 00:01:22,645
+0 times x which is this constant value
+function, this is flat at 1.5. If
+是这样一个常数函数 恒等于1.5
+
+16
+00:01:22,645 --> 00:01:29,332
+theta zero equals 0 and theta one
+equals 0.5, then the hypothesis will look
+如果θ0=0并且θ1=0.5 那么假设会看起来像这样
+
+17
+00:01:29,332 --> 00:01:35,536
+like this. And it should pass through this
+point (2, 1), so you now have h(x) or
+它会通过点(2,1) 这样你又得到了h(x)
+
+18
+00:01:35,536 --> 00:01:40,666
+really some htheta(x) but
+sometimes I'll just omit theta for
+或者hθ(x) 但是有时我们为了简洁会省略θ
+
+19
+00:01:40,666 --> 00:01:46,518
+brevity. So, h(x) will be equal to just
+0.5 times x which looks like that. And
+因此 h(x)将等于0.5倍的x 就像这样
+
+20
+00:01:46,518 --> 00:01:52,443
+finally if theta zero equals 1 and theta
+one equals 0.5 then we end up with the
+最后 如果θ0=1并且θ1=0.5 我们最后得到的假设会看起来像这样
+
+21
+00:01:52,443 --> 00:01:58,598
+hypothesis that looks like this. Let's
+see, it should pass through the (2, 2)
+让我们来看看 它应该通过点(2,2)
+
+22
+00:01:58,598 --> 00:02:04,468
+point like so. And this is my new h(x)
+or my new htheta(x). All right? Well
+这是我的新的h(x)或者写作hθ(x) 对吧?
+
+23
+00:02:04,468 --> 00:02:09,980
+you remember that this is
+htheta(x) but as a shorthand
+你还记得之前我们提到过hθ(x)的 但作为简写 我们通常只把它写作h(x)
+
+24
+00:02:09,980 --> 00:02:16,584
+sometimes I just write this as h(x). In
+linear regression we have a training set,
+在线性回归中 我们有一个训练集
+
+25
+00:02:16,584 --> 00:02:22,439
+like maybe the one I've plotted here. What
+we want to do is come up with values for
+可能就像我在这里绘制的 我们要做的就是
+
+26
+00:02:22,439 --> 00:02:28,295
+the parameters theta zero and theta one.
+So that the straight line we get out
+得出θ0 θ1这两个参数的值 来让假设函数表示的直线
+
+27
+00:02:28,295 --> 00:02:33,799
+of this corresponds to a straight line
+that somehow fits the data well. Like
+尽量地与这些数据点很好的拟合
+
+28
+00:02:33,799 --> 00:02:39,756
+maybe that line over there. So how do we
+come up with values theta zero, theta one
+也许就像这里的这条线一样 那么我们如何得出θ0 θ1的值
+
+29
+00:02:39,756 --> 00:02:45,350
+that corresponds to a good fit to the
+data? The idea is we're going to choose
+来使它很好地拟合数据的呢?我们的想法是 我们要选择
+
+30
+00:02:45,350 --> 00:02:51,162
+our parameters theta zero, theta one so
+that h(x), meaning the value we predict
+能使h(x) 也就是 输入x时我们预测的值
+
+31
+00:02:51,162 --> 00:02:56,330
+on input x, that this at least close to
+the values y for the examples in our
+最接近该样本对应的y值的参数θ0 θ1
+
+32
+00:02:56,330 --> 00:03:01,133
+training set, for our training examples.
+So, in our training set we're given a
+所以 在我们的训练集中我们会得到一定数量的样本
+
+33
+00:03:01,133 --> 00:03:06,505
+number of examples where we know the size of
+the house and we know the actual price of
+我们知道x表示卖出哪所房子 并且知道这所房子的实际价格
+
+34
+00:03:06,505 --> 00:03:11,796
+what it's sold for. So let's try to
+choose values for the parameters so that
+所以 我们要尽量选择参数值 使得
+
+35
+00:03:11,796 --> 00:03:17,302
+at least in the training set, given the
+x's in the training set, we make
+在训练集中 给出训练集中的x值
+
+36
+00:03:17,302 --> 00:03:23,507
+reasonably accurate predictions for the y
+values. Let's formalize this. So linear
+我们能合理准确地预测y的值
+
+37
+00:03:23,507 --> 00:03:29,401
+regression, what we're going to do is that I'm
+going to want to solve a minimization
+让我们给出标准的定义 在线性回归中 我们要解决的是一个最小化问题
+
+38
+00:03:29,401 --> 00:03:38,787
+problem. So I'm going to write minimize over theta
+zero, theta one. And, I want this to be
+所以我要写出关于θ0 θ1的最小化 而且
+
+39
+00:03:38,787 --> 00:03:44,379
+small, right, I want the difference
+between h(x) and y to be small. And one
+我希望这个式子极其小 是吧 我想要h(x)和y之间的差异要小
+
+40
+00:03:44,379 --> 00:03:50,493
+thing I'm gonna do is try to minimize the
+square difference between the output of
+我要做的事情是尽量减少假设的输出与房子真实价格
+
+41
+00:03:50,493 --> 00:03:56,159
+the hypothesis and the actual price of the
+house. Okay? So let's fill in some
+之间的差的平方 明白吗?接下来我会详细的阐述
+
+42
+00:03:56,159 --> 00:04:01,379
+details. Remember that I was using the
+notation (x(i), y(i)) to represent the
+别忘了 我用符号( x(i),y(i) )代表第i个样本
+
+43
+00:04:01,379 --> 00:04:07,418
+ith training example. So what I
+want really is to sum over my training
+所以我想要做的是对所有训练样本进行一个求和
+
+44
+00:04:07,418 --> 00:04:13,202
+set. Sum from i equals 1 to M of
+the square difference between
+对i=1到i=M的样本 将对假设进行预测得到的结果
+
+45
+00:04:13,202 --> 00:04:18,896
+this is the prediction of my hypothesis
+when it is input the size of house number
+此时的输入是第i号房子的面积 对吧
+
+46
+00:04:18,896 --> 00:04:24,380
+i, right, minus the actual price that
+house number i will sell for and I want to
+将第i号对应的预测结果 减去第i号房子的实际价格 所得的差的平方相加得到总和
+
+47
+00:04:24,380 --> 00:04:29,588
+minimize the sum of my training set sum
+from i equals 1 through M of the
+而我希望尽量减小这个值
+
+48
+00:04:29,588 --> 00:04:35,281
+difference of this squared error,
+square difference between the predicted
+也就是预测值和实际值的差的平方误差和 或者说预测价格和
+
+49
+00:04:35,281 --> 00:04:41,091
+price of the house and the price
+that it will actually sell for. And just
+实际卖出价格的差的平方
+
+50
+00:04:41,091 --> 00:04:47,723
+remind you of your notation M here was
+the, the size of my training set, right,
+我说了这里的m指的是训练集的样本容量
+
+51
+00:04:47,723 --> 00:04:53,347
+so the M there is my number of training
+examples. Right? That hash sign is the
+对吧
+
+52
+00:04:53,347 --> 00:04:59,045
+abbreviation for "number" of training
+examples. Okay? And to make some of our,
+这个井号是训练样本“个数”的缩写 对吧 而为了让表达式的数学意义
+
+53
+00:04:59,045 --> 00:05:04,888
+make the math a little bit easier, I'm
+going to actually look at, you know, 1
+变得容易理解一点 我们实际上考虑的是
+
+54
+00:05:04,888 --> 00:05:09,578
+over M times that. So we're going to try
+to minimize my average error, which we're
+这个数的1/m 因此我们要尝试尽量减少我们的平均误差
+
+55
+00:05:09,578 --> 00:05:13,926
+going to minimize one by 2M.
+Putting the 2, the constant one half, in
+也就是尽量减少其1/2m 通常是这个数的一半
+
+56
+00:05:13,926 --> 00:05:18,386
+front it just makes some of the math a
+little easier. So minimizing one half of
+前面的这些只是为了使数学更直白一点 因此对这个求和值的二分之一求最小值
+
+57
+00:05:18,386 --> 00:05:23,073
+something, right, should give you the same
+values of the parameters theta zero, theta
+应该得出相同的θ0值和相同的θ1值来
+
+58
+00:05:23,073 --> 00:05:27,647
+one as minimizing that function. And just
+make sure this, this, this equation is
+请大家一定弄清楚这个道理
+
+59
+00:05:27,647 --> 00:05:35,569
+clear, right? This expression in here,
+htheta(x), this is my, this is
+没问题吧?在这里hθ(x)的这种表达 这是我们的假设
+
+60
+00:05:35,569 --> 00:05:44,880
+our usual, right? That's equal to this
+plus theta one x(i). And, this notation,
+它等于θ0加上θ1与x(i)的乘积 而这个表达
+
+61
+00:05:44,880 --> 00:05:49,814
+minimize over theta zero and theta one,
+this means find me the values of theta
+表示关于θ0和θ1的最小化过程 这意味着我们要找到θ0和θ1
+
+62
+00:05:49,814 --> 00:05:54,369
+zero and theta one that causes this
+expression to be minimized. And this
+的值来使这个表达式的值最小
+
+63
+00:05:54,369 --> 00:05:59,557
+expression depends on theta zero and theta
+one. Okay? So just to recap, we're posing
+这个表达式因θ0和θ1的变化而变化对吧?
+
+64
+00:05:59,557 --> 00:06:04,382
+this problem as find me the values of
+theta zero and theta one so that the
+因此 简单地说 我们正在把这个问题变成 找到能使
+
+65
+00:06:04,575 --> 00:06:09,292
+average already one over two M times the
+sum of square errors between my
+我的训练集中预测值和真实值的差的平方的和
+
+66
+00:06:09,292 --> 00:06:14,590
+predictions on the training set minus the
+actual values of the houses on the
+的1/2M最小的θ0和θ1的值
+
+67
+00:06:14,590 --> 00:06:19,694
+training set is minimized. So this is
+going to be my overall objective function
+因此 这将是我的线性回归的整体目标函数
+
+68
+00:06:19,694 --> 00:06:25,127
+for linear regression. And just to, you
+know rewrite this out a little bit more
+为了使它更明确一点 我们要改写这个函数
+
+69
+00:06:25,127 --> 00:06:30,580
+cleanly what I'm going to do by convention
+is we usually define a cost function.
+按照惯例 我要定义一个代价函数
+
+70
+00:06:30,860 --> 00:06:38,965
+Which is going to be exactly this. That
+formula that I have up here. And what I
+正如屏幕中所示 这里的这个公式
+
+71
+00:06:38,965 --> 00:06:48,388
+want to do is minimize over theta zero and
+theta one my function J of theta zero
+我们想要做的就是关于θ0和θ1 对函数J(θ0,θ1)求最小值
+
+72
+00:06:48,388 --> 00:06:57,428
+comma theta one. Just write this
+out, this is my cost function. So, this
+这就是我的代价函数
+
+73
+00:06:57,428 --> 00:07:06,943
+cost function is also called the squared
+error function or sometimes called the
+代价函数也被称作平方误差函数 有时也被称为
+
+74
+00:07:06,943 --> 00:07:14,461
+square error cost function and it turns
+out that Why, why do we, you know, take
+平方误差代价函数 事实上 我们之所以要求出
+
+75
+00:07:14,461 --> 00:07:19,006
+the squares of the errors? It turns out
+that the squared error cost function is a
+误差的平方和 是因为误差平方代价函数
+
+76
+00:07:19,006 --> 00:07:23,214
+reasonable choice and will work well for
+most problems, for most regression
+对于大多数问题 特别是回归问题 都是一个合理的选择
+
+77
+00:07:23,214 --> 00:07:27,815
+problems. There are other cost functions
+that will work pretty well, but the squared
+还有其他的代价函数也能很好地发挥作用
+
+78
+00:07:27,815 --> 00:07:32,473
+error cost function is probably the most
+commonly used one for regression problems.
+但是平方误差代价函数可能是解决回归问题最常用的手段了
+
+79
+00:07:32,473 --> 00:07:36,793
+Later in this class we'll also talk about alternative
+cost functions as well, but this, this
+在后续课程中 我们还会谈论其他的代价函数
+
+80
+00:07:36,793 --> 00:07:41,282
+choice that we just had, should be a
+pret-, pretty reasonable thing to try for
+但我们刚刚讲的选择是对于大多数线性回归问题非常合理的
+
+81
+00:07:41,282 --> 00:07:45,706
+most linear regression problems. Okay. So
+that's the cost function. So far we've
+好吧 所以这是代价函数 到目前为止 我们已经
+
+82
+00:07:45,706 --> 00:07:50,899
+just seen a mathematical definition of you
+know this cost function and in case this
+介绍了代价函数的数学定义
+
+83
+00:07:50,899 --> 00:07:55,973
+function J of theta zero theta one in case
+this function seems a little bit abstract
+也许这个函数J(θ0,θ1)有点抽象
+
+84
+00:07:55,973 --> 00:08:00,808
+and you still don't have a good sense of
+what its doing in the next video, in the
+可能你仍然不知道它的内涵
+
+85
+00:08:00,808 --> 00:08:05,882
+next couple videos we're actually going to
+go a little bit deeper into what the cost
+在接下来的几个视频里 我们要更进一步解释
+
+86
+00:08:05,882 --> 00:08:10,776
+function J is doing and try to give you
+better intuition about what its computing
+代价函数J的工作原理 并尝试更直观地解释它在计算什么
+
+87
+00:08:10,776 --> 00:08:12,329
+and why we want to use it.
+以及我们使用它的目的
+【果壳教育无边界字幕组】翻译:antis 校对:cheerzzh 审核:所罗门捷列夫
+
diff --git a/srt/2 - 3 - Cost Function - Intuition I (11 min).srt b/srt/2 - 3 - Cost Function - Intuition I (11 min).srt
new file mode 100644
index 00000000..20f1a6eb
--- /dev/null
+++ b/srt/2 - 3 - Cost Function - Intuition I (11 min).srt
@@ -0,0 +1,631 @@
+1
+00:00:00,000 --> 00:00:04,100
+In the previous video, we gave the
+mathematical definition of the cost
+在以前的视频,我们给了代价函数的数学定义
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,100 --> 00:00:08,616
+function. In this video, let's look at
+some examples, to get back to intuition
+在这段视频中,让我们来看看一些例子,在直觉上理解下
+
+3
+00:00:08,616 --> 00:00:14,466
+about what the cost function is doing, and
+why we want to use it. To recap, here's
+代价函数是用来做什么的,和为什么我们要使用它。总括来说
+
+4
+00:00:14,466 --> 00:00:19,396
+what we had last time. We want to fit a
+straight line to our data, so we had this
+这是我们上节课学到的 我们希望找到与数据拟合的直线
+
+5
+00:00:19,396 --> 00:00:23,958
+formed as a hypothesis with these
+parameters theta zero and theta one, and
+所以 根据这些参数θ0和θ1形成为一个假设
+
+6
+00:00:23,958 --> 00:00:28,888
+with different choices of the parameters
+we end up with different straight line
+随着选择不同的参数,我们得到不同的直线
+
+7
+00:00:31,323 --> 00:00:33,758
+fits. So the data which are fit
+like so, and there's a cost function, and
+就会有像这样的数据对应 这存在一个代价函数
+
+8
+00:00:33,758 --> 00:00:38,554
+that was our optimization objective.
+So in this video, in order to better
+这是我们的优化目标 所以这段视频中,为了更好地
+
+9
+00:00:38,554 --> 00:00:43,293
+visualize the cost function J, I'm going
+to work with a simplified hypothesis
+可视化代价函数J,我要使用一个简化的假设函数
+
+10
+00:00:43,293 --> 00:00:48,220
+function, like that shown on the right. So
+I'm gonna use my simplified hypothesis,
+如显示在右侧的那个 所以 我会用我的简化假设
+
+11
+00:00:48,220 --> 00:00:53,275
+which is just theta one times X. We can,
+if you want, think of this as setting the
+这个假设仅包含这些参数θ1*x,如果你愿意,你可以认为这是设置
+
+12
+00:00:53,275 --> 00:00:58,721
+parameter theta zero equal to 0. So I
+have only one parameter theta one and
+参数θ0=0 于是,我只有一个参数θ1
+
+13
+00:00:58,721 --> 00:01:04,372
+my cost function is similar to before
+except that now H of X that is now equal
+我的代价函数和以前差不太多 只不过 h(x)现在直接等于
+
+14
+00:01:04,372 --> 00:01:10,309
+to just theta one times X. And I have only
+one parameter theta one and so my
+θ1乘x 我只有θ1一个参数了,所以我们的
+
+15
+00:01:10,309 --> 00:01:16,246
+optimization objective is to minimize j of
+theta one. In pictures what this means is
+优化目标是尽量减少J(θ1) 在图片中,意思就是
+
+16
+00:01:16,246 --> 00:01:21,611
+that if theta zero equals zero that
+corresponds to choosing only hypothesis
+如果θ0=0,相当于只选择经过远点的假设函数
+
+17
+00:01:21,611 --> 00:01:27,176
+functions that pass through the origin,
+that pass through the point (0, 0). Using
+也就是通过点(0,0)
+
+18
+00:01:27,176 --> 00:01:33,415
+this simplified definition of the hypothesis and cost
+function, let's try to understand the cost
+应用这个假设代价函数的简化定义,让我们去更好的了解
+
+19
+00:01:33,415 --> 00:01:40,178
+function concept better. It turns out that
+two key functions we want to understand.
+代价函数概念 这里有两个关键函数我们想要了解
+
+20
+00:01:40,178 --> 00:01:46,432
+The first is the hypothesis function, and
+the second is a cost function. So, notice
+第一个是假设函数 第二个是代价函数
+
+21
+00:01:46,432 --> 00:01:52,068
+that the hypothesis, right, H of X. For a
+fixed value of theta one, this is a
+注意一下这个假设函数h(x),是的,系数为θ1,这是一个关于x的函数
+
+22
+00:01:52,068 --> 00:01:58,168
+function of X. So the hypothesis is a
+function of, what is the size of the house
+因此这个函数是一个函数实现房子x的尺寸的判定
+
+23
+00:01:58,168 --> 00:02:03,959
+X. In contrast, the cost function, J,
+that's a function of the parameter, theta
+与此相反,代价函数J的参数是θ1
+
+24
+00:02:03,959 --> 00:02:09,993
+one, which controls the slope of the
+straight line. Let's plot these functions
+它控制该直线的斜率。让我们来绘制这些函数
+
+25
+00:02:09,993 --> 00:02:15,481
+and try to understand them both better.
+Let's start with the hypothesis. On the left,
+并试着去更好的了解他们 让我们以假设函数开始 在左边
+
+26
+00:02:15,481 --> 00:02:20,283
+let's say here's my training set with
+three points at (1, 1), (2, 2), and (3, 3). Let's
+这里是我的训练集有三个点(1,1),(2,2),(3,3)
+
+27
+00:02:20,283 --> 00:02:25,338
+pick a value theta one, so when theta one
+equals one, and if that's my choice for
+让我们选个θ1的值 所以当θ1=1,如果这是我的选择
+
+28
+00:02:25,338 --> 00:02:30,392
+theta one, then my hypothesis is going to
+look like this straight line over here.
+那么我的假设就是在这里的这条直线。
+
+29
+00:02:30,392 --> 00:02:35,234
+And I'm gonna point out, when I'm plotting
+my hypothesis function. X-axis, my
+我会指出,当我绘制我的假设函数。 X轴,我的横轴
+
+30
+00:02:35,234 --> 00:02:40,525
+horizontal axis is labeled X, is labeled
+you know, size of the house over here.
+横轴标记为X,标记你知道的,在这里的房子的大小。
+
+31
+00:02:40,525 --> 00:02:46,551
+Now, temporarily, set
+theta one equals one, what I want to do is
+现在,暂时的,θ1=1,我想要弄清楚的是
+
+32
+00:02:46,551 --> 00:02:52,430
+figure out what is j of theta one, when
+theta one equals one. So let's go ahead
+当θ1=1时,J(θ1)的值 所以,让我们继续前进
+
+33
+00:02:52,430 --> 00:02:58,781
+and compute what the cost function is
+for the value one. Well, as usual,
+计算出代价函数 你会降低它的价值 嗯,像往常一样
+
+34
+00:02:58,781 --> 00:03:05,761
+my cost function is defined as follows,
+right? It's a sum over my training
+我的代价函数定义如下,对不对?一些训练集
+
+35
+00:03:05,761 --> 00:03:13,840
+set of this usual squared error term.
+And this is therefore equal to the sum
+误差项平方的和 这因此 等于 而这个
+
+36
+00:03:14,740 --> 00:03:25,066
+of theta one x(i) minus y(i), squared. And if you
+simplify, this turns out to be zero
+θ1*x(i)-y(i),如果你简化这个会得到
+
+37
+00:03:25,066 --> 00:03:31,995
+squared plus zero squared plus zero squared, which
+is of course, just equal to zero. Now,
+平方值为0 平方为0 平方 这是当然的就等于零
+
+38
+00:03:31,995 --> 00:03:39,098
+inside the cost function. It turns out each
+of these terms here is equal to zero. Because
+在代价函数中,事实上,这些在这里是等于零
+
+39
+00:03:39,098 --> 00:03:46,288
+for the specific training set I have or my
+3 training examples are (1, 1), (2, 2), (3,3). If theta
+对于特定的训练集 我有我的3个训练样例(1,1),(2,2),(3,3)
+
+40
+00:03:46,288 --> 00:03:54,667
+one is equal to one. Then h of x. H of x
+i. Is equal to y I exactly, let me write
+如果θ1=1 那么h( x(i) )=y(i) 让我写的更清楚一点,好吗
+
+41
+00:03:54,667 --> 00:04:04,164
+this better. Right? And so, h of x minus
+y, each of these terms is equal to zero,
+所有的h(x)-y结果都是0
+
+42
+00:04:04,164 --> 00:04:14,821
+which is why I find that j of one is equal
+to zero. So, we now know that j of one Is
+也就是J(1)=0 所以,我们现在知道J(1)=0
+
+43
+00:04:14,821 --> 00:04:20,504
+equal to zero. Let's plot that. What I'm
+gonna do on the right is plot my cost
+让我描绘出它 我在右侧画出我的代价函数
+
+44
+00:04:20,504 --> 00:04:26,187
+function j. And notice, because my cost
+function is a function of my parameter
+注意,因为我的代价函数的参数是θ1
+
+45
+00:04:26,187 --> 00:04:32,017
+theta one, when I plot my cost function,
+the horizontal axis is now labeled with
+当我绘制我的代价函数,横轴是现在标为θ1
+
+46
+00:04:32,017 --> 00:04:38,069
+theta one. So I have J of one equals
+zero, so let's go ahead and plot that. We end
+所以,我们现在有J(1)=0让我们继续描绘
+
+47
+00:04:38,069 --> 00:04:46,464
+up with an X over there. Now let's look at
+some other examples. Theta-1 can take on a
+那边的X 现在,让我们看看其他一些例子。 θ1可以采取
+
+48
+00:04:46,464 --> 00:04:52,470
+range of different values. Right? So
+theta-1 can take on the negative values,
+一个范围内的不同值,对吗?因此,θ1可以取负数
+
+49
+00:04:52,470 --> 00:04:58,876
+zero, positive values. So what if theta-1
+is equal to 0.5. What happens then? Let's
+0,正数 那么,如果θ1=0.5会发生什么?让我们
+
+50
+00:04:58,876 --> 00:05:05,442
+go ahead and plot that. I'm now going to
+set theta-1 equals 0.5, and in that case my
+继续前进,绘制 我现在要设置θ1=0.5,在这种情况下
+
+51
+00:05:05,442 --> 00:05:11,688
+hypothesis now looks like this. As a line
+with slope equals to 0.5, and, lets
+假设现在看起来是这样。为一条斜率等于0.5的直线
+
+52
+00:05:11,688 --> 00:05:17,855
+compute J, of 0.5. So that is going to be
+one over 2M of, my usual cost function.
+我们计算J(0.5) 所以这将是我平常代价函数的是1/2M
+
+53
+00:05:17,855 --> 00:05:23,769
+It turns out that the cost function is
+going to be the sum of square values of
+事实证明,代价函数是这些线的长度的平方值的总和
+
+54
+00:05:23,769 --> 00:05:29,609
+the height of this line. Plus the sum of
+square of the height of that line, plus
+把这些线的长度的平方相加
+
+55
+00:05:29,609 --> 00:05:34,783
+the sum of square of the height of that
+line, right? ?Cause just this vertical
+这些线的高度的平方的总和,对不对?是垂直距离
+
+56
+00:05:34,783 --> 00:05:42,854
+distance, that's the difference between,
+you know, y(i), and the predicted value, h
+也就是 你知道的,y与预测值间的距离
+
+57
+00:05:42,854 --> 00:05:48,772
+of x(i), right? So the first example
+is going to be 0.5 minus one squared.
+预测值是h( x(i) ),对不对?因此,第一个例子将是(0.5-1)的平方
+
+58
+00:05:49,033 --> 00:05:55,647
+Because my hypothesis predicted 0.5.
+Whereas, the actual value was one. For my
+因为我的假设值是0.5 然而,实际值是1
+
+59
+00:05:55,647 --> 00:06:02,436
+second example, I get, one minus two
+squared, because my hypothesis predicted
+第二个例子中,我得到(1-2)^2,因为我的假设值为1
+
+60
+00:06:02,436 --> 00:06:09,663
+one, but the actual housing price was two.
+And then finally, plus. 1.5 minus three
+但实际的住房价格是2。最后再加上(1.5-3)的平方
+
+61
+00:06:09,663 --> 00:06:17,263
+squared. And so that's equal to one over
+two times three. Because m, my training
+所以等于1/(2*3) 因为M有3组例子
+
+62
+00:06:17,263 --> 00:06:24,274
+set size, right, has three training
+examples. And then, simplifying
+在那,化简为
+
+63
+00:06:24,274 --> 00:06:33,011
+simplifying the terms in the parentheses, it's 3.5.
+So that's 3.5 over six, which is about
+化简括号里的项得到3.5 因此 结果为3.5/6 约等于0.68
+
+64
+00:06:33,011 --> 00:06:41,085
+0.68. So now we know that j of 0.5 is
+about 0.68. Let's go and plot that. Oh
+所以,现在我们知道,J(0.5)约等于0.68。让我们去绘制它
+
+65
+00:06:41,085 --> 00:06:50,308
+excuse me, math error, it's actually 0.58. So
+we plot that which is maybe about over
+对不起,数学错误,它实际上是0.58。因此,我们把它绘制在这里
+
+66
+00:06:50,308 --> 00:07:00,293
+there. Okay? Now, let's do one more. How
+about if theta one is equal to zero, what
+好吗?现在,让我们再做一个 当θ1=0会如何呢
+
+67
+00:07:00,293 --> 00:07:08,975
+is J of zero equal to? It turns out that
+if theta one is equal to zero, then H of X
+事实证明,如果θ1=0,则h(x)
+
+68
+00:07:08,975 --> 00:07:16,916
+is just equal to, you know, this flat
+line, right, that just goes horizontally
+恰好等于 一条水平线,没错,平行
+
+69
+00:07:16,916 --> 00:07:26,882
+like this. And so, measuring the errors.
+We have that J of zero is equal to one
+像这样 所以计算误差。我们有J(0)
+
+70
+00:07:26,882 --> 00:07:34,659
+over two M, times one squared plus two
+squared plus three squared, which is one
+=1/2M ( 1^2 + 2^2 + 3^2 ) = 1/6*14约等于2.3
+
+71
+00:07:34,659 --> 00:07:41,555
+sixth times fourteen, which is about 2.3. So
+let's go ahead and plot as well. So it
+所以,让我们继续前进,并绘制好。因此
+
+72
+00:07:41,555 --> 00:07:47,622
+ends up with a value around 2.3
+and of course we can keep on doing this
+得到与2.3左右的值,当然,我们可以继续这样做
+
+73
+00:07:47,622 --> 00:07:53,335
+for other values of theta one. It turns
+out that you can have you know negative
+为θ1赋上其他值 事实证明,你也可以给θ1赋值负数
+
+74
+00:07:53,335 --> 00:07:59,327
+values of theta one as well so if theta
+one is negative then h of x would be equal
+如果θ1是负数 比如h(x)将等于
+
+75
+00:07:59,327 --> 00:08:05,179
+to say minus 0.5 times x then theta
+one is minus 0.5 and so that corresponds
+-0.5x 当θ1=-0.5
+
+76
+00:08:05,179 --> 00:08:10,188
+to a hypothesis with a
+slope of negative 0.5. And you can
+对应的假设函数是斜率为-0.5的直线
+
+77
+00:08:10,188 --> 00:08:15,694
+actually keep on computing these errors.
+This turns out to be, you know, for minus 0.5,
+我们可以继续计算这些误差 你知道,当θ1为-0.5时,
+
+78
+00:08:15,694 --> 00:08:21,520
+it turns out to have really high error. It
+works out to be something, like, 5.25. And
+就会有非常高的误差 最后得到的数是,5.25
+
+79
+00:08:21,520 --> 00:08:28,087
+so on, and the different values of theta
+one, you can compute these things, right?
+继续,随着θ1的值不同,你可以计算出这些东西,对不对?
+
+80
+00:08:28,087 --> 00:08:34,413
+And it turns out that if you compute this for a
+range of values, you get something like
+而事实证明,你在一定范围计算一些值,你会得到这样的东西
+
+81
+00:08:34,413 --> 00:08:40,499
+that. And by computing a range of
+values, you can actually slowly trace
+计算的值的范围,你可以慢慢地创建
+
+82
+00:08:40,499 --> 00:08:50,999
+out what the function J of theta looks like, and
+that's what J of theta is. To recap, for
+描绘出函数J(θ1)是什么样子 这就是J(θ1) 总的来说
+
+83
+00:08:50,999 --> 00:08:57,851
+each value of theta one, right? Each value
+of theta one corresponds to a different
+对于每个θ1的值 都对应着假设函数的一个值
+
+84
+00:08:57,851 --> 00:09:04,448
+hypothesis, or to a different straight
+line fit on the left. And for each value
+或对应着一条不同的直线,如左侧 并且每个值
+
+85
+00:09:04,448 --> 00:09:11,723
+of theta one, we could then derive a
+different value of j of theta one. And for
+θ1 我们可以得出一个不同的J(θ1)的值
+
+86
+00:09:11,723 --> 00:09:19,354
+example, you know, theta one=1,
+corresponded to this straight line
+例如,您知道,θ1= 1,符合这条直线
+
+87
+00:09:19,354 --> 00:09:27,846
+straight through the data. Whereas theta
+one=0.5. And this point shown in magenta
+正好经过数据 然而当θ1= 0.5 对应的可能就是红色画出的这条线
+
+88
+00:09:27,846 --> 00:09:35,340
+corresponded to maybe that line, and theta
+one=zero which is shown in blue that corresponds to
+当θ1=0对应的是蓝色画出的水平线
+
+89
+00:09:35,340 --> 00:09:41,527
+this horizontal line. Right, so for each
+value of theta one we wound up with a
+对,所以我们每个值θ1都对应J(θ1)一个值
+
+90
+00:09:41,527 --> 00:09:48,516
+different value of J of theta one and we
+could then use this to trace out this plot
+然后,我们可以使用这个可以在右侧描绘出这个图形
+
+91
+00:09:48,516 --> 00:09:54,461
+on the right. Now you remember, the
+optimization objective for our learning
+现在,你还记得在我们学习算法时的优化目标
+
+92
+00:09:54,461 --> 00:10:01,963
+algorithm is we want to choose the value
+of theta one. That minimizes J of theta one.
+是我们通过选择θ1的价值 最大限度地减少J(θ1)
+
+93
+00:10:01,963 --> 00:10:08,076
+Right? This was our objective function for
+the linear regression. Well, looking at
+这是我们的线性回归的目标函数的
+
+94
+00:10:08,076 --> 00:10:13,710
+this curve, the value that minimizes j of
+theta one is, you know, theta one equals
+这条曲线中 使J(θ1)最小的值是1,你知道 当θ1=1
+
+95
+00:10:13,710 --> 00:10:19,132
+to one. And lo and behold, that is indeed
+the best possible straight line fit
+看呀,这确实是能最好的拟合数据的直线
+
+96
+00:10:19,132 --> 00:10:24,624
+through our data, by setting theta one
+equals one. And just, for this particular
+通过设置θ1=1 对于这个特殊的训练集
+
+97
+00:10:24,624 --> 00:10:30,328
+training set, we actually end up fitting
+it perfectly. And that's why minimizing j
+我们实际上已经完美的满足了它 这就是为什么要
+
+98
+00:10:30,328 --> 00:10:36,447
+of theta one corresponds to finding a
+straight line that fits the data well. So,
+通过使J(θ1)最小来找到一条最拟合数据的直线 因此
+
+99
+00:10:36,447 --> 00:10:40,884
+to wrap up: in this video, we looked at
+some plots to understand the cost
+在这段视频中,我们看到了一些图形 要了解代价函数
+
+100
+00:10:40,884 --> 00:10:45,259
+function. To do so, we simplified the
+algorithm so that it only had one
+要做到这一点,我们简化了算法,使它只能有一个参数θ1
+
+101
+00:10:45,259 --> 00:10:50,258
+parameter theta one. And we set the
+parameter theta zero to be equal to zero. In the next video,
+我们设置参数θ0=0 在接下来的视频
+
+102
+00:10:50,258 --> 00:10:54,445
+We'll go back to the original problem
+formulation and look at some
+我们会回到原来的问题的公式
+
+103
+00:10:54,445 --> 00:10:59,570
+visualizations involving both theta zero
+and theta one. That is without setting theta
+看一些涉及到θ0和θ1的图 这是没有设置θ0=0
+
+104
+00:10:59,570 --> 00:11:04,757
+zero to zero. And hopefully that will give
+you an even better sense of what the cost
+希望这会让你对代价函数J是如何
+
+105
+00:11:04,757 --> 00:11:09,257
+function j is doing in the original
+linear regression formulation.
+在最初的线性回归公式中工作的有更好地理解
+
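+A minimal sketch of the idea in this video, tracing out J(theta one) for the training set (1,1), (2,2), (3,3); this is illustrative Python only, not part of the course materials:
+
+# Sketch: J(theta1) for the toy training set used in this video,
+# with the simplified hypothesis h(x) = theta1 * x (theta0 fixed at zero).
+xs = [1.0, 2.0, 3.0]
+ys = [1.0, 2.0, 3.0]
+
+def cost(theta1):
+    m = len(xs)
+    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
+
+for theta1 in [-0.5, 0.0, 0.5, 1.0, 1.5]:
+    print(theta1, round(cost(theta1), 2))
+# Prints roughly 5.25, 2.33, 0.58, 0.0, 0.58 -- the values read off the plot in this video.
+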
diff --git a/srt/2 - 4 - Cost Function - Intuition II (9 min).srt b/srt/2 - 4 - Cost Function - Intuition II (9 min).srt
new file mode 100644
index 00000000..72397b6b
--- /dev/null
+++ b/srt/2 - 4 - Cost Function - Intuition II (9 min).srt
@@ -0,0 +1,576 @@
+1
+00:00:00,960 --> 00:00:05,684
+In this video, let's delve deeper and get
+even better intuition about what the cost
+在这段视频中 让我们更深入地研究 更直观地体会代价函数是用来做什么的
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:05,684 --> 00:00:10,523
+function is doing. This video assumes that
+you're familiar with contour plots. If you
+此视频以你熟悉等高线图的绘制为前提
+
+3
+00:00:10,523 --> 00:00:15,189
+are not familiar with contour plots or
+contour figures some of the illustrations
+如果你不熟悉等高线图 这个视频中的某些描述
+
+4
+00:00:15,189 --> 00:00:20,144
+in this video may or may not make sense to
+you, but that's okay, and if you end up skipping
+对你来说明确或者不明确都没关系 如果你因为以前没有见过等高线
+
+5
+00:00:20,144 --> 00:00:24,522
+this video or some of it does not quite
+make sense because you haven't seen
+而跳过这个视频或跳过一部分视频也不会有多大的影响
+
+6
+00:00:24,522 --> 00:00:29,246
+contour plots before. That's okay and you will
+still understand the rest of this course
+如果没有这部分知识 你仍然能理解这门课程的其他内容
+
+7
+00:00:29,246 --> 00:00:34,935
+without those parts of this. Here's our
+problem formulation as usual, with the
+下面是我们的问题 和往常一样
+
+8
+00:00:34,935 --> 00:00:39,882
+hypothesis parameters, cost function, and
+our optimization objective. Unlike
+有假设参数 代价函数 和我们的优化目标
+
+9
+00:00:39,882 --> 00:00:45,163
+before, unlike the last video, I'm
+going to keep both of my parameters, theta
+不像上次的视频那样 在对代价函数以可视化时 我要保留我所有的参数 θ0 和 θ1
+
+10
+00:00:45,163 --> 00:00:50,573
+zero, and theta one, as we generate our
+visualizations for the cost function. So, same
+所以像上次一样的
+
+11
+00:00:50,573 --> 00:00:57,204
+as last time, we want to understand the
+hypothesis H and the cost function J. So,
+我们想了解假设函数h和代价函数 J 因此
+
+12
+00:00:57,204 --> 00:01:04,167
+here's my training set of housing prices
+and let's make some hypothesis. You know,
+这里是我的训练集的住房价格 让我们做一些假设
+
+13
+00:01:04,167 --> 00:01:10,219
+like that one, this is not a particularly
+good hypothesis. But, if I set theta
+像这一个 这是不是一个特别好的假设 但是当我设置 θ0=50
+
+14
+00:01:10,219 --> 00:01:16,270
+zero=50 and theta one=0.06, then I end up
+with this hypothesis down here and that
+和 θ1= 0.06 那么我就得到了对应这条直线的以下一假设
+
+15
+00:01:16,270 --> 00:01:22,190
+corresponds to that straight line. Now given
+these value of theta zero and theta one,
+既然 θ0 和 θ1 被如此赋值
+
+16
+00:01:22,190 --> 00:01:27,511
+we want to plot the corresponding, you
+know, cost function on the right. What we
+我们要在右边绘制相应的 你知道 代价函数
+
+17
+00:01:27,511 --> 00:01:33,150
+did last time was, right, when we only had
+theta one. In other words, drawing plots
+像上次做的那样 当时 我们只有θ1
+
+18
+00:01:33,150 --> 00:01:37,814
+that look like this as a function of theta
+one. But now we have two parameters, theta
+因为函数只有一个参数 θ1 画出来的图像这样 但现在我们有两个参数
+
+19
+00:01:37,814 --> 00:01:42,340
+zero, and theta one, and so the plot gets
+a little more complicated. It turns out
+θ0 和 θ1 使图变得更复杂一点 事实证明
+
+20
+00:01:42,340 --> 00:01:47,699
+that when we have only one parameter, that
+the plots we drew had this sort of bow
+当我们只有一个参数时 我们画出的图像是这种弓形的函数
+
+21
+00:01:47,699 --> 00:01:52,925
+shaped function. Now, when we have two
+parameters, it turns out the cost function
+现在 当我们有两个参数 代价函数有一个同样的弓型
+
+22
+00:01:52,925 --> 00:01:58,218
+also has a similar sort of bow shape. And,
+in fact, depending on your training set,
+而且事实上这取决于你的训练集
+
+23
+00:01:58,218 --> 00:02:03,511
+you might get a cost function that maybe
+looks something like this. So, this is a
+你可能会得到一个看起来像这样的代价函数
+
+24
+00:02:03,511 --> 00:02:09,404
+3-D surface plot, where the axes
+are labeled theta zero and theta one. So
+这是一个3D曲面图 轴标为 θ0 和 θ1
+
+25
+00:02:09,404 --> 00:02:15,326
+as you vary theta zero and theta one, the two
+parameters, you get different values of the
+所以 当你改变 θ0 θ1 这两个参数 你会得到不同的代价函数
+
+26
+00:02:15,326 --> 00:02:20,964
+cost function J (theta zero, theta one)
+and the height of this surface above a
+J(θ0,θ1) 的值 从表面到关于θ0 θ1 特殊点的高度
+
+27
+00:02:20,964 --> 00:02:26,347
+particular point of theta zero, theta one.
+Right, that's, that's the vertical axis. The
+对 这是 这是竖轴
+
+28
+00:02:26,347 --> 00:02:31,200
+height of the surface of the points
+indicates the value of J of theta zero,
+从表面到关于 θ0 θ1 特殊点的高度 就是 J (θ0,θ1) 的值
+
+29
+00:02:31,200 --> 00:02:36,471
+theta one. And you can see it sort of
+has this bowl-like shape. Let me show you
+你可以看到 它像是有这样的碗型形状
+
+30
+00:02:36,471 --> 00:02:46,351
+the same plot in 3D. So here's the same
+figure in 3D, horizontal axis theta one and
+让我们在 3D 中展示这个图形 这是 3D 中的同样的图形 横轴 θ1
+
+31
+00:02:46,351 --> 00:02:52,122
+vertical axis J(theta zero, theta one), and if I rotate
+this plot around, you kinda
+竖直轴 J (θ0,θ1) 如果我旋转这个图形 你会找到点感觉
+
+32
+00:02:52,122 --> 00:02:57,608
+get a sense, I hope, of this bowl
+shaped surface as that's what the cost
+这个碗状曲面 就是代价函数的样子
+
+33
+00:02:57,608 --> 00:03:03,592
+function J looks like. Now for the purpose
+of illustration in the rest of this video
+为了在以下的视频中进行分析
+
+34
+00:03:03,592 --> 00:03:09,077
+I'm not actually going to use these sort
+of 3D surfaces to show you the cost
+我实际上不会使用 3D 曲面来展示代价函数 J
+
+35
+00:03:09,077 --> 00:03:16,475
+function J, instead I'm going to use
+contour plots. Or what I also call contour
+我要使用等高线图 或者称为等高图像
+
+36
+00:03:16,475 --> 00:03:24,748
+figures. I guess they mean the same thing.
+To show you these surfaces. So here's an
+这两个词意义相同 为了向你展示这些曲面 因此这里是等高线图的例子
+
+37
+00:03:24,748 --> 00:03:31,135
+example of a contour figure, shown on the
+right, where the axes are theta zero and
+显示在右侧 其中的轴线为 θ0和θ1
+
+38
+00:03:31,135 --> 00:03:37,602
+theta one. And what each of these ovals,
+what each of these ellipses shows is a set
+这里的每一个椭圆 显示的都是一系列
+
+39
+00:03:37,602 --> 00:03:43,757
+of points that takes on the same value for
+J(theta zero, theta one). So
+J(θ0,θ1) 值相等的点 所以
+
+40
+00:03:43,757 --> 00:03:50,514
+concretely, for example this, you'll take
+that point and that point and that point.
+具体地说 例如这个 这一点 这一点 这一点
+
+41
+00:03:50,514 --> 00:03:55,583
+All three of these points that I just
+drew in magenta, they have the same value
+我用红色标注的这三个点 它们具有相同的 J(θ0,θ1) 值
+
+42
+00:03:55,583 --> 00:03:59,730
+for J (theta zero, theta one). Okay.
+Where, right, these, this is the theta
+好 这些 这是它的θ0 θ1轴
+
+43
+00:03:59,730 --> 00:04:04,774
+zero, theta one axis but those three have
+the same Value for J (theta zero, theta one)
+但上述三个点具有相同的 J(θ0,θ1) 值
+
+44
+00:04:04,774 --> 00:04:10,218
+and if you haven't seen contour
+plots much before think of, imagine if you
+如果你没有见过的等高线图 想象一下
+
+45
+00:04:10,218 --> 00:04:14,992
+will. A bow shaped function that's coming
+out of my screen. So that the minimum, so
+呈现在我屏幕上的碗型函数 使它的值最小的
+
+46
+00:04:14,992 --> 00:04:19,668
+the bottom of the bowl is this point right
+there, right? This middle, the middle of
+碗的底部就是这一点 对不对?
+
+47
+00:04:19,668 --> 00:04:24,285
+these concentric ellipses. And imagine a
+bowl shape that sort of grows out of my
+这些同心椭圆的中心 想象一个碗的形状渐渐变成我屏幕上的样子
+
+48
+00:04:24,285 --> 00:04:28,786
+screen like this, so that each of these
+ellipses, you know, has the same height
+所以在我屏幕上的每一个椭圆的 你知道 都有有相同的高度
+
+49
+00:04:28,786 --> 00:04:33,345
+above my screen. And the minimum of the
+bowl, right, is right down there. And so
+碗型的最小值 恰好在这里
+
+50
+00:04:33,345 --> 00:04:37,787
+the contour figure is maybe
+a more convenient way to
+所以等高线图 也许是一个更方便的方式来显示函数 J
+
+51
+00:04:37,787 --> 00:04:45,185
+visualize my function J. [sound] So, let's
+look at some examples. Over here, I have a
+所以让我们来看看一些例子 在这里 我有一个特殊点
+
+52
+00:04:45,185 --> 00:04:53,275
+particular point, right? And so this is,
+with, you know, theta zero equals maybe
+你知道 θ0可能等于800
+
+53
+00:04:53,275 --> 00:05:01,964
+about 800, and theta one equals maybe
+-0.15. And so this point, right, this
+θ1可能等于-0.15 所以这点 对 红色的这点
+
+54
+00:05:01,964 --> 00:05:07,322
+point in red corresponds to one
+set of pair values of theta zero, theta one
+对应了一对θ0 θ1的值
+
+55
+00:05:07,322 --> 00:05:12,092
+and the corresponding, in fact, to that
+hypothesis, right, theta zero is
+事实上对于该假设而言 θ0约等于800
+
+56
+00:05:12,092 --> 00:05:17,189
+about 800, that is, where it intersects
+the vertical axis is around 800, and this is
+也就是说 它与垂直轴相交于800左右
+
+57
+00:05:17,189 --> 00:05:21,763
+slope of about -0.15. Now this line is
+really not such a good fit to the
+斜率约-0.15 现在 这条线是真的并没对数据进行一个很好的拟合
+
+58
+00:05:21,763 --> 00:05:26,859
+data, right. This hypothesis, h(x), with these values of theta zero,
+对不对? 假设函数 h(x) 和θ0 θ1的值
+
+59
+00:05:26,859 --> 00:05:32,283
+theta one, it's really not such a good fit
+to the data. And so you find that its
+并没对数据进行一个很好的拟合 所以你会发现
+
+60
+00:05:32,283 --> 00:05:37,531
+cost is a value that's out here, that's,
+you know, pretty far from the minimum, right,
+它的代价是一个离最小值相当远的值
+
+61
+00:05:37,531 --> 00:05:42,901
+it's pretty far this is a pretty high cost
+because this is just not that good a fit
+这是一个相当高的成本 因为这并没有那么好地拟合数据
+
+62
+00:05:42,901 --> 00:05:47,247
+to the data. Let's look at some more
+examples. Now here's a different
+让我们来看看一些例子 现在 这里是一个不同的假设
+
+63
+00:05:47,247 --> 00:05:52,489
+hypothesis that's you know still not a
+great fit for the data but may be slightly
+同样并没有那么好地拟合数据 但可能会略好一点
+
+64
+00:05:52,489 --> 00:05:57,986
+better so here right that's my point that
+those are my parameters theta zero theta
+所以这里是我的点 这些是我的参数θ0 θ1的值 对吗
+
+65
+00:05:57,986 --> 00:06:07,387
+one and so my theta zero value. Right?
+That's about 360 and my value for theta
+θ0大概等于360 θ1=0
+
+66
+00:06:07,387 --> 00:06:14,047
+one is equal to zero. So, you know, let's
+break it out. Let's take theta zero equals
+所以 你知道 让我们找出它 让我们把θ0=360
+
+67
+00:06:14,047 --> 00:06:20,063
+360 theta one equals zero. And this pair
+of parameters corresponds to that
+θ1=0 这对参数对应于这条假设函数
+
+68
+00:06:20,063 --> 00:06:26,161
+hypothesis, corresponds to a flat line, that is, h(x) equals 360 plus zero
+对应这条平行线 也就是 h(x)=360+0*x
+
+69
+00:06:26,161 --> 00:06:32,421
+times x. So that's the hypothesis. And
+this hypothesis again has some cost, and
+所以这是假设函数 这个假设函数同样具有一定的成本
+
+70
+00:06:32,421 --> 00:06:38,600
+that cost is, you know, plotted as the
+height of the J function at that point.
+要知道 这个成本即绘制在 J 函数在该点的高度
+
+71
+00:06:38,791 --> 00:06:44,886
+Let's look at just a couple of examples.
+Here's one more, you know, at this value
+让我们来看几个例子 这里还有一个 你知道 当给θ0这个值
+
+72
+00:06:44,886 --> 00:06:52,231
+of theta zero, and at that value of theta
+one, we end up with this hypothesis, h(x)
+给θ1那个值 我们最后又一次得到了假设函数 h(x)
+
+73
+00:06:52,231 --> 00:06:58,599
+and again, not a great fit to the data,
+and is actually further away from the minimum. Last example, this is
+不能非常拟合的数据 甚至离最小值有点远 最后一个例子
+
+74
+00:06:58,599 --> 00:07:03,450
+actually not quite at the minimum, but
+it's pretty close to the minimum. So this
+其实并不是最小值 但它是相当接近最小值的 因此
+
+75
+00:07:03,450 --> 00:07:08,486
+is not such a bad fit to the, to the data,
+where, for a particular value, of, theta
+并不是太坏的拟合该数据 其中 θ0 为一个特定的值
+
+76
+00:07:08,486 --> 00:07:13,337
+zero, which has some particular value, and
+for a particular value of theta one, we
+这里有一个特定的值给 θ1
+
+77
+00:07:13,337 --> 00:07:18,004
+get a particular h(x). And this is, this
+is not quite at the minimum, but it's
+于是我们得到了一个特殊的 h(x) 这并不是最小值
+
+78
+00:07:18,004 --> 00:07:23,039
+pretty close. And so the sum of squares
+errors is sum of squares distances between
+但是已经相当接近了 因此误差的平方的总和是训练样本与假设函数的距离的
+
+79
+00:07:23,039 --> 00:07:28,259
+my, training samples and my hypothesis.
+Really, that's a sum of square distances,
+的平方的和 这真的是一个距离的平方的总和 对不对?
+
+80
+00:07:28,259 --> 00:07:32,548
+right? Of all of these errors. This is
+pretty close to the minimum even though
+所有的这些误差 都相当接近最小值 虽然依旧不是最小值
+
+81
+00:07:32,548 --> 00:07:37,096
+it's not quite the minimum. So with these
+figures I hope that gives you a better
+因此通过这些图 我希望你更好地理解
+
+82
+00:07:37,096 --> 00:07:41,869
+understanding of what values of the cost
+function J, how they are and how that
+关于代价函数 J 他们是如何拟合不同的假设函数
+
+83
+00:07:41,869 --> 00:07:47,324
+corresponds to different hypotheses, and
+how better hypotheses may correspond to points
+和当数据非常接近函数 J 的最小值
+
+84
+00:07:47,324 --> 00:07:52,983
+that are closer to the minimum of this cost
+function J. Now of course what we really
+假设函数到底能多好地拟合数据
+
+85
+00:07:52,983 --> 00:07:57,619
+want is an efficient algorithm, right, an
+efficient piece of software for
+我们真正想要的是一个高效的算法 一个高效的软件
+
+86
+00:07:57,619 --> 00:08:02,218
+automatically finding The value of theta
+zero and theta one, that minimizes the
+能自动找寻使 J 最小θ0 θ1
+
+87
+00:08:02,218 --> 00:08:06,566
+cost function J, right? And what we, what
+we don't wanna do is to, you know, have to
+我们不想做的事情是 编写这样一个软件
+
+88
+00:08:06,566 --> 00:08:10,697
+write software, to plot out this point,
+and then try to manually read off the
+那就是绘制这个点 然后再尝试手动读取读数
+
+89
+00:08:10,697 --> 00:08:15,263
+numbers, that this is not a good way to do
+it. And, in fact, we'll see it later, that
+这并不是一个好办法 而且事实上 我们会在之后看到
+
+90
+00:08:15,426 --> 00:08:19,938
+when we look at more complicated examples,
+we'll have high dimensional figures with
+当我们看更复杂的例子 我们涉及到更多的参数 用到高维的图形
+
+91
+00:08:19,938 --> 00:08:23,906
+more parameters, that, it turns out,
+we'll see in a few, we'll see later in
+我们会在以后的课程中看到一些
+
+92
+00:08:23,906 --> 00:08:28,091
+this course, examples where this figure,
+you know, cannot really be plotted, and
+不能被真正绘制的例子
+
+93
+00:08:28,091 --> 00:08:33,664
+this becomes much harder to visualize. And
+so, what we want is to have software
+视觉上更难表达 所以 我们需要有软件
+
+94
+00:08:33,664 --> 00:08:37,729
+to find the value of theta zero, theta one
+that minimizes this function and
+找到使函数最小的θ0 θ1的值
+
+95
+00:08:37,916 --> 00:08:42,914
+in the next video we start to talk about
+an algorithm for automatically finding
+在接下来的视频中 我们谈谈自动寻找使函数 J 最小的
+
+96
+00:08:42,914 --> 00:08:47,600
+that value of theta zero and theta one
+that minimizes the cost function J.
+θ0 θ1的值的算法
+【果壳教育无边界字幕组】翻译:antis 校对:Jaminalia
+
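+A rough sketch of how contour figures like the ones described in this video could be generated; the dataset below is a made-up stand-in (the actual housing data is not reproduced here), and NumPy and Matplotlib are assumed to be available:
+
+# Sketch: contour plot of J(theta0, theta1) for a toy dataset.
+import numpy as np
+import matplotlib.pyplot as plt
+
+x = np.array([1.0, 2.0, 3.0, 4.0])   # illustrative inputs only
+y = np.array([1.5, 2.0, 3.5, 4.0])   # illustrative targets only
+m = len(x)
+
+theta0 = np.linspace(-2, 4, 200)
+theta1 = np.linspace(-1, 2, 200)
+T0, T1 = np.meshgrid(theta0, theta1)
+
+# J(theta0, theta1) = 1/(2m) * sum_i (theta0 + theta1*x_i - y_i)^2
+J = sum((T0 + T1 * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)
+
+plt.contour(T0, T1, J, levels=30)    # each contour line: points with equal J
+plt.xlabel("theta0")
+plt.ylabel("theta1")
+plt.show()
+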
diff --git a/srt/2 - 5 - Gradient Descent (11 min).srt b/srt/2 - 5 - Gradient Descent (11 min).srt
new file mode 100644
index 00000000..e2e8192e
--- /dev/null
+++ b/srt/2 - 5 - Gradient Descent (11 min).srt
@@ -0,0 +1,830 @@
+1
+00:00:00,000 --> 00:00:04,934
+We've previously defined the cost function
+J. In this video I want to tell you about
+我们已经定义了代价函数J 而在这段视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,934 --> 00:00:09,634
+an algorithm called gradient descent for
+minimizing the cost function J. It turns
+我想向你们介绍梯度下降这种算法 这种算法可以将代价函数J最小化
+
+3
+00:00:09,634 --> 00:00:14,275
+out gradient descent is a more general
+algorithm and is used not only in linear
+梯度下降是很常用的算法 它不仅被用在线性回归上
+
+4
+00:00:14,275 --> 00:00:18,916
+regression. It's actually used all over
+the place in machine learning. And later
+它实际上被广泛的应用于机器学习领域中的众多领域
+
+5
+00:00:18,916 --> 00:00:23,791
+in the class we'll use gradient descent to
+minimize other functions as well, not just
+在后面的课程中 我们也将使用梯度下降法
+
+6
+00:00:23,791 --> 00:00:27,845
+the cost function J, for linear regression.
+So in this video, I'm going to
+最小化其他函数 而不仅仅是只用在本节课的代价函数J
+
+7
+00:00:27,845 --> 00:00:32,558
+talk about gradient descent for minimizing
+some arbitrary function J. And then in
+因此在这个视频中 我将讲解用梯度下降算法最小化函数 J
+
+8
+00:00:32,558 --> 00:00:37,406
+later videos, we'll take those algorithm
+and apply it specifically to the cost
+在后面的视频中 我们还会将此算法应用于具体的
+
+9
+00:00:37,406 --> 00:00:43,332
+function J that we had to find for linear
+regression. So here's the problem
+代价函数J中来解决线性回归问题 下面是问题概述
+
+10
+00:00:43,332 --> 00:00:48,112
+setup. We're going to see that we have
+some function J of (theta0, theta1).
+在这里 我们有一个函数J(θ0, θ1)
+
+11
+00:00:48,112 --> 00:00:52,773
+Maybe it's a cost function from linear
+regression. Maybe it's some other function
+也许这是一个线性回归的代价函数 也许是一些其他函数
+
+12
+00:00:52,773 --> 00:00:56,801
+we want to minimize. And we want
+to come up with an algorithm for
+要使其最小化 我们需要用一个算法
+
+13
+00:00:56,801 --> 00:01:01,174
+minimizing that as a function of J of
+(theta0, theta1). Just as an aside,
+来最小化函数J(θ0, θ1) 就像刚才说的
+
+14
+00:01:01,174 --> 00:01:05,793
+it turns out that gradient descent
+actually applies to more general
+事实证明 梯度下降算法可应用于
+
+15
+00:01:05,793 --> 00:01:10,994
+functions. So imagine if you have
+a function that's a function of
+多种多样的函数求解 所以想象一下如果你有一个函数
+
+16
+00:01:10,994 --> 00:01:16,194
+J of (theta0, theta1, theta2, up to
+some theta n). And you want to
+J(θ0, θ1, θ2, ...,θn )
+
+17
+00:01:16,405 --> 00:01:21,795
+minimize over (theta0 up to theta n)
+of this J of (theta0 up to theta n).
+你希望可以通过最小化 θ0到θn 来最小化此代价函数J(θ0 到θn)
+
+18
+00:01:21,795 --> 00:01:26,580
+It turns out gradient descent
+is an algorithm for solving
+用n个θ是为了证明梯度下降算法可以解决更一般的问题
+
+19
+00:01:26,580 --> 00:01:31,368
+this more general problem, but for the
+sake of brevity, for the sake of
+但为了简洁起见 为了简化符号
+
+20
+00:01:31,368 --> 00:01:35,935
+succinctness of notation, I'm just
+going to present only two parameters
+在接下来的视频中 我只用两个参数
+
+21
+00:01:36,113 --> 00:01:41,097
+throughout the rest of this video. Here's
+the idea for gradient descent. What we're
+下面就是关于梯度下降的构想
+
+22
+00:01:41,097 --> 00:01:45,882
+going to do is we're going to start off
+with some initial guesses for theta0 and
+我们要做的是 我们要开始对θ0和θ1 进行一些初步猜测
+
+23
+00:01:45,882 --> 00:01:50,788
+theta1. Doesn't really matter what they
+are, but a common choice would be if we
+它们到底是什么其实并不重要 但通常的选择是将 θ0设为0
+
+24
+00:01:50,788 --> 00:01:55,452
+set theta0 to 0, and
+set theta1 to 0. Just initialize
+将θ1也设为0 将它们都初始化为0
+
+25
+00:01:55,452 --> 00:02:00,322
+them to 0. What we're going to do in
+gradient descent is we'll keep changing
+我们在梯度下降算法中要做的
+
+26
+00:02:00,322 --> 00:02:05,258
+theta0 and theta1 a little bit to
+try to reduce J of (theta0, theta1)
+就是不停地一点点地改变 θ0和θ1 试图通过这种改变使得J(θ0, θ1)变小
+
+27
+00:02:05,258 --> 00:02:10,571
+until hopefully we wind up at a minimum or
+maybe a local minimum. So, let's see
+直到我们找到 J 的最小值 或许是局部最小值
+
+28
+00:02:10,796 --> 00:02:16,106
+see pictures of what gradient descent
+does. Let's say I try to minimize this
+让我们通过一些图片来看看梯度下降法是如何工作的
+
+29
+00:02:16,106 --> 00:02:20,849
+function. So notice the axes. This is,
+(theta0, theta1) on the horizontal
+我在试图让这个函数值最小 注意坐标轴 θ0和θ1在水平轴上
+
+30
+00:02:20,849 --> 00:02:25,774
+axes, and J is a vertical axis. And so the
+height of the surface shows J, and we
+而函数 J在垂直坐标轴上 图形表面高度则是 J的值
+
+31
+00:02:25,774 --> 00:02:30,582
+want to minimize this function. So, we're
+going to start off with (theta0, theta1)
+我们希望最小化这个函数 所以我们从 θ0和θ1的某个值出发
+
+32
+00:02:30,582 --> 00:02:35,375
+at some point. So imagine picking some
+value for (theta0, theta1), and that
+所以想象一下 对 θ0和θ1赋以某个初值
+
+33
+00:02:35,375 --> 00:02:39,934
+corresponds to starting at some point on
+the surface of this function. Okay? So
+也就是对应于从这个函数表面上的某个起始点出发 对吧
+
+34
+00:02:39,934 --> 00:02:44,201
+whatever value of (theta0, theta1)
+gives you some point here. I did
+所以不管 θ0和θ1的取值是多少
+
+35
+00:02:44,201 --> 00:02:48,819
+initialize them to 0, but
+sometimes you initialize it to other
+我将它们初始化为0 但有时你也可把它初始化为其他值
+
+36
+00:02:48,819 --> 00:02:53,942
+values as well. Now. I want us to imagine
+that this figure shows a hill. Imagine
+现在我希望大家把这个图像想象为一座山
+
+37
+00:02:53,942 --> 00:02:59,178
+this is like a landscape of some grassy
+park with two hills like so.
+想像类似这样的景色 公园中有两座山
+
+38
+00:02:59,178 --> 00:03:04,618
+And I want you to imagine that you are
+physically standing at that point on the
+想象一下你正站立在山的这一点上
+
+39
+00:03:04,618 --> 00:03:09,990
+hill on this little red hill in your park.
+In gradient descent, what we're
+站立在你想象的公园这座红色山上 在梯度下降算法中
+
+40
+00:03:09,990 --> 00:03:15,770
+going to do is spin 360 degrees around and
+just look all around us and ask, "If I were
+我们要做的就是旋转360度 看看我们的周围 并问自己
+
+41
+00:03:15,770 --> 00:03:20,423
+to take a little baby step in some
+direction, and I want to go downhill as
+我要在某个方向上 用小碎步尽快下山
+
+42
+00:03:20,423 --> 00:03:25,320
+quickly as possible, what direction do I
+take that little baby step in if I want to
+如果我想要下山 如果我想尽快走下山
+
+43
+00:03:25,320 --> 00:03:29,686
+go down, if I sort of want to physically
+walk down this hill as rapidly as
+这些小碎步需要朝什么方向?
+
+44
+00:03:29,686 --> 00:03:34,465
+possible?" Turns out that if we're standing
+at that point on the hill, you look all
+如果我们站在山坡上的这一点
+
+45
+00:03:34,465 --> 00:03:39,185
+around, you find that the best direction
+to take a little step downhill
+你看一下周围 你会发现最佳的下山方向
+
+46
+00:03:39,185 --> 00:03:44,035
+is roughly that direction. Okay. And now
+you're at this new point on your hill.
+大约是那个方向 好的 现在你在山上的新起点上
+
+47
+00:03:44,035 --> 00:03:49,430
+You're going to, again, look all around, and then
+say, "What direction should I step in order
+你再看看周围 然后再一次想想
+
+48
+00:03:49,430 --> 00:03:54,695
+to take a little baby step downhill?" And
+if you do that and take another step, you
+我应该从什么方向迈着小碎步下山? 然后你按照自己的判断又迈出一步
+
+49
+00:03:54,695 --> 00:03:59,700
+take a step in that direction, and then
+you keep going. You know, from this new
+往那个方向走了一步 然后重复上面的步骤
+
+50
+00:03:59,700 --> 00:04:04,835
+point, you look around, decide what
+direction will take you downhill most
+从这个新的点 你环顾四周 并决定从什么方向将会最快下山
+
+51
+00:04:04,835 --> 00:04:09,775
+quickly, take another step, another step,
+and so on, until you converge to this,
+然后又迈进了一小步 又是一小步
+并依此类推 直到你接近这里
+
+52
+00:04:09,970 --> 00:04:15,059
+local minimum down here. Gradient descent
+has an interesting property. The first
+直到停留在这个局部最低点的位置 另外 梯度下降有一个有趣的特点
+
+53
+00:04:15,059 --> 00:04:19,682
+time we ran gradient descent, we were
+starting at this point over here, right?
+第一次我们是从这个点开始进行梯度下降算法的 是吧
+
+54
+00:04:19,682 --> 00:04:24,183
+Started at that point over here. Now
+imagine, we initialize gradient
+在这一点上从这里开始 现在想象一下
+
+55
+00:04:24,183 --> 00:04:29,232
+descent just a couple steps to the right.
+Imagine we initialized gradient descent with
+我们在刚才的右边一些的位置 对梯度下降进行初始化
+
+56
+00:04:29,232 --> 00:04:34,159
+that point on the upper right. If you were
+to repeat this process, and stop at the
+想象我们在右边高一些的这个点 开始使用梯度下降 如果你重复上述步骤
+
+57
+00:04:34,159 --> 00:04:39,207
+point, and look all around. Take a little
+step in the direction of steepest descent.
+停留在该点 并环顾四周 往下降最快的方向迈出一小步
+
+58
+00:04:39,207 --> 00:04:44,772
+You would do that. Then look around, take
+another step, and so on. And if you start
+然后环顾四周 又迈出一步 然后如此往复
+
+59
+00:04:44,772 --> 00:04:50,570
+it just a couple steps to the right, the
+gradient descent will have taken you to
+如果你从右边不远处开始 梯度下降算法将会带你来到
+
+60
+00:04:50,570 --> 00:04:56,236
+this second local optimum over on the
+right. So if you had started at this first
+这个右边的第二个局部最优处 如果从刚才的第一个点出发
+
+61
+00:04:56,236 --> 00:05:01,602
+point, you would have wound up at this
+local optimum. But if you started just a
+你会得到这个局部最优解 但如果你的起始点偏移了一些
+
+62
+00:05:01,602 --> 00:05:06,762
+little bit, a slightly different location,
+you would have wound up at a very
+起始点的位置略有不同 你会得到一个
+
+63
+00:05:06,762 --> 00:05:12,197
+different local optimum. And this is a
+property of gradient descent that we'll
+非常不同的局部最优解 这就是梯度下降算法的一个特点
+
+64
+00:05:12,197 --> 00:05:17,425
+say a little bit more about later. So,
+that's the intuition in pictures. Let's
+我们会在之后继续探讨这个问题 好的 这是我们从图中得到的直观感受
+
+65
+00:05:17,425 --> 00:05:22,929
+look at the map. This is the definition of
+the gradient descent algorithm. We're
+看看这个图 这是梯度下降算法的定义
+
+66
+00:05:22,929 --> 00:05:28,240
+going to just repeatedly do this until
+convergence. We're going to update my
+我们将会反复做这些 直到收敛 我们要更新参数 θj
+
+67
+00:05:28,240 --> 00:05:33,543
+parameter theta j by, you know, taking
+theta j and subtracting from it alpha
+方法是 用 θj 减去 α乘以这一部分
+
+68
+00:05:33,543 --> 00:05:39,129
+times this term over here. So, let's
+see. There are a lot of details in this
+让我们来看看 这个公式有很多细节问题
+
+69
+00:05:39,129 --> 00:05:45,030
+equation, so let me unpack some of it.
+First, this notation here, colon equals.
+我来详细讲解一下 首先 注意这个符号 :=
+
+70
+00:05:45,030 --> 00:05:51,643
+We're going to use := to denote
+assignment, so it's the assignment
+我们使用 := 表示赋值 这是一个赋值运算符
+
+71
+00:05:51,643 --> 00:05:57,790
+operator. So concretely, if I
+write A := B, what this means in
+具体地说 如果我写 a:= b
+
+72
+00:05:57,790 --> 00:06:02,878
+a computer, this means take the
+value in B and use it to overwrite
+在计算机专业内 这意味着不管 a的值是什么
+
+73
+00:06:02,878 --> 00:06:08,517
+whatever the value of A. So this means we
+will set A to be equal to the value of B.
+取 b的值 并将其赋给a 这意味着我们让 a等于b的值
+
+74
+00:06:08,517 --> 00:06:13,674
+Okay, it's assignment. I can
+also do A:=A+1. This means
+这就是赋值 我也可以做 a:= a+1
+
+75
+00:06:13,674 --> 00:06:18,969
+take A and increase its value by one.
+Whereas in contrast, if I use the equals
+这意味着 取出a值 并将其增加1 与此不同的是
+
+76
+00:06:18,969 --> 00:06:26,067
+sign and I write A=B, then this is a
+truth assertion. So if I write A=B,
+如果我使用等号 = 并且写出a=b 那么这是一个判断为真的声明
+
+77
+00:06:26,067 --> 00:06:31,006
+then I'm asserting that the value of A
+equals to the value of B. So the left hand
+如果我写 a=b 就是在断言 a的值是等于 b的值的
+
+78
+00:06:31,006 --> 00:06:36,331
+side, that's a computer operation, where
+you set the value of A to be a value. The
+在左边这里 这是计算机运算 将一个值赋给 a
+
+79
+00:06:36,331 --> 00:06:41,399
+right hand side, this is asserting, I'm
+just making a claim that the values of A
+而在右边这里 这是声明 声明 a的值 与b的值相同
+
+80
+00:06:41,399 --> 00:06:46,274
+and B are the same. And so, whereas I can
+write A := A+1, that means increment A by
+因此 我可以写 a:=a+1 这意味着 将 a的值再加上1
+
+81
+00:06:46,274 --> 00:06:50,764
+1. Hopefully, I won't ever write A=A+1.
+Because that's just wrong.
+但我不会写 a=a+1 因为这本来就是错误的
+
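+As a small aside, the same distinction written out in Python (a sketch only; Python spells assignment as = and the truth assertion as ==):
+
+b = 2
+a = b          # the lecture's  a := b   -- overwrite a with the value of b
+a = a + 1      # the lecture's  a := a + 1  -- increment a by one
+print(a == b)  # the lecture's  a = b  is an assertion; here it prints False
+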
+82
+00:06:50,764 --> 00:06:55,704
+A and A+1 can never be equal to
+the same values. So that's the first
+a 和 a+1 永远不会是同一个值 这是这个定义的第一个部分
+
+83
+00:06:55,704 --> 00:07:05,733
+part of the definition. This alpha
+here is a number that is called the
+这里的α 是一个数字 被称为学习速率
+
+84
+00:07:05,733 --> 00:07:12,360
+learning rate. And what alpha does is, it
+basically controls how big a step we take
+什么是α呢? 在梯度下降算法中 它控制了
+
+85
+00:07:12,360 --> 00:07:17,113
+downhill with gradient descent. So if
+alpha is very large, then that corresponds
+我们下山时会迈出多大的步子 因此如果 α值很大
+
+86
+00:07:17,113 --> 00:07:21,925
+to a very aggressive gradient descent
+procedure, where we're trying to take huge
+那么相应的梯度下降过程中 我们会试图用大步子下山
+
+87
+00:07:21,925 --> 00:07:26,322
+steps downhill. And if alpha is very
+small, then we're taking little, little
+如果α值很小 那么我们会迈着很小的小碎步下山
+
+88
+00:07:26,322 --> 00:07:31,194
+baby steps downhill. And, I'll come
+back and say more about this later.
+关于如何设置 α的值等内容 在之后的课程中
+
+89
+00:07:31,194 --> 00:07:35,660
+About how to set alpha and so on.
+And finally, this term here. That's the
+我会回到这里并且详细说明 最后 是公式的这一部分
+
+90
+00:07:35,660 --> 00:07:40,582
+derivative term, I don't want to talk
+about it right now, but I will derive this
+这是一个微分项 我现在不想谈论它
+
+91
+00:07:40,582 --> 00:07:45,564
+derivative term and tell you exactly what
+this is based on. And some of you
+但我会推导出这个微分项 并告诉你到底这要如何计算
+
+92
+00:07:45,564 --> 00:07:50,547
+will be more familiar with calculus than
+others, but even if you aren't familiar
+你们中有人大概比较熟悉微积分 但即使你不熟悉微积分
+
+93
+00:07:50,547 --> 00:07:55,469
+with calculus don't worry about it, I'll
+tell you what you need to know about this
+也不用担心 我会告诉你 对这一项 你最后需要做什么
+
+94
+00:07:55,469 --> 00:08:00,580
+term here. Now there's one more subtlety
+about gradient descent which is, in
+现在 在梯度下降算法中 还有一个更微妙的问题
+
+95
+00:08:00,580 --> 00:08:05,837
+gradient descent, we're going to
+update theta0 and theta1. So
+在梯度下降中 我们要更新 θ0和θ1
+
+96
+00:08:05,837 --> 00:08:10,699
+this update takes place where j=0, and
+j=1. So you're going to update theta0,
+当 j=0 和 j=1 时 会产生更新 所以你将更新θ0 还有θ1
+
+97
+00:08:10,699 --> 00:08:15,955
+and update theta1. And the subtlety of
+how you implement gradient descent is,
+实现梯度下降算法的微妙之处是
+
+98
+00:08:15,955 --> 00:08:21,562
+for this expression, for this
+update equation, you want to
+在这个表达式中 如果你要更新这个等式
+
+99
+00:08:21,562 --> 00:08:31,384
+simultaneously update theta0 and
+theta1. What I mean by that is
+你需要同时更新 θ0和θ1
+
+100
+00:08:31,384 --> 00:08:36,432
+that in this equation,
+we're going to update
+我的意思是在这个等式中 我们要这样更新
+
+101
+00:08:36,432 --> 00:08:40,975
+theta0:=theta0 - something, and update
+theta1:=theta1 - something.
+θ0:=θ0 - 一些东西 并更新 θ1:=θ1 - 一些东西
+
+102
+00:08:40,975 --> 00:08:45,834
+And the way to implement this is,
+you should compute the right hand
+实现方法是 你应该计算公式右边的部分
+
+103
+00:08:45,834 --> 00:08:52,677
+side. Compute that thing for both theta0
+and theta1, and then simultaneously at
+通过那一部分计算出θ0和θ1的值
+
+104
+00:08:52,677 --> 00:08:57,469
+the same time update theta0 and
+theta1. So let me say what I mean
+然后同时更新 θ0和θ1 让我进一步阐述这个过程
+
+105
+00:08:57,469 --> 00:09:02,024
+by that. This is a correct implementation
+of gradient descent meaning simultaneous
+在梯度下降算法中 这是正确实现同时更新的方法
+
+106
+00:09:02,024 --> 00:09:06,461
+updates. I'm going to set temp0 equals
+that, set temp1 equals that. So basically
+我要设 temp0等于这些 设temp1等于那些
+
+107
+00:09:06,461 --> 00:09:11,430
+compute the right hand sides. And then having
+computed the right hand sides and stored
+所以首先计算出公式右边这一部分 然后将计算出的结果
+
+108
+00:09:11,430 --> 00:09:15,926
+them together in temp0 and temp1,
+I'm going to update theta0 and theta1
+一起存入 temp0和 temp1 之中 然后同时更新 θ0和θ1
+
+109
+00:09:15,926 --> 00:09:20,245
+simultaneously, because that's the
+correct implementation. In contrast,
+因为这才是正确的实现方法
+
+110
+00:09:20,245 --> 00:09:25,533
+here's an incorrect implementation that
+does not do a simultaneous update. So in
+与此相反 下面是不正确的实现方法 因为它没有做到同步更新
+
+111
+00:09:25,533 --> 00:09:31,666
+this incorrect implementation, we compute
+temp0, and then we update theta0
+在这种不正确的实现方法中 我们计算 temp0 然后我们更新θ0
+
+112
+00:09:31,666 --> 00:09:36,644
+and then we compute temp1. Then we
+update theta1. And the difference between
+然后我们计算 temp1 然后我们将 temp1 赋给θ1
+
+113
+00:09:36,644 --> 00:09:41,877
+the right hand side and the left hand side
+implementations is that if we look down
+右边的方法和左边的区别是 让我们看这里
+
+114
+00:09:41,877 --> 00:09:46,791
+here, you look at this step, if by this
+time you've already updated theta0
+就是这一步 如果这个时候你已经更新了θ0
+
+115
+00:09:46,791 --> 00:09:51,897
+then you would be using the new
+value of theta0 to compute this
+那么你会使用 θ0的新的值来计算这个微分项
+
+116
+00:09:51,897 --> 00:09:57,340
+derivative term and so this gives you a
+different value of temp1 than the left
+所以由于你已经在这个公式中使用了新的 θ0的值
+
+117
+00:09:57,340 --> 00:10:01,565
+hand side, because you've now
+plugged in the new value of theta0
+那么这会产生一个与左边不同的 temp1的值
+
+118
+00:10:01,565 --> 00:10:05,852
+into this equation. And so this on right
+hand side is not a correct implementation
+所以右边并不是正确地实现梯度下降的做法
+
+119
+00:10:05,852 --> 00:10:09,916
+of gradient descent. So I don't
+want to say why you need to do the
+我不打算解释为什么你需要同时更新
+
+120
+00:10:09,916 --> 00:10:14,617
+simultaneous updates, it turns
+out that the way gradient descent
+同时更新是梯度下降中的一种常用方法
+
+121
+00:10:14,617 --> 00:10:18,735
+is usually implemented, we'll say more
+about it later, it actually turns out to
+我们之后会讲到
+
+122
+00:10:18,735 --> 00:10:22,496
+be more natural to implement the
+simultaneous update. And when people talk
+实际上同步更新是更自然的实现方法
+
+123
+00:10:22,496 --> 00:10:26,665
+about gradient descent, they always mean
+simultaneous update. If you implement the
+当人们谈到梯度下降时 他们的意思就是同步更新
+
+124
+00:10:26,665 --> 00:10:30,630
+non-simultaneous update, it turns out
+it will probably work anyway, but this
+如果用非同步更新去实现算法 代码可能也会正确工作
+
+125
+00:10:30,630 --> 00:10:34,747
+algorithm on the right is not what people
+refer to as gradient descent, and
+但是右边的方法并不是人们所指的那个梯度下降算法
+
+126
+00:10:34,747 --> 00:10:38,356
+this is some other algorithm with
+different properties. And for various
+而是具有不同性质的其他算法
+
+127
+00:10:38,356 --> 00:10:42,220
+reasons, this can behave in
+slightly stranger ways. And
+由于各种原因 这其中会表现出微小的差别
+
+128
+00:10:42,220 --> 00:10:46,626
+what you should do is to really
+implement the simultaneous update of
+你应该做的是 在梯度下降中真正实现同时更新
+
+129
+00:10:46,626 --> 00:10:52,313
+gradient descent. So, that's the outline of the
+gradient descent algorithm. In the next video,
+这些就是梯度下降算法的梗概
+
+130
+00:10:52,313 --> 00:10:56,998
+we're going to go into the details of the
+derivative term, which I wrote out but
+在接下来的视频中 我们要进入这个微分项的细节之中
+
+131
+00:10:56,998 --> 00:11:01,799
+didn't really define. And if you've taken
+a calculus class before and if you're
+我已经写了出来但没有真正定义 如果你已经修过微积分课程
+
+132
+00:11:01,799 --> 00:11:06,367
+familiar with partial derivatives and
+derivatives, it turns out that's exactly
+如果你熟悉偏导数和导数 这其实就是这个微分项
+
+133
+00:11:06,367 --> 00:11:11,425
+what that derivative term is. But in case
+you aren't familiar with calculus, don't
+如果你不熟悉微积分 不用担心
+
+134
+00:11:11,425 --> 00:11:15,680
+worry about it. The next video will give
+you all the intuitions and will tell you
+即使你之前没有看过微积分
+
+135
+00:11:15,680 --> 00:11:19,883
+everything you need to know to compute
+that derivative term, even if you haven't
+或者没有接触过偏导数 在接下来的视频中
+
+136
+00:11:19,883 --> 00:11:24,296
+seen calculus, or even if you haven't seen
+partial derivatives before. And with
+你会得到一切你需要知道的 如何计算这个微分项的知识
+
+137
+00:11:24,296 --> 00:11:28,288
+that, with the next video, hopefully,
+we'll be able to give all the intuitions
+下一个视频中 希望我们能够给出
+
+138
+00:11:28,288 --> 00:11:30,180
+you need to apply gradient descent.
+实现梯度下降算法的所有知识
+【果壳教育无边界字幕组】翻译:10号少年 校对:小白_远游 审核:所罗门捷列夫
+
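+A minimal sketch of the simultaneous update described in this video, next to the non-simultaneous version; the derivative functions dJ_dtheta0 and dJ_dtheta1 are assumed placeholders here, since the derivative term itself is only derived in the next video:
+
+def gradient_descent_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
+    # Correct: evaluate both right-hand sides first, then update together.
+    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
+    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
+    return temp0, temp1
+
+def non_simultaneous_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
+    # Incorrect: theta0 is overwritten before theta1's derivative is computed,
+    # so the second derivative already sees the new theta0.
+    theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
+    theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
+    return theta0, theta1
+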
diff --git a/srt/2 - 6 - Gradient Descent Intuition (12 min).srt b/srt/2 - 6 - Gradient Descent Intuition (12 min).srt
new file mode 100644
index 00000000..12be2cf4
--- /dev/null
+++ b/srt/2 - 6 - Gradient Descent Intuition (12 min).srt
@@ -0,0 +1,783 @@
+1
+00:00:00,000 --> 00:00:04,353
+In the previous video, we gave a
+mathematical definition of gradient
+在之前的视频中 我们给出了一个数学上关于梯度
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,353 --> 00:00:09,464
+descent. Let's delve deeper, and in this
+video, get better intuition about what the
+下降的定义 本次视频我们更深入研究一下 更直观地感受一下这个
+
+3
+00:00:09,464 --> 00:00:14,701
+algorithm is doing, and why the steps of
+the gradient descent algorithm might make
+算法是做什么的 以及梯度下降算法的更新过程有什么意义
+
+4
+00:00:14,701 --> 00:00:20,639
+sense. Here's the gradient descent
+algorithm that we saw last time. And, just
+这是我们上次视频中看到的梯度下降算法
+
+5
+00:00:20,639 --> 00:00:26,427
+to remind you, this parameter, or this
+term, alpha, is called the learning rate.
+提醒一下 这个参数 α 术语称为学习速率
+
+6
+00:00:26,427 --> 00:00:32,444
+And it controls how big a step we take
+when updating my parameter theta J. And
+它控制我们以多大的幅度更新这个参数θj.
+
+7
+00:00:32,444 --> 00:00:41,360
+this second term here is the derivative
+term. And what I want to do in this video
+第二部分是导数项 而我在这个视频中要做的就是
+
+8
+00:00:41,360 --> 00:00:47,360
+is give you better intuition about what each of
+these two terms is doing and why, when put
+给你一个更直观的认识 这两部分有什么用 以及 为什么当把
+
+9
+00:00:47,360 --> 00:00:53,077
+together, this entire update makes sense.
+In order to convey these intuitions, what
+这两部分放一起时 整个更新过程是有意义的 为了更好地让你明白
+
+10
+00:00:53,077 --> 00:00:58,460
+I want to do is use a slightly simpler
+example where we want to minimize the
+我要做是用一个稍微简单的例子 比如我们想最小化的那个
+
+11
+00:00:58,460 --> 00:01:03,022
+function of just one parameter. So, so we
+have a, say we have a cost function J of
+函数只有一个参数的情形 所以 假如我们有一个代价函数J
+
+12
+00:01:03,022 --> 00:01:07,294
+just one parameter, theta one, like we
+did, you know, a few videos back. Where
+只有一个参数 θ1 就像我们前几次视频中讲的
+
+13
+00:01:07,294 --> 00:01:11,913
+theta one is a real number, okay? Just so we can have 1D plots, which
+θ1是一个实数 对吧?那么我们可以画出一维的曲线
+
+14
+00:01:11,913 --> 00:01:16,416
+are a little bit simpler to look at. And
+let's try to understand why gradient
+看起来很简单 让我们试着去理解 为什么梯度下降法
+
+15
+00:01:16,416 --> 00:01:23,940
+descent would do on this function.
+[sound]. So, let's say here's my function.
+会在这个函数上起作用 所以 假如这是我的函数
+
+16
+00:01:24,660 --> 00:01:31,696
+J of theta one, and so that's my, and
+where theta one is a real number. Right,
+关于θ1的函数J θ1是一个实数 对吧?
+
+17
+00:01:31,696 --> 00:01:39,202
+now let's say I've initialized gradient
+descent with theta one at this location.
+现在我们已经对这个点上用于梯度下降法的θ1 进行了初始化
+
+18
+00:01:39,202 --> 00:01:46,989
+So imagine that we start off at that point
+on my function. What gradient descent will
+想象一下在我的函数图像上 从那个点出发 那么梯度下降
+
+19
+00:01:46,989 --> 00:01:56,935
+do, is it will repeatedly update: theta one gets
+updated as theta one minus alpha times d/d
+要做的事情是不断更新 θ1等于θ1减α倍的
+
+20
+00:01:56,935 --> 00:02:04,694
+theta one of J of theta one, right? And, oh,
+just as an aside, you know, this, this
+d/dθ1J(θ1)这个项 对吧?哦 顺便插一句 你知道
+
+21
+00:02:04,694 --> 00:02:11,636
+derivative term, right? If you're
+wondering why I changed the notation from
+这个微分项是吧?可能你想问为什么我改变了符号
+
+22
+00:02:11,636 --> 00:02:16,132
+these partial derivative symbols. If you
+don't know what the difference is between
+之前用的是偏导数的符号 如果你不知道偏导数的符号
+
+23
+00:02:16,132 --> 00:02:20,523
+these partial derivative symbols and the
+dd theta don't worry about it. Technically
+和d/dθ之间的区别是什么 不用担心 从技术上讲
+
+24
+00:02:20,523 --> 00:02:24,491
+in mathematics we call this a partial
+derivative, we call this a derivative,
+在数学中 我们称这是一个偏导数 这是一个导数
+
+25
+00:02:24,491 --> 00:02:28,299
+depending on the number of, of parameters
+in the function J, but that's a
+这取决于函数J的参数数量 但是这是一个
+
+26
+00:02:28,299 --> 00:02:32,428
+mathematical technicality, so, you know
+For the purpose of this lecture, think of
+数学上的区别 就本课的目标而言 可以默认为
+
+27
+00:02:32,428 --> 00:02:36,768
+these partial symbols, and d/d theta one as
+exactly the same thing. And, don't worry
+这些偏导数符号 和d/dθ1是完全一样的东西 不用担心
+
+28
+00:02:36,768 --> 00:02:41,056
+about whether there are any differences.
+I'm gonna try to use the mathematically
+是否存在任何差异 我会尽量使用数学上的
+
+29
+00:02:41,056 --> 00:02:45,190
+precise notation. But for our purposes,
+these notations are really the same thing.
+精确的符号 但就我们的目的而言 这些符号是没有区别的
+
+30
+00:02:45,360 --> 00:02:49,627
+So, let's see what this, this equation
+will do. And so we're going to compute
+好的 那么我们来看这个方程 我们要计算
+
+31
+00:02:49,627 --> 00:02:54,293
+this derivative of, I'm not sure if you've
+seen derivatives in calculus before. But
+这个导数 我不确定之前你是否在微积分中学过导数
+
+32
+00:02:54,293 --> 00:02:58,666
+what a derivative, at this point, does, is
+basically saying, you know, let's take
+但对于这个问题 求导的目的 基本上可以说
+
+33
+00:02:58,666 --> 00:03:02,877
+the tangent to that point, like that
+straight line, the red line, just,
+取这一点的切线 就是这样一条红色的直线
+
+34
+00:03:02,877 --> 00:03:06,976
+just touching this function and
+let's look at the slope of this red line. That's
+刚好与函数相切于这一点 让我们看看这条红色直线的斜率
+
+35
+00:03:06,976 --> 00:03:11,352
+what the derivative is. It says
+what's the slope of the line that is just
+其实这就是导数 也就是说 直线的斜率 也就是这条
+
+36
+00:03:11,352 --> 00:03:15,563
+tangent to the function, okay, and the
+slope of the line is of course just
+刚好与函数曲线相切的这条直线 这条直线的斜率正好是
+
+37
+00:03:15,563 --> 00:03:20,789
+right, you know just the height divided by
+this horizontal thing. Now. This line has
+这个高度除以这个水平长度 现在 这条线有
+
+38
+00:03:20,789 --> 00:03:28,378
+a positive slope, so it has a positive
+derivative. And so, my update to theta is
+一个正斜率 也就是说它有正导数 因此 我得到的新的θ
+
+39
+00:03:28,378 --> 00:03:36,258
+going to be: theta one gets updated as
+theta one minus alpha times some positive
+θ1更新后等于θ1减去一个正数乘以α.
+
+40
+00:03:36,258 --> 00:03:43,103
+number. Okay? Alpha, the learning
+rate is always a positive number. And so
+α 也就是学习速率也是一个正数 所以
+
+41
+00:03:43,103 --> 00:03:47,932
+I'm gonna take theta one, and update it
+as theta one minus something. So I'm gonna
+我要使θ1减去一个东西
+
+42
+00:03:47,932 --> 00:03:52,644
+end up moving theta one to the left. I'm
+gonna decrease theta one and we can see
+所以相当于我将θ1向左移 使θ1变小了 我们可以看到
+
+43
+00:03:52,644 --> 00:03:57,473
+this is the right thing to do because I
+actually went ahead in this direction you
+这么做是对的 因为实际上我往这个方向移动
+
+44
+00:03:57,473 --> 00:04:02,582
+know to get me closer to the minimum over
+there. So, gradient descent so far seems
+确实让我更接近那边的最低点 所以 梯度下降到目前为止似乎
+
+45
+00:04:02,582 --> 00:04:08,115
+to be doing the right thing. Let's look at
+another example. So let's take my same
+是在做正确的事 让我们来看看另一个例子 让我们用同样的函数J
+
+46
+00:04:08,115 --> 00:04:13,787
+function j. Just trying to draw the same
+function j of theta one. And now let's say
+同样再画出函数J(θ1)的图像 而这次
+
+47
+00:04:13,787 --> 00:04:19,181
+I had instead initialized my parameter
+over there on the left. So theta one is
+我们把参数初始化到左边这点 所以θ1在这里
+
+48
+00:04:19,181 --> 00:04:24,161
+here. I'm gonna add that point on the
+surface. Now, my derivative term, d/d
+同样把这点对应到曲线上 现在 导数项d/dθ1J(θ1)
+
+49
+00:04:24,161 --> 00:04:29,567
+theta one of J of theta one, when evaluated
+at this point, is going to be, right, the
+在这点上计算时 看上去会是这样
+
+50
+00:04:29,567 --> 00:04:35,035
+slope of that line. So this derivative
+term is a slope of this line. But this
+这条线的斜率 这个导数是这条线的斜率
+
+51
+00:04:35,035 --> 00:04:42,745
+line is slanting down, so this line has
+negative slope. Right? Or alternatively I
+但是这条线向下倾斜 所以这条线具有负斜率 对吧?
+
+52
+00:04:42,745 --> 00:04:48,718
+say that this function has negative
+derivative, just means negative slope at
+或者说 这个函数有负导数 也就意味着在那一点上有负斜率
+
+53
+00:04:54,770 --> 00:05:02,840
+theta is updated as theta minus alpha
+times a negative number. And so I have
+θ被更新为θ减去α乘以一个负数 因此我是在用
+
+54
+00:05:02,840 --> 00:05:07,881
+theta one minus a negative number which
+means I'm actually going to increase theta,
+θ1减去一个负数 这意味着我实际上是在增加θ1
+
+55
+00:05:07,881 --> 00:05:13,106
+right? Because this is minus of a negative
+number means I'm adding something to theta
+对不对?因为这是减去一个负数 意味着给θ加上一个数
+
+56
+00:05:13,106 --> 00:05:17,900
+and what that means is that I'm going to
+end up increasing theta. And so we'll
+这就意味着最后我实际上增加了θ的值 因此 我们将
+
+57
+00:05:17,900 --> 00:05:23,002
+start here and increase theta, which again
+seems like the thing I want to do to try
+从这里开始 增加θ 似乎这也是我希望得到的 也就是
+
+58
+00:05:23,002 --> 00:05:28,335
+to get me closer to the minimum. So, this
+hopefully explains the intuition behind
+让我更接近最小值了 所以 我希望这样很直观地给你解释了
+
+59
+00:05:28,335 --> 00:05:33,874
+what the derivative term is doing. Let's
+next take a look at the learning rate term
+导数项的意义 让我们接下来再看一看学习速率α
+
+60
+00:05:33,874 --> 00:05:39,956
+alpha, and try to figure out what that's
+doing. So, here's my gradient descent
+我们来研究一下它有什么用 这就是我梯度下降法的
+
+61
+00:05:39,956 --> 00:05:46,641
+update rule. Right, there's this equation
+And let's look at what can happen, if
+更新规则 就是这个等式 让我们来看看如果α 太小或 α 太大
+
+62
+00:05:46,641 --> 00:05:52,845
+Alpha is either too small, or if Alpha is
+too large. So this first example, what
+会出现什么情况 这第一个例子
+
+63
+00:05:52,845 --> 00:05:59,583
+happens if Alpha is too small. So here's
+my function J, J of theta. Let's
+α太小会发生什么呢 这是我的函数J(θ)
+
+64
+00:05:59,583 --> 00:06:04,230
+just start here. If alpha is too small
+then what I'm going to do is gonna
+就从这里开始 如果α太小了 那么我要做的是要去
+
+65
+00:06:04,230 --> 00:06:09,322
+multiply the update by some small number.
+So end up taking, you know, it's like a baby step
+用一个比较小的数乘以更新的值 所以最终 它就像一个小宝宝的步伐
+
+66
+00:06:09,322 --> 00:06:13,841
+like that. Okay, so that's one step
+[inaudible]. Then from this new point
+这是一步 然后从这个新的起点开始
+
+67
+00:06:13,841 --> 00:06:18,870
+we're gonna take another step. But if
+the alpha is too small lets take another
+迈出另一步 但是由于α 太小 因此只能迈出另一个
+
+68
+00:06:18,870 --> 00:06:25,342
+little baby step. And so if my
+learning rate is too small, I'm gonna end
+小碎步 所以如果我的学习速率太小 结果就是
+
+69
+00:06:25,342 --> 00:06:30,589
+up, you know, taking these tiny, tiny baby
+steps to try to get to the minimum and I'm
+只能这样像小宝宝一样一点点地挪动 去努力接近最低点
+
+70
+00:06:30,589 --> 00:06:35,837
+gonna need a lot of steps to get to the
+minimum. And so, if alpha's too small, it can
+这样就需要很多步才能到达最低点 所以如果α 太小的话
+
+71
+00:06:35,837 --> 00:06:41,019
+be slow because it's gonna take these
+tiny, tiny baby steps. And it's gonna need
+可能会很慢 因为它会一点点挪动 它会需要
+
+72
+00:06:41,019 --> 00:06:45,829
+a lot of steps before it gets anywhere
+close to the global minimum. Now,
+很多步才能到达全局最低点
+
+73
+00:06:45,829 --> 00:06:52,236
+how about if the alpha is too large.
+So here's my function J of theta.
+那么如果α 太大又会怎样呢 这是我的函数J(θ)
+
+74
+00:06:52,236 --> 00:06:57,590
+Turns out if alpha is too large, then
+gradient descent can overshoot the minimum
+如果α 太大 那么梯度下降法可能会越过最低点
+
+75
+00:06:57,590 --> 00:07:03,362
+and may even fail to converge or even diverge. So here is what I mean. Let's say we start
+off from this point over here.
+甚至可能无法收敛 我的意思是 比如我们从这个点开始
+
+76
+00:07:03,362 --> 00:07:08,647
+It's actually close to the minimum. So the derivative points to the right, but if alpha is too big, I'm gonna
+实际上这个点已经接近最低点 因此导数指向右侧 但如果α 太大的话
+
+77
+00:07:08,686 --> 00:07:14,140
+take a huge step. Maybe I'm gonna take a huge step like that. Right? So I end up taking a huge step.
+我会迈出很大一步 也许像这样巨大的一步 对吧?所以我最终迈出了一大步
+
+78
+00:07:14,140 --> 00:07:20,051
+Now, my cost function has gotten worse, 'cause it started off from this value and now my value has gotten worse. Now my
+现在 我的代价函数变得更糟 因为离这个最低点越来越远
+
+79
+00:07:20,051 --> 00:07:25,190
+derivative, you know, points to the left, so it actually says to decrease theta. But look, if my learning rate is too big,
+现在我的导数指向左侧 实际上在减小θ 但是你看 如果我的学习速率过大
+
+80
+00:07:25,190 --> 00:07:29,792
+I may take a huge step going from here all
+the way out there, so I end up going all
+我会移动一大步 从这点一下子又到那点了
+
+81
+00:07:29,792 --> 00:07:35,372
+the way over there. Right? And if my learning rate was too
+big I can take another huge step on the
+对吗?如果我的学习率太大 下一次迭代
+
+82
+00:07:35,372 --> 00:07:41,034
+next iteration and kind of overshoot
+and overshoot and so on until you notice
+又移动了一大步 越过一次 又越过一次 一次次越过最低点 直到你发现
+
+83
+00:07:41,034 --> 00:07:46,765
+I'm actually getting further and further
+away from the minimum. And so if alpha is
+实际上 离最低点越来越远 所以 如果α太大
+
+84
+00:07:46,765 --> 00:07:51,905
+too large it can fail to converge or even
+diverge. Now, I have another question for
+它会导致无法收敛 甚至发散 现在 我还有一个问题
+
+85
+00:07:51,905 --> 00:07:56,057
+you. So, this is a tricky one. And when I
+was first learning this stuff, it actually
+这问题挺狡猾的 当我第一次学习这个地方时
+
+86
+00:07:56,057 --> 00:08:00,005
+took me a long time to figure this out.
+What if your parameter theta one is
+我花了很长一段时间才理解这个问题 如果我们预先把θ1
+
+87
+00:08:00,005 --> 00:08:04,106
+already at a local minimum? What do you
+think one step of gradient descent will
+放在一个局部的最低点 你认为下一步梯度下降法会怎样工作?
+
+88
+00:08:04,106 --> 00:08:10,857
+do? So let's suppose you initialize theta
+one at a local minimum. So you know
+所以假设你将θ1初始化在局部最低点
+
+89
+00:08:10,857 --> 00:08:16,713
+suppose this is your initial value of theta one
+over here and it's already at a local
+假设这是你的θ1的初始值 在这儿 它已经在一个局部的
+
+90
+00:08:16,713 --> 00:08:22,718
+optimum or the local minimum. It turns
+out that at a local optimum your derivative
+最优处或局部最低点 结果是局部最优点的导数
+
+91
+00:08:22,718 --> 00:08:28,796
+would be equal to zero, since that's the
+slope of the tangent line at that point, so the
+将等于零 因为它是那条切线的斜率
+
+92
+00:08:28,796 --> 00:08:35,528
+slope of this line will be equal to zero
+and thus this derivative term is equal to
+而这条线的斜率将等于零 因此 此导数项等于0
+
+93
+00:08:35,528 --> 00:08:40,941
+zero. And so, in your gradient descent
+update, you have theta one gets updated
+因此 在你的梯度下降更新过程中 你有一个θ1
+
+94
+00:08:40,941 --> 00:08:46,284
+as theta one minus alpha times zero.
+And so, what this means is that, if you're
+然后用θ1 减α 乘以0来更新θ1 所以这意味着什么
+
+95
+00:08:46,284 --> 00:08:51,222
+already at a local optimum, it leaves
+theta one unchanged 'cause this, you know,
+这意味着你已经在局部最优点 它使得θ1不再改变
+
+96
+00:08:51,222 --> 00:08:56,132
+gives the update theta one equals theta one.
+So if your parameter is already at a local
+也就是新的θ1等于原来的θ1 因此 如果你的参数已经处于
+
+97
+00:08:56,132 --> 00:09:00,694
+minimum, one step of gradient descent
+does absolutely nothing. It doesn't change
+局部最低点 那么梯度下降法更新其实什么都没做 它不会改变参数的值
+
+98
+00:09:00,694 --> 00:09:05,257
+the parameter, which is, which is what you
+want. Cuz it keeps your solution at the
+这也正是你想要的 因为它使你的解始终保持在
+
+99
+00:09:05,257 --> 00:09:09,706
+local optimum. This also explains why
+gradient descent can converge to the local
+局部最优点 这也解释了为什么即使学习速率α 保持不变时
+
+100
+00:09:09,706 --> 00:09:14,326
+minimum, even with the learning rate Alpha
+fixed. Here's what I mean by that. Let's
+梯度下降也可以收敛到局部最低点 我想说的是这个意思
+
+101
+00:09:14,326 --> 00:09:21,550
+look at an example. So here's a cost
+function J with theta. That maybe I want
+我们来看一个例子 这是代价函数J(θ)
+
+102
+00:09:21,550 --> 00:09:26,811
+to minimize and let's say I initialize my
+algorithm my gradient descent algorithm, you know,
+我想找到它的最小值 首先初始化我的梯度下降算法
+
+103
+00:09:26,811 --> 00:09:32,080
+out there at that magenta point. If I take
+one step of gradient descent you know,
+在那个品红色的点初始化 如果我更新一步梯度下降
+
+104
+00:09:32,080 --> 00:09:36,941
+maybe it'll take me to that point cuz my
+derivative's pretty steep out there, right?
+也许它会带我到这个点 因为这个点的导数是相当陡的
+
+105
+00:09:36,941 --> 00:09:42,051
+Now I'm at this green point and if I take
+another step of gradient descent, you
+现在 在这个绿色的点 如果我再更新一步
+
+106
+00:09:42,051 --> 00:09:47,036
+notice that my derivative, meaning the
+slope, is less steep at the green point when
+你会发现我的导数 也即斜率 是没那么陡的
+
+107
+00:09:47,036 --> 00:09:51,959
+compared to at the magenta point out
+there, right? Because as I approach the
+相比于在品红点 对吧?因为随着我接近最低点
+
+108
+00:09:51,959 --> 00:09:56,883
+minimum my derivative gets closer and
+closer to zero as I approach the minimum.
+我的导数越来越接近零
+
+109
+00:09:56,883 --> 00:10:01,794
+So, after one step of gradient descent,
+my new derivative is a little bit smaller.
+所以 梯度下降一步后 新的导数会变小一点点
+
+110
+00:10:01,794 --> 00:10:06,635
+So I wanna take another step of gradient
+descent. I will naturally take a somewhat
+然后我想再梯度下降一步 在这个绿点我自然会用一个稍微
+
+111
+00:10:06,635 --> 00:10:11,598
+smaller step from this green point than I
+did from the magenta point. Now I'm at the new
+跟刚才在那个品红点时比 再小一点的一步
+
+112
+00:10:11,598 --> 00:10:16,038
+point, the red point, and then now even
+closer to the global minimum, so the
+现在到了新的点 红色点 更接近全局最低点了
+
+113
+00:10:16,038 --> 00:10:21,229
+derivative here will be even smaller than
+it was at the green point. So when I take
+因此这点的导数会比在绿点时更小 所以
+
+114
+00:10:21,229 --> 00:10:26,420
+another step of gradient descent, you know, now
+my derivative term is even smaller, and so
+我再进行一步梯度下降时 我的导数项是更小的
+
+115
+00:10:26,420 --> 00:10:31,360
+the magnitude of the update to theta
+one is even smaller, so you can take
+θ1更新的幅度就会更小
+
+116
+00:10:31,360 --> 00:10:39,145
+a small step like so. And as gradient descent
+runs, you will automatically take smaller
+所以你会移动更小的一步 像这样 随着梯度下降法的运行
+
+117
+00:10:39,145 --> 00:10:46,343
+and smaller steps until eventually you are
+taking very small steps, you know, and you
+你移动的幅度会自动变得越来越小 直到最终移动幅度非常小
+
+118
+00:10:46,343 --> 00:10:52,737
+find that you converge to the local
+minimum. So, just to recap. In gradient
+你会发现 已经收敛到局部极小值 所以回顾一下
+
+119
+00:10:52,737 --> 00:10:57,716
+descent as we approach the local minimum,
+gradient descent will automatically take
+在梯度下降法中 当我们接近局部最低点时 梯度下降法会自动采取
+
+120
+00:10:57,716 --> 00:11:02,634
+smaller steps and that's because as we
+approach the local minimum, by definition,
+更小的幅度 这是因为当我们接近局部最低点时
+
+121
+00:11:02,634 --> 00:11:07,122
+local minimum is when you have this
+derivative equal to zero. So as we
+很显然在局部最低时导数等于零 所以当我们
+
+122
+00:11:07,122 --> 00:11:12,408
+approach the local minimum this derivative
+term will automatically get smaller and
+接近局部最低时 导数值会自动变得越来越小
+
+123
+00:11:12,408 --> 00:11:16,957
+so gradient descent will automatically
+take smaller steps. So, this is what
+所以梯度下降将自动采取较小的幅度
+
+124
+00:11:16,957 --> 00:11:21,506
+gradient descent looks like, and so actually
+there is no need to decrease alpha
+这就是梯度下降的做法 所以实际上没有必要再另外减小α
+
+125
+00:11:21,506 --> 00:11:26,258
+over time. So, that's the gradient descent
+algorithm, and you can use it to minimize,
+这就是梯度下降算法 你可以用它来最小化
+
+126
+00:11:26,258 --> 00:11:30,713
+to try to minimize any cost function J,
+not just the cost function J we defined for
+最小化任何代价函数J 不只是线性回归中的代价函数J
+
+127
+00:11:30,713 --> 00:11:34,738
+linear regression. In the next video,
+we're going to take the function J, and
+在接下来的视频中 我们要用代价函数J
+
+128
+00:11:34,738 --> 00:11:38,549
+set that back to be exactly linear
+regression's cost function. The, the
+回到它的本质 线性回归中的代价函数
+
+129
+00:11:38,549 --> 00:11:43,057
+square cost function that we came up with
+earlier. And taking gradient descent, and
+也就是我们前面得出的平方误差函数 结合梯度下降法
+
+130
+00:11:43,057 --> 00:11:47,351
+the square cost function, and putting
+them together. That will give us our first
+以及平方代价函数 我们会得出第一个机器学习算法
+
+131
+00:11:47,351 --> 00:11:50,948
+learning algorithm, that'll give us our
+linear regression algorithm.
+即线性回归算法
+【果壳教育无边界字幕组】翻译:10号少年 校对:Femtoyue 审核:所罗门捷列夫
+
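The convergence behaviour described in the subtitles above (a fixed learning rate alpha, steps that shrink automatically, and no change once the derivative is zero) can be checked numerically. The Python sketch below is not from the course; the quadratic cost J(theta1) = (theta1 - 5)^2 and the value alpha = 0.1 are made-up choices used only to illustrate the update theta1 := theta1 - alpha * dJ/dtheta1.

# Minimal sketch with an assumed cost J(theta1) = (theta1 - 5)^2, so dJ/dtheta1 = 2*(theta1 - 5)
alpha = 0.1       # fixed learning rate
theta1 = 0.0      # initial guess (the "magenta point")
for step in range(25):
    derivative = 2.0 * (theta1 - 5.0)
    theta1 -= alpha * derivative                      # gradient descent update
    print(step, round(theta1, 4), round(abs(alpha * derivative), 4))
# The printed step size shrinks every iteration even though alpha never changes,
# and once theta1 reaches the minimum the derivative is 0, so updates leave it unchanged.
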
diff --git a/srt/2 - 7 - Gradient Descent For Linear Regression (10 min).srt b/srt/2 - 7 - Gradient Descent For Linear Regression (10 min).srt
new file mode 100644
index 00000000..d72f37d2
Binary files /dev/null and b/srt/2 - 7 - Gradient Descent For Linear Regression (10 min).srt differ
diff --git a/srt/2 - 8 - What_'s Next (6 min).srt b/srt/2 - 8 - What_'s Next (6 min).srt
new file mode 100644
index 00000000..4de9f002
--- /dev/null
+++ b/srt/2 - 8 - What_'s Next (6 min).srt
@@ -0,0 +1,439 @@
+1
+00:00:00,000 --> 00:00:04,839
+You now know about linear regression
+and gradient descent. The plan from here
+你现在已经了解了线性回归和梯度下降 接下来我想
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,839 --> 00:00:09,437
+on out is to tell you about a couple of
+important extensions of these ideas.
+给大家介绍这些概念的一些重要扩展
+
+3
+00:00:09,437 --> 00:00:13,668
+Concretely here they are. First it turns
+out that in order to solve this
+具体来讲是这样的 首先在解决这个
+
+4
+00:00:13,668 --> 00:00:18,468
+minimization problem, turns out there's an
+algorithm for solving for theta zero and
+最小化问题时 有一个算法可以直接解出theta 0和
+
+5
+00:00:18,468 --> 00:00:22,978
+theta one exactly without needing an
+iterative algorithm. Without needing this
+theta 1 而不必借助迭代 也就是梯度下降这一类
+
+6
+00:00:22,978 --> 00:00:27,555
+algorithm like gradient descent that we
+had to iterate, you know, multiple times
+要求多次迭代的算法
+
+7
+00:00:27,555 --> 00:00:32,285
+over. So it turns out there are advantages
+and disadvantages of this algorithm that
+这个算法可以让你直接求出Theta 0 和Theta 1
+
+8
+00:00:32,285 --> 00:00:36,897
+lets you just solve for theta zero and
+theta one, basically in just one shot. One
+因此也同时带来了优点和缺点
+
+9
+00:00:36,897 --> 00:00:41,685
+advantage is that there is no longer a
+learning rate alpha that you need to worry
+好处之一是你不再需要设定学习速率
+
+10
+00:00:41,685 --> 00:00:46,414
+about and set. And so it can be much
+faster for some problems. We'll talk about
+因此你可以更快地解决一些问题 关于它的其他优缺点
+
+11
+00:00:46,414 --> 00:00:51,051
+its advantages and disadvantages later.
+Second, we'll also talk about algorithms
+我们将在后面继续讨论的 其次 我们会讲到
+
+12
+00:00:51,051 --> 00:00:55,424
+for learning with a larger number of
+features. So, so far we've been learning
+应用于学习多个特征时所用的算法 至今为止
+
+13
+00:00:55,424 --> 00:01:00,027
+with just one feature, the size of the
+house and using that to predict the price,
+我们一直在只有一个特征的情况下学习 也就是用房子的大小来预测它的价格
+
+14
+00:01:00,027 --> 00:01:04,688
+so we're trying to take x and use that to
+predict y. But for other learning problems
+也就是我们试图用一个变量x来预测另一个变量y 但对于其他学习问题来说
+
+15
+00:01:04,688 --> 00:01:09,899
+we may have a larger number of features.
+So for example let's say that you know not
+我们可能会面对更多的特征作为变量 比方说 你不仅知道房子的大小
+
+16
+00:01:09,899 --> 00:01:15,448
+only the size, but also the number of
+bedrooms, the number of floors, and the age
+还知道卧室的数量 楼层数 和房子的年份
+
+17
+00:01:15,448 --> 00:01:20,930
+of these houses. And you want
+to use that to predict the price of the
+接着你想利用这些变量来预测房子的价格
+
+18
+00:01:20,930 --> 00:01:26,005
+houses. In that case maybe we'll call
+these features x1, x2, x3, and x4. So now
+在这种情况下 我们可以叫这些特征为x1 x2 x3 x4 所以现在我们就
+
+19
+00:01:26,005 --> 00:01:31,554
+we have, you know, four features. We want to
+use these four features to predict y, the
+有了四个特征 我们希望使用这些特征来预测房价
+
+20
+00:01:31,554 --> 00:01:36,797
+price of the house. It turns out with all
+of these features, four of them in this
+事实上 在有多个这些特征的情况下 在此例中是四个
+
+21
+00:01:36,797 --> 00:01:41,858
+case, it turns out that with multiple
+features it becomes harder to
+想要描绘并可视化这些变量变得十分困难
+
+22
+00:01:41,858 --> 00:01:47,243
+plot or visualize the data. So for example
+here we try to plot this type of data
+比如这里 我们要想绘制这种类型的数据集
+
+23
+00:01:47,243 --> 00:01:52,823
+set. Maybe we will have the vertical axis
+be the price and maybe we can have one
+我们可以将垂直的轴标为房价
+
+24
+00:01:52,823 --> 00:01:58,078
+axis here, and another one here where this
+axis is the size of the house, and that
+将一条轴标为房子的大小
+
+25
+00:01:58,078 --> 00:02:02,822
+axis is the number of bedrooms. You know,
+but this is just plotting, right my first
+并将另一条轴标为卧室的数量 这只是描点而已 使用我们的前两个特征
+
+26
+00:02:02,822 --> 00:02:07,414
+two features: size and number of bedrooms.
+And when we have these additional features
+房子的大小和卧室数量 但是 当我们有了这些更多的特征时
+
+27
+00:02:07,414 --> 00:02:11,677
+I don't know, I just don't know how to
+plot all of these data, right cuz I need
+我就不知道如何绘制出所有的这些数据 因为我需要
+
+28
+00:02:11,677 --> 00:02:15,886
+like a 4-dimensional or 5-dimensional
+figure. I don't really know how to plot
+绘制一个四维或五维的图 我们也确实不知道如何绘制
+
+29
+00:02:15,886 --> 00:02:19,930
+you know something more than like a
+3-dimensional figure, like, like what I
+超过三维的图像 就像我这里的例子一样
+
+30
+00:02:19,930 --> 00:02:24,139
+have over here. Also as you can tell, the
+notation starts to get a little more
+另外你也一定发现了 我们使用的符号开始变得更加复杂
+
+31
+00:02:24,139 --> 00:02:28,238
+complicated, right. So rather than just
+having x as our feature, we now have x1
+比起之前我们只有x一个特征 现在我们面对着从x1到x4
+
+32
+00:02:28,238 --> 00:02:33,519
+through x4. And we're using these
+subscripts to denote my four different
+总共4个特征 所以我们使用这些下标来区别这些不同的特征
+
+33
+00:02:33,519 --> 00:02:38,059
+features. It turns out the best notation
+to keep all of this straight and to
+事实上 我们有一套数学标记可以很好地
+
+34
+00:02:38,059 --> 00:02:42,828
+understand what's going on with the data
+even when we don't quite know how to plot
+对这些数据集进行标注 即使是在我们无法绘制它们的情况下
+
+35
+00:02:42,828 --> 00:02:47,425
+it. It turns out that the best notation is
+the notation of linear algebra. Linear
+也就是运用线性代数的符号
+
+36
+00:02:47,425 --> 00:02:52,194
+algebra gives us a notation and a set of
+things or a set of operations that we can
+线性代数赋予了我们一套符号系统和操作来进行
+
+37
+00:02:52,194 --> 00:02:58,234
+do with matrices and vectors. For example.
+Here's a matrix where the columns of this
+矩阵和向量的处理 举例来说 这里是一个矩阵 我们来看它的每一列
+
+38
+00:02:58,234 --> 00:03:03,377
+matrix are: The first column is the sizes
+of the four houses, the second column was
+第一列是四间房子的大小 第二列是
+
+39
+00:03:03,377 --> 00:03:08,025
+the number of bedrooms, that's the number
+of floors and that was the age of the
+卧室的数量 这里是楼层数 这里是房子的年份
+
+40
+00:03:08,025 --> 00:03:12,496
+home. And so a matrix is a block of
+numbers that lets me take all of my data,
+所以这个矩阵是一些数字的组合 其中包括了我们所有的数据 所有的x
+
+41
+00:03:12,496 --> 00:03:17,881
+all of my x's. All of my features and
+organize them efficiently into sort of one
+我们将所有的特征有效地组织起来 排入这样一整块的数字中
+
+42
+00:03:17,881 --> 00:03:23,565
+big block of numbers like that. And here
+is what we call a vector in linear algebra
+接着 这是我们在线性代数中所说的向量
+
+43
+00:03:23,565 --> 00:03:29,118
+where the four numbers here are the prices
+of the four houses that we saw on the
+这里的四个数字就是在之前课件上的四间房子的价格
+
+44
+00:03:29,118 --> 00:03:34,334
+previous slide. So. In the next set of
+videos what I'm going to do is do a quick
+所以 在接下来的一组视频中 我会对线性代数
+
+45
+00:03:34,334 --> 00:03:38,730
+review of linear algebra. If you haven't
+seen matrices and vectors before, so if
+进行一个快速的复习回顾 如果你从来没有接触过向量和矩阵
+
+46
+00:03:38,730 --> 00:03:43,293
+all of this, everything on this slide is
+brand new to you or if you've seen linear
+这张课件上所有的一切对你来说都是新知识
+
+47
+00:03:43,293 --> 00:03:47,745
+algebra before, but it's been a while so
+you aren't completely familiar with it
+或者你之前对线性代数有所了解 但由于隔得久了 对其有所遗忘
+
+48
+00:03:47,745 --> 00:03:52,419
+anymore, then please watch the next set of
+videos. And I'll quickly review the linear
+那就请学习接下来的一组视频 我会快速地回顾你将用到的线性代数知识
+
+49
+00:03:52,419 --> 00:03:57,093
+algebra you need in order to implement and
+use the more powerful versions of linear
+通过它们 你可以实现和使用更强大的线性回归模型
+
+50
+00:03:57,093 --> 00:04:01,489
+regression. It turns out linear algebra
+isn't just useful for linear regression
+事实上 线性代数不仅仅在线性回归中应用广泛
+
+51
+00:04:01,489 --> 00:04:05,972
+models but these ideas of matrices and
+vectors will be useful for helping us
+它其中的矩阵和向量将有助于帮助我们
+
+52
+00:04:05,972 --> 00:04:10,272
+to implement and actually get
+computationally efficient implementations
+实现之后更多的机器学习模型
+
+53
+00:04:10,272 --> 00:04:15,088
+for many later machine learning models as
+well. And as you can tell these sorts of
+并在计算上更有效率 正是因为这些
+
+54
+00:04:15,088 --> 00:04:19,617
+matrices and vectors will give us an
+efficient way to start to organize large
+矩阵和向量提供了一种有效的方式来组织
+
+55
+00:04:19,617 --> 00:04:23,918
+amounts of data, when we work with larger
+training sets. So, in case, in case
+大量的数据 特别是当我们处理巨大的训练集时
+
+56
+00:04:23,918 --> 00:04:28,619
+you're not familiar with linear algebra or
+in case linear algebra seems like a
+如果你不熟悉线性代数 如果你觉得线性代数看上去是一个
+
+57
+00:04:28,619 --> 00:04:33,263
+complicated, scary concept for those of you who've
+never seen it before, don't worry about
+复杂 可怕的概念 特别是对于之前从未接触过它的人
+
+58
+00:04:33,263 --> 00:04:37,793
+it. It turns out in order to implement
+machine learning algorithms we need only
+不必担心 事实上 为了实现机器学习算法 我们只需要
+
+59
+00:04:37,793 --> 00:04:42,002
+the very, very basics of
+linear algebra and you'll be able to very
+一些非常非常基础的线性代数知识 通过接下来几个视频
+
+60
+00:04:42,002 --> 00:04:46,840
+quickly pick up everything you need to
+know in the next few videos.
+你可以很快地学会所有你需要了解的线性代数知识
+
+61
+00:04:46,840 --> 00:04:53,386
+Concretely, to decide if you should
+watch the next set of videos, here are the
+具体来说 为了帮助你判断是否有需要学习接下来的一组视频
+
+62
+00:04:53,386 --> 00:04:57,804
+topics I'm going to cover. Talk about
+what are matrices and vectors. Talk about how
+这里是一些我会涵盖的主题 我会讨论什么是矩阵和向量 谈谈如何
+
+63
+00:05:00,013 --> 00:05:02,222
+to add, subtract, multiply matrices and vectors.
+Talk about the ideas of matrix inverses
+加 、减 、乘矩阵和向量 讨论逆矩阵和转置矩阵的概念
+
+64
+00:05:02,222 --> 00:05:06,696
+and transposes. And so, if you are not
+sure if you should watch the next set of
+所以如果你不确定自己是否需要学习接下来的视频
+
+65
+00:05:06,696 --> 00:05:11,393
+videos take a look at these two things. So
+if you think you know how to compute this
+你可以看看这两个式子 如果你知道该如何计算这个数值
+
+66
+00:05:11,393 --> 00:05:15,643
+quantity, this matrix transpose times
+another matrix. If you think you know, if
+一个转置矩阵乘以另一个矩阵
+
+67
+00:05:15,643 --> 00:05:20,173
+you have seen this stuff before, if you
+know how to compute the inverse of matrix
+如果你之前见过这样的式子 知道如何计算一个逆矩阵
+
+68
+00:05:20,173 --> 00:05:24,423
+times a vector, minus a number, times
+another vector. If these two things look
+乘以一个向量减去一个数乘以另一个向量 如果你十分熟悉这些概念
+
+69
+00:05:24,423 --> 00:05:29,309
+completely familiar to you then you can
+safely skip the optional set of videos on
+那么你完全可以跳过这组关于线性代数的选修视频
+
+70
+00:05:29,309 --> 00:05:34,607
+linear algebra. But if these, concepts, if you're
+slightly uncertain what these blocks of
+但是如果你对这些概念仍有些许的不确定
+
+71
+00:05:34,607 --> 00:05:39,906
+numbers or these matrices of numbers mean,
+then please take a look of the next set of
+不确定这些数字或这些矩阵的意思 那么请看一看下一组的视频
+
+72
+00:05:39,906 --> 00:05:45,142
+videos and, it'll very quickly teach you what
+you need to know about linear algebra in
+它会很快地教你一些你需要知道的线性代数的知识
+
+73
+00:05:45,142 --> 00:05:49,936
+order to program machine learning
+algorithms and deal with large amounts of data.
+便于之后编写机器学习算法和处理大量数据
+
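To make the "matrix of features, vector of prices" idea concrete in code, here is a minimal numpy sketch; the house sizes, bedroom counts, floor counts, ages and prices below are made-up numbers, not the figures from the slide.

import numpy as np

# Each row is one house: [size, bedrooms, floors, age]  (made-up values)
X = np.array([[2104, 5, 1, 45],
              [1416, 3, 2, 40],
              [1534, 3, 2, 30],
              [ 852, 2, 1, 36]])

# Prices of the same four houses (made-up values), one entry per row of X
y = np.array([460, 232, 315, 178])

print(X.shape)   # (4, 4): four houses, four features each
print(y.shape)   # (4,):  a 4-dimensional vector of prices
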
diff --git a/srt/3 - 1 - Matrices and Vectors (9 min).srt b/srt/3 - 1 - Matrices and Vectors (9 min).srt
new file mode 100644
index 00000000..5870d239
--- /dev/null
+++ b/srt/3 - 1 - Matrices and Vectors (9 min).srt
@@ -0,0 +1,1197 @@
+1
+00:00:00,100 --> 00:00:01,850
+Let's get started with our linear algebra review.
+我们先复习一下线性代数的知识
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,880 --> 00:00:03,850
+In this video I want to
+在这段视频中
+
+3
+00:00:03,910 --> 00:00:06,210
+tell you what are matrices and what are vectors.
+我会向大家介绍矩阵和向量的概念
+
+4
+00:00:09,280 --> 00:00:10,770
+A matrix is a
+矩阵是指
+
+5
+00:00:11,020 --> 00:00:12,590
+rectangular array of numbers
+由数字组成的矩形阵列
+
+6
+00:00:13,570 --> 00:00:14,810
+written between square brackets.
+并写在方括号中间
+
+7
+00:00:16,070 --> 00:00:17,250
+So, for example, here is a
+例如
+
+8
+00:00:17,280 --> 00:00:20,180
+matrix on the right, a left square bracket.
+屏幕中所示的一个矩阵
+
+9
+00:00:22,000 --> 00:00:24,660
+And then, write in a bunch of numbers.
+先写一个左括号 然后是一些数字
+
+10
+00:00:27,020 --> 00:00:29,100
+These could be features from
+这些数字可能是
+
+11
+00:00:29,550 --> 00:00:30,660
+a learning problem or it could
+机器学习问题的特征值
+
+12
+00:00:30,800 --> 00:00:33,740
+be data from somewhere else, but
+也可能表示其他意思
+
+13
+00:00:35,080 --> 00:00:36,900
+the specific values don't matter,
+不过现在不用管具体的数字
+
+14
+00:00:37,440 --> 00:00:40,470
+and then I'm going to close it with another right bracket on the right.
+然后我用右方括号将其括起来
+
+15
+00:00:40,680 --> 00:00:41,440
+And so that's one matrix.
+这样就得到了一个矩阵
+
+16
+00:00:41,930 --> 00:00:43,520
+And, here's another example of
+接下来 看一下其他矩阵的例子
+
+17
+00:00:44,290 --> 00:00:46,360
+the matrix; let's write 1, 2, 3, 4, 5, 6.
+依次写下1 2 3 4 5 6
+
+18
+00:00:46,810 --> 00:00:48,020
+So a matrix is just another
+因此实际上矩阵
+
+19
+00:00:48,300 --> 00:00:49,630
+way of saying a
+可以说是二维数组的
+
+20
+00:00:49,690 --> 00:00:51,540
+2D or a two dimensional array.
+另一个名字
+
+21
+00:00:53,260 --> 00:00:54,920
+And the other piece
+另外
+
+22
+00:00:55,260 --> 00:00:56,320
+of knowledge that we need is
+我们还需要知道的是
+
+23
+00:00:56,650 --> 00:00:57,740
+that the dimension of the
+矩阵的
+
+24
+00:00:57,810 --> 00:00:58,980
+matrix is going to
+维度
+
+25
+00:00:59,110 --> 00:01:01,070
+be written as the
+应该写作
+
+26
+00:01:01,170 --> 00:01:04,750
+number of row times the number of columns in the matrix.
+矩阵的行数乘以列数
+
+27
+00:01:05,480 --> 00:01:07,190
+So, concretely, this example
+具体到这个例子
+
+28
+00:01:07,830 --> 00:01:09,700
+on the left, this
+看左边
+
+29
+00:01:09,900 --> 00:01:11,000
+has 1, 2, 3, 4
+包括1 2 3 4共4行
+
+30
+00:01:11,290 --> 00:01:13,370
+rows and has 2 columns,
+以及2列
+
+31
+00:01:14,540 --> 00:01:15,950
+and so this example on the
+因此 这个例子是一个
+
+32
+00:01:16,110 --> 00:01:17,850
+left is a 4 by
+4 × 2的矩阵
+
+33
+00:01:18,640 --> 00:01:23,320
+2 matrix - number of rows by number of columns.
+即行数乘以列数
+
+34
+00:01:23,600 --> 00:01:24,380
+So, four rows, two columns.
+4行乘2列
+
+35
+00:01:25,290 --> 00:01:27,740
+This one on the right, this matrix has two rows.
+右边的矩阵有两行
+
+36
+00:01:28,330 --> 00:01:29,790
+That's the first row, that's
+这是第一行
+
+37
+00:01:30,040 --> 00:01:32,580
+the second row, and it has three columns.
+这是第二行 此外包括三列
+
+38
+00:01:35,430 --> 00:01:36,890
+That's the first column, that's the
+这是第一列
+
+39
+00:01:37,070 --> 00:01:38,350
+second column, that's the third
+第二列 第三列
+
+40
+00:01:38,610 --> 00:01:41,340
+column So, this second
+因此 我们把
+
+41
+00:01:41,670 --> 00:01:42,800
+matrix we say it is
+这个矩阵称为一个
+
+42
+00:01:42,970 --> 00:01:44,660
+a 2 by 3 matrix.
+2 × 3维的矩阵
+
+43
+00:01:45,700 --> 00:01:48,230
+So we say that the dimension of this matrix is 2 by 3.
+所以我们说这个矩阵的维度是2 × 3维
+
+44
+00:01:50,460 --> 00:01:51,690
+Sometimes you also see this
+有时候大家会发现
+
+45
+00:01:51,850 --> 00:01:53,480
+written out, in the case
+书写有些不同
+
+46
+00:01:53,740 --> 00:01:54,510
+of left, you will see this
+比如左边的矩阵
+
+47
+00:01:55,000 --> 00:01:56,360
+written out as R4 by 2
+写成了R4 × 2
+
+48
+00:01:56,460 --> 00:01:58,090
+or concretely what people
+具体而言
+
+49
+00:01:58,470 --> 00:02:00,280
+will sometimes say this matrix
+大家会将该矩阵称作
+
+50
+00:02:00,930 --> 00:02:02,840
+is an element of the set R 4 by 2.
+是集合R4×2的元素
+
+51
+00:02:03,060 --> 00:02:04,270
+So, this thing here, this
+因此
+
+52
+00:02:04,410 --> 00:02:05,180
+just means the set of all
+也就是说
+
+53
+00:02:05,790 --> 00:02:07,020
+matrices that are of dimension
+这个矩阵
+
+54
+00:02:07,520 --> 00:02:08,960
+4 by 2 and this thing
+R4×2代表所有4×2的矩阵的集合
+
+55
+00:02:09,100 --> 00:02:10,650
+on the right, sometimes this is
+而右边的这个矩阵
+
+56
+00:02:10,880 --> 00:02:12,800
+written out as a matrix that is an R 2 by 3.
+有时候也写作一个R2×3的矩阵
+
+57
+00:02:13,130 --> 00:02:16,080
+So if you ever see, 2 by 3.
+因此 如果你看到2×3
+
+58
+00:02:16,560 --> 00:02:17,460
+So if you ever see
+如果你看到
+
+59
+00:02:17,570 --> 00:02:18,700
+something like this, R 4 by
+有些地方表达为
+
+60
+00:02:18,880 --> 00:02:19,960
+2 or R 2 by 3,
+4×2的或者2×3的
+
+61
+00:02:20,320 --> 00:02:21,450
+people are just referring to
+一般都是指
+
+62
+00:02:21,900 --> 00:02:23,830
+matrices of a specific dimension.
+一个特定维度的矩阵
+
+63
+00:02:26,760 --> 00:02:28,240
+Next, let's talk about how
+接下来 让我们来谈谈如何
+
+64
+00:02:28,590 --> 00:02:31,370
+to refer to specific elements of the matrix.
+表达矩阵的某个特定元素
+
+65
+00:02:31,980 --> 00:02:32,850
+And by matrix elements, other than
+这里我说矩阵元素
+
+66
+00:02:33,020 --> 00:02:34,090
+the matrix I just mean
+而不是矩阵 我的意思是
+
+67
+00:02:34,360 --> 00:02:35,930
+the entries, so the numbers inside the matrix.
+矩阵的条目数 也就是矩阵内部的某个数
+
+68
+00:02:37,200 --> 00:02:38,270
+So, in the standard notation,
+所以 标准的表达是
+
+69
+00:02:38,890 --> 00:02:40,110
+if A is this
+如果A是
+
+70
+00:02:40,290 --> 00:02:41,860
+matrix here, then A subscript
+这个矩阵 那么A下标 ij
+
+71
+00:02:42,830 --> 00:02:44,050
+IJ is going to refer
+表示的是
+
+72
+00:02:44,420 --> 00:02:46,060
+to the i, j entry,
+i j对应的那个数字
+
+73
+00:02:46,950 --> 00:02:48,490
+meaning the entry in
+意思是矩阵的第i行和第j列
+
+74
+00:02:48,570 --> 00:02:50,690
+the matrix in the ith row and jth column.
+对应的那个数
+
+75
+00:02:51,880 --> 00:02:54,200
+So for example A 1 1 is
+因此 例如 A11
+
+76
+00:02:54,530 --> 00:02:55,660
+going to refer to the entry
+表示的是第1行
+
+77
+00:02:56,220 --> 00:02:57,510
+in the 1st row and
+第1列所对应的那个元素
+
+78
+00:02:57,600 --> 00:02:58,900
+the 1st column, so that's the
+所以这是
+
+79
+00:02:58,960 --> 00:02:59,720
+first row and the first
+第一行和第一列
+
+80
+00:03:00,090 --> 00:03:02,600
+column and so A 1 1
+因此A11
+
+81
+00:03:02,640 --> 00:03:03,920
+is going to be equal to
+就等于
+
+82
+00:03:04,240 --> 00:03:05,880
+1402.
+
+83
+00:03:06,420 --> 00:03:07,620
+Another example, A 1
+另一个例子 A12
+
+84
+00:03:07,780 --> 00:03:10,020
+2 is going to refer to
+表示的是
+
+85
+00:03:10,160 --> 00:03:11,160
+the entry in the first
+第一行第二列
+
+86
+00:03:11,660 --> 00:03:13,860
+row and the second
+对应的那个数
+
+87
+00:03:14,290 --> 00:03:16,170
+column and so A
+所以A12
+
+88
+00:03:16,270 --> 00:03:19,000
+1 2 is going to be equal to 191.
+将等于191
+
+89
+00:03:20,430 --> 00:03:21,190
+Here are a couple more quick examples.
+再看一个简单的例子
+
+90
+00:03:22,430 --> 00:03:24,360
+Let's see, A, oh let's
+让我们来看看
+
+91
+00:03:24,530 --> 00:03:26,970
+say A 3 2, is going to refer
+比如说A32 表达的是
+
+92
+00:03:27,350 --> 00:03:29,240
+to the entry in the 3rd
+第3行第2列
+
+93
+00:03:30,040 --> 00:03:32,340
+row, and second column,
+对应的那个数
+
+94
+00:03:33,750 --> 00:03:35,030
+right, because that's 3 2
+是吧 因为这是3 2
+
+95
+00:03:35,470 --> 00:03:41,270
+so that's equal to 1437.
+所以这等于1437
+
+96
+00:03:41,490 --> 00:03:42,480
+And finally, A 4 1
+最后 A41
+
+97
+00:03:43,370 --> 00:03:44,540
+is going to refer to
+应该等于
+
+98
+00:03:45,320 --> 00:03:47,320
+this one right, fourth row,
+第四行第一列
+
+99
+00:03:47,710 --> 00:03:49,220
+first column is equal to
+对应的数
+
+100
+00:03:49,520 --> 00:03:53,150
+147, and if,
+所以是等于 147
+
+101
+00:03:53,770 --> 00:03:54,600
+hopefully you won't, but if
+我希望你不会犯下面的错误
+
+102
+00:03:54,660 --> 00:03:55,560
+you were to write and say
+但如果你这么写的话
+
+103
+00:03:55,660 --> 00:03:57,540
+well this A 4
+如果你写出了A43
+
+104
+00:03:57,870 --> 00:03:59,200
+3, well, that refers to
+这应该表示的是
+
+105
+00:03:59,610 --> 00:04:01,130
+the fourth row, and the
+第四行第三列
+
+106
+00:04:01,230 --> 00:04:02,730
+third column that, you know,
+而你知道
+
+107
+00:04:02,850 --> 00:04:03,940
+this matrix has no third
+这个矩阵没有第三列
+
+108
+00:04:04,190 --> 00:04:05,420
+column so this is undefined,
+因此这是未定义的
+
+109
+00:04:06,640 --> 00:04:08,280
+you know, or you can think of this as an error.
+或者你可以认为这是一个错误
+
+110
+00:04:08,830 --> 00:04:10,720
+There's no such element as
+根本就没有什么A43
+
+111
+00:04:10,840 --> 00:04:12,540
+A 4 3, so, you know, you
+对应的元素 所以
+
+112
+00:04:12,950 --> 00:04:14,500
+shouldn't be referring to A 4 3.
+你不能写A43
+
+113
+00:04:14,620 --> 00:04:17,120
+So, the matrix
+因此 矩阵
+
+114
+00:04:17,640 --> 00:04:19,070
+gives you a way of letting
+提供了一种很好的方式
+
+115
+00:04:19,380 --> 00:04:22,280
+you quickly organize, index and access lots of data.
+让你快速整理 索引和访问大量数据
+
+116
+00:04:22,670 --> 00:04:24,200
+In case I seem to be
+可能你觉得 我似乎是
+
+117
+00:04:24,320 --> 00:04:25,140
+tossing up a lot of
+介绍了很多概念
+
+118
+00:04:25,440 --> 00:04:26,110
+concepts, a lot of new notations
+很多新的符号
+
+119
+00:04:26,570 --> 00:04:27,920
+very rapidly, you don't need
+我讲得很快 你不需要
+
+120
+00:04:28,140 --> 00:04:29,230
+to memorize all of this, but
+把这些都记住
+
+121
+00:04:29,530 --> 00:04:31,500
+on the course website where we
+但在课程网站上
+
+122
+00:04:31,700 --> 00:04:33,340
+have posted the lecture notes,
+我们已经发布了讲义
+
+123
+00:04:33,700 --> 00:04:35,960
+we also have all of these definitions written down.
+所有这些定义都写在讲义里
+
+124
+00:04:36,650 --> 00:04:37,740
+So you can always refer back,
+所以 你可以随时参考
+
+125
+00:04:38,160 --> 00:04:39,200
+you know, either to these slides,
+包括这些幻灯片
+
+126
+00:04:39,560 --> 00:04:40,950
+or possibly to the lecture
+你可以随时回来观看视频
+
+127
+00:04:41,260 --> 00:04:44,060
+notes, if you forget, well, A41, what was that?
+如果你忘了A41到底是表示什么?
+
+128
+00:04:44,290 --> 00:04:45,320
+Which row, which column was that?
+哪一行 哪一列是什么?
+
+129
+00:04:45,650 --> 00:04:47,160
+Don't worry about memorizing everything now.
+所以现在不要担心记忆问题
+
+130
+00:04:47,470 --> 00:04:48,960
+You can always refer back to
+你可以随时回来参考
+
+131
+00:04:49,100 --> 00:04:51,590
+the written materials on the course website, and use that as a reference.
+课程网站上的材料
+
+132
+00:04:52,500 --> 00:04:53,740
+So that's what a matrix is.
+所以 这就是矩阵的定义
+
+133
+00:04:54,160 --> 00:04:57,000
+Next, let's talk about what is a vector.
+接下来 让我们来谈谈什么是向量
+
+134
+00:04:57,300 --> 00:04:59,400
+A vector turns out to be a special case of a matrix.
+一个向量是一种特殊的矩阵
+
+135
+00:04:59,890 --> 00:05:01,170
+A vector is a matrix
+向量是只有一列的
+
+136
+00:05:02,070 --> 00:05:03,590
+that has only 1 column so
+矩阵 所以
+
+137
+00:05:03,740 --> 00:05:04,650
+you have an N x 1
+你有一个 n×1
+
+138
+00:05:04,850 --> 00:05:07,330
+matrix, then that's a vector. Remember, right?
+矩阵 还记得吗
+
+139
+00:05:07,820 --> 00:05:08,970
+N is the number of
+N是行数
+
+140
+00:05:09,190 --> 00:05:10,750
+rows, and 1 here
+而这里的1
+
+141
+00:05:10,870 --> 00:05:12,540
+is the number of columns, so, so
+表示的是列数 所以
+
+142
+00:05:12,710 --> 00:05:13,760
+matrix with just one column
+只有一列的矩阵
+
+143
+00:05:14,720 --> 00:05:15,730
+is what we call a vector.
+就是我们所说的向量
+
+144
+00:05:16,700 --> 00:05:17,950
+So here's an example
+因此 这里是一个向量的
+
+145
+00:05:18,310 --> 00:05:19,800
+of a vector, with I
+例子 比如说
+
+146
+00:05:20,120 --> 00:05:22,700
+guess I have N equals four elements here.
+我有 n = 4 个元素
+
+147
+00:05:23,860 --> 00:05:25,090
+so we also call this
+所以我们也把这个称为
+
+148
+00:05:25,370 --> 00:05:26,560
+thing, another term for
+另一个术语是
+
+149
+00:05:26,660 --> 00:05:28,300
+this is a four dimensional
+这是一个四维的
+
+150
+00:05:30,130 --> 00:05:31,410
+vector, just means that
+向量 也就意味着
+
+151
+00:05:32,880 --> 00:05:34,410
+this is a vector with four
+这是一个含有
+
+152
+00:05:34,800 --> 00:05:36,400
+elements, with four numbers in it.
+4个元素的向量
+
+153
+00:05:36,870 --> 00:05:38,130
+And, just as earlier
+而且 前面我们讲
+
+154
+00:05:38,510 --> 00:05:39,520
+for matrices you saw this
+矩阵的时候提到过
+
+155
+00:05:39,740 --> 00:05:40,960
+notation R3 by 2
+这个符号R3×2
+
+156
+00:05:41,120 --> 00:05:42,340
+to refer to 3 by
+表示的是一个3行2列的矩阵
+
+157
+00:05:42,340 --> 00:05:43,770
+2 matrices, for this vector
+而对于这个向量
+
+158
+00:05:44,660 --> 00:05:46,340
+we are going to refer to this
+我们也同样可以
+
+159
+00:05:46,500 --> 00:05:48,270
+as a vector in the set R4.
+表示为集合R4
+
+160
+00:05:49,640 --> 00:05:50,900
+So this R4 means a
+因此 这个R4是指
+
+161
+00:05:51,020 --> 00:05:53,480
+set of four-dimensional vectors.
+一个四维向量的集合
+
+162
+00:05:56,350 --> 00:05:59,230
+Next let's talk about how to refer to the elements of the vector.
+接下来让我们来谈谈如何引用向量的元素
+
+163
+00:06:01,790 --> 00:06:02,970
+We are going to use the notation
+我们将使用符号
+
+164
+00:06:03,730 --> 00:06:06,030
+yi to refer to
+yi来代表
+
+165
+00:06:06,310 --> 00:06:07,620
+the ith element of the
+向量y的第i个元素
+
+166
+00:06:07,690 --> 00:06:08,650
+vector y. So if y
+所以 如果这个向量是y
+
+167
+00:06:08,810 --> 00:06:11,470
+is this vector, y subscript i is the ith element.
+那么y下标i 则表示它的第i个元素
+
+168
+00:06:12,050 --> 00:06:13,080
+So y1 is the
+所以y1表示第一个元素
+
+169
+00:06:13,450 --> 00:06:16,320
+first element, 460; y2
+
+170
+00:06:16,540 --> 00:06:18,670
+is equal to the second element,
+y2表示第二个元素
+
+171
+00:06:19,690 --> 00:06:21,030
+232. There's the first.
+
+172
+00:06:21,380 --> 00:06:21,780
+There's the second.
+这是第二个元素
+
+173
+00:06:22,570 --> 00:06:24,840
+Y3 is equal to
+还有y3等于
+
+174
+00:06:24,970 --> 00:06:26,380
+315 and so on, and
+315 等等
+
+175
+00:06:26,760 --> 00:06:28,240
+only y1 through y4 are
+只有y1至y4是有意义的
+
+176
+00:06:28,650 --> 00:06:31,600
+defined, since this is a 4-dimensional vector.
+因为这定义的是一个四维向量
+
+177
+00:06:32,940 --> 00:06:33,990
+Also it turns out that
+此外 事实上
+
+178
+00:06:34,560 --> 00:06:35,950
+there are actually 2 conventions
+有两种方法来表达
+
+179
+00:06:36,320 --> 00:06:37,590
+for how to index into a
+某个向量中某个索引
+
+180
+00:06:37,730 --> 00:06:39,250
+vector and here they are.
+是这两种
+
+181
+00:06:39,560 --> 00:06:41,020
+Sometimes, people will use
+有时候 人们会使用
+
+182
+00:06:41,630 --> 00:06:43,820
+one-indexed and sometimes zero-indexed vectors.
+1-索引 有时候用0-索引
+
+183
+00:06:44,770 --> 00:06:45,620
+So this example on the left
+因此 左边这个例子
+
+184
+00:06:46,090 --> 00:06:47,980
+is a one in that
+是一个1-索引向量
+
+185
+00:06:48,180 --> 00:06:49,240
+specter where the element
+它的元素写作
+
+186
+00:06:49,650 --> 00:06:51,870
+we write is y1, y2, y3, y4.
+y1 y2 y3 y4
+
+187
+00:06:53,540 --> 00:06:54,710
+And this example in the right
+而右边这个向量
+
+188
+00:06:54,870 --> 00:06:56,340
+is an example of a zero index
+是0-索引的一个例子
+
+189
+00:06:56,840 --> 00:06:58,380
+vector, where we start
+我们的索引
+
+190
+00:06:58,730 --> 00:07:00,460
+the indexing of the elements from zero.
+从下标0开始
+
+191
+00:07:01,520 --> 00:07:04,620
+So the elements go from y zero up to y three.
+因此 元素从y0至y3
+
+192
+00:07:05,450 --> 00:07:07,170
+And this is a bit like the
+这有点像
+
+193
+00:07:07,380 --> 00:07:08,780
+arrays in some programming languages
+一些初级语言中的数组
+
+194
+00:07:09,940 --> 00:07:11,080
+where the arrays can either
+数组是从1开始
+
+195
+00:07:11,440 --> 00:07:12,740
+be indexed starting from one.
+排序的
+
+196
+00:07:13,140 --> 00:07:14,390
+The first element of an
+数组的第一个元素
+
+197
+00:07:14,510 --> 00:07:15,590
+array is sometimes a Y1,
+一般时从y1开始
+
+198
+00:07:16,160 --> 00:07:17,480
+this is sequence notation I guess,
+这是表示序列的符号
+
+199
+00:07:17,940 --> 00:07:20,580
+and sometimes it's zero index
+有时是从0开始排序
+
+200
+00:07:21,260 --> 00:07:22,860
+depending on what programming language you use.
+这取决于你用什么编程语言
+
+201
+00:07:23,640 --> 00:07:25,000
+So it turns out that in
+所以 事实上
+
+202
+00:07:25,190 --> 00:07:26,680
+most of math, the one
+在数学中
+
+203
+00:07:27,120 --> 00:07:28,390
+index version is more
+1-索引的情况比较多
+
+204
+00:07:28,570 --> 00:07:30,150
+common. For a lot
+而对于很多
+
+205
+00:07:30,380 --> 00:07:32,640
+of machine learning applications, zero index
+机器学习的应用问题来说
+
+206
+00:07:33,680 --> 00:07:35,400
+vectors gives us a more convenient notation.
+0-索引向量为我们提供了一个更方便的符号表达
+
+207
+00:07:36,810 --> 00:07:37,650
+So what you should usually
+所以你通常应该
+
+208
+00:07:37,970 --> 00:07:39,580
+do is, unless otherwise specified,
+做的是 除非特别指定
+
+209
+00:07:40,630 --> 00:07:43,070
+you should assume we are using one index vectors.
+你应该默认我们使用的是1-索引法表示向量
+
+210
+00:07:43,680 --> 00:07:44,750
+In fact, throughout the rest
+在本课程的后面所有
+
+211
+00:07:44,890 --> 00:07:46,380
+of these videos on linear algebra
+关于线性代数的视频中
+
+212
+00:07:46,770 --> 00:07:49,190
+review, I will be using one index vectors.
+我都将使用1-索引法表示向量
+
+213
+00:07:50,210 --> 00:07:51,170
+But just be aware that
+但你要明白
+
+214
+00:07:51,280 --> 00:07:52,150
+when we are talking about machine learning
+当我们谈论到机器学习的
+
+215
+00:07:52,390 --> 00:07:53,980
+applications, sometimes I will
+应用问题时
+
+216
+00:07:54,220 --> 00:07:55,340
+explicitly say when we
+如果我们需要使用0-索引向量的话
+
+217
+00:07:55,480 --> 00:07:56,640
+need to switch to, when we
+我会明确地告诉你
+
+218
+00:07:56,740 --> 00:07:57,760
+need to use the zero index
+我们什么时候换成
+
+219
+00:07:59,020 --> 00:07:59,280
+vectors as well.
+使用0-索引表达
+
+220
+00:08:00,240 --> 00:08:02,470
+Finally, by convention, usually
+最后 按照惯例
+
+221
+00:08:02,940 --> 00:08:04,470
+when writing matrices and vectors,
+通常在书写矩阵和向量时
+
+222
+00:08:05,060 --> 00:08:06,710
+most people will use upper
+大多数人会使用大写字母
+
+223
+00:08:06,900 --> 00:08:08,450
+case to refer to matrices.
+来表示矩阵
+
+224
+00:08:09,000 --> 00:08:09,750
+So we're going to use
+因此 我们要使用
+
+225
+00:08:09,930 --> 00:08:12,030
+capital letters like
+大写字母 如
+
+226
+00:08:12,260 --> 00:08:13,840
+A, B, C, you know,
+A B C
+
+227
+00:08:14,100 --> 00:08:15,370
+X, to refer to matrices,
+X 来表示矩阵
+
+228
+00:08:16,630 --> 00:08:17,910
+and usually we'll use lowercase,
+而通常我们会使用小写字母
+
+229
+00:08:18,660 --> 00:08:19,630
+like a, b, x, y,
+像a b x y
+
+230
+00:08:21,140 --> 00:08:22,460
+to refer to either numbers,
+来表示数字
+
+231
+00:08:23,060 --> 00:08:25,400
+or just real numbers or scalars, or to vectors.
+或是原始的数字 或标量 或向量
+
+232
+00:08:26,150 --> 00:08:27,860
+This isn't always true but
+这是实际的使用习惯
+
+233
+00:08:28,110 --> 00:08:29,210
+this is the more common
+我们也经常看到
+
+234
+00:08:29,460 --> 00:08:30,610
+notation where we use
+使用小写字母y
+
+235
+00:08:30,940 --> 00:08:31,870
+lower case "Y" for referring
+来表示向量
+
+236
+00:08:32,020 --> 00:08:33,360
+to vector and we usually
+但我们平时
+
+237
+00:08:34,150 --> 00:08:35,460
+use upper case to refer to a matrix.
+是用大写字母来表示矩阵
+
+238
+00:08:37,200 --> 00:08:39,820
+So, you now know what are matrices and vectors.
+所以 你现在知道了什么是矩阵和向量
+
+239
+00:08:40,800 --> 00:08:42,310
+Next, we'll talk about some
+接下来 我们将继续讨论关于它们一些内容
+
+240
+00:08:42,500 --> 00:08:44,330
+of the things you can do with them
+
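For readers who want to try the dimension and indexing conventions from this video, here is a minimal numpy sketch. The entries 1402, 191, 1437 and 147 are the ones read out in the subtitles; the remaining entries are made up to complete the 4 x 2 example. Note that numpy arrays are 0-indexed, while the lecture's A_ij notation is 1-indexed.

import numpy as np

# A 4 x 2 matrix (4 rows, 2 columns); entries partly from the video, partly made up
A = np.array([[1402,  191],
              [1371,  821],
              [ 949, 1437],
              [ 147, 1448]])
print(A.shape)      # (4, 2)

# Lecture notation is 1-indexed, numpy is 0-indexed:
# A_11 = A[0, 0], A_12 = A[0, 1], A_32 = A[2, 1], A_41 = A[3, 0]
print(A[0, 0], A[0, 1], A[2, 1], A[3, 0])   # 1402 191 1437 147

# A vector is an n x 1 matrix; a 1-D numpy array plays that role here
yvec = np.array([460, 232, 315, 178])
print(yvec.shape)   # (4,) -- a 4-dimensional vector
print(yvec[0])      # the lecture's y_1 (1-indexed) is yvec[0] (0-indexed)
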
diff --git a/srt/3 - 2 - Addition and Scalar Multiplication (7 min).srt b/srt/3 - 2 - Addition and Scalar Multiplication (7 min).srt
new file mode 100644
index 00000000..16f3efdf
--- /dev/null
+++ b/srt/3 - 2 - Addition and Scalar Multiplication (7 min).srt
@@ -0,0 +1,886 @@
+1
+00:00:00,250 --> 00:00:01,612
+In this video we'll talk about
+在这段视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,612 --> 00:00:03,503
+matrix addition and subtraction,
+我们将讨论矩阵的加法和减法运算
+
+3
+00:00:03,503 --> 00:00:04,950
+as well as how to
+以及如何进行
+
+4
+00:00:04,950 --> 00:00:06,582
+multiply a matrix by a
+数和矩阵的乘法
+
+5
+00:00:06,582 --> 00:00:09,292
+number, also called Scalar Multiplication.
+也就是标量乘法
+
+6
+00:00:09,292 --> 00:00:11,825
+Let's start an example.
+让我们从下面这个例子开始
+
+7
+00:00:11,825 --> 00:00:14,725
+Given two matrices like these,
+假设有这样两个矩阵
+
+8
+00:00:14,725 --> 00:00:16,735
+let's say I want to add them together.
+如果想对它们做求和运算
+
+9
+00:00:16,735 --> 00:00:18,038
+How do I do that?
+应该怎么做呢?
+
+10
+00:00:18,038 --> 00:00:20,538
+And so, what does addition of matrices mean?
+或者说 矩阵的加法到底是如何进行的?
+
+11
+00:00:20,538 --> 00:00:21,632
+It turns out that if you
+答案是
+
+12
+00:00:21,632 --> 00:00:24,312
+want to add two matrices, what
+如果你想将两个矩阵相加
+
+13
+00:00:24,312 --> 00:00:25,762
+you do is you just add
+你只需要将这两个矩阵的
+
+14
+00:00:25,762 --> 00:00:28,076
+up the elements of these matrices one at a time.
+每一个元素都逐个相加
+
+15
+00:00:28,076 --> 00:00:30,363
+So, my result of adding
+因此 两个矩阵相加
+
+16
+00:00:30,363 --> 00:00:31,480
+two matrices is going to
+所得到的结果
+
+17
+00:00:31,480 --> 00:00:33,415
+be itself another matrix and
+就是一个新的矩阵
+
+18
+00:00:33,415 --> 00:00:34,972
+the first element again just by
+它的第一个元素
+
+19
+00:00:34,972 --> 00:00:36,732
+taking one and four and
+是1和4相加的结果
+
+20
+00:00:36,732 --> 00:00:39,470
+adding them together, so I get five.
+因此我们得到5
+
+21
+00:00:39,470 --> 00:00:41,578
+The second element I get
+接下来是第二个元素
+
+22
+00:00:41,578 --> 00:00:43,092
+by taking two and two
+用2和2相加
+
+23
+00:00:43,092 --> 00:00:44,169
+and adding them, so I get
+因此得到4
+
+24
+00:00:44,169 --> 00:00:47,240
+four; then three
+然后是3加0得到3
+
+25
+00:00:47,255 --> 00:00:49,568
+plus zero is three, and so on.
+以此类推
+
+26
+00:00:49,570 --> 00:00:51,442
+I'm going to stop changing colors, I guess.
+这里我用不同颜色区别一下
+
+27
+00:00:51,442 --> 00:00:52,768
+And, on the right, it's zero point
+接下来右边这一列元素
+
+28
+00:00:52,768 --> 00:00:54,820
+five, ten and two.
+就是0.5 10和2
+
+29
+00:00:56,140 --> 00:00:57,182
+And it turns out you can
+这里大家不难发现
+
+30
+00:00:57,182 --> 00:01:00,408
+add only two matrices that are of the same dimensions.
+只有相同维度的两个矩阵才能相加
+
+31
+00:01:00,408 --> 00:01:02,789
+So this example is
+对于这个例子而言
+
+32
+00:01:02,789 --> 00:01:05,595
+a three by two matrix,
+这是一个3行2列的矩阵
+
+33
+00:01:07,120 --> 00:01:09,029
+because this has 3
+也就是说矩阵的行数为3
+
+34
+00:01:09,029 --> 00:01:11,917
+rows and 2 columns, so it's 3 by 2.
+列数是2 因此是3行2列
+
+35
+00:01:11,917 --> 00:01:13,451
+This is also a 3
+第二个矩阵
+
+36
+00:01:13,451 --> 00:01:15,113
+by 2 matrix, and the
+也是一个3行2列的矩阵
+
+37
+00:01:15,113 --> 00:01:16,202
+result of adding these two
+因此这两个矩阵相加的结果
+
+38
+00:01:16,202 --> 00:01:19,415
+matrices is a 3 by 2 matrix again.
+也是一个3行2列的矩阵
+
+39
+00:01:19,415 --> 00:01:20,468
+So you can only add
+所以你只能将相同维度的矩阵
+
+40
+00:01:20,470 --> 00:01:21,837
+matrices of the same
+进行相加运算
+
+41
+00:01:21,837 --> 00:01:23,533
+dimension, and the result
+同时 所得到的结果
+
+42
+00:01:23,550 --> 00:01:24,959
+will be another matrix that's of
+将会是一个新的矩阵
+
+43
+00:01:24,959 --> 00:01:28,057
+the same dimension as the ones you just added.
+这个矩阵与相加的两个矩阵维度相同
+
+44
+00:01:29,180 --> 00:01:30,785
+Where as in contrast, if you
+反过来
+
+45
+00:01:30,785 --> 00:01:31,803
+were to take these two matrices, so this
+如果你想将这样两个矩阵相加
+
+46
+00:01:31,803 --> 00:01:32,894
+one is a 3 by
+这是一个3行2列的矩阵
+
+47
+00:01:32,894 --> 00:01:36,208
+2 matrix, okay, 3 rows, 2 columns.
+行数为3 列数为2
+
+48
+00:01:36,230 --> 00:01:38,659
+This here is a 2 by 2 matrix.
+而这一个是2行2列的矩阵
+
+49
+00:01:39,190 --> 00:01:41,190
+And because these two matrices
+那么由于这两个矩阵
+
+50
+00:01:41,200 --> 00:01:42,837
+are not of the same dimension,
+维度是不相同的
+
+51
+00:01:43,160 --> 00:01:44,635
+you know, this is an error,
+这就出现错误了
+
+52
+00:01:44,635 --> 00:01:46,400
+so you cannot add these
+所以我们不能将它们相加
+
+53
+00:01:46,430 --> 00:01:48,508
+two matrices and, you know,
+也就是说
+
+54
+00:01:48,508 --> 00:01:52,184
+their sum is not well-defined.
+这两个矩阵的和是没有意义的
+
+55
+00:01:52,642 --> 00:01:54,561
+So that's matrix addition.
+这就是矩阵的加法运算
+
+56
+00:01:54,561 --> 00:01:58,382
+Next, let's talk about multiplying matrices by a scalar number.
+接下来 我们讨论矩阵和标量的乘法运算
+
+57
+00:01:58,382 --> 00:02:00,069
+And the scalar is just a,
+这里所说的标量
+
+58
+00:02:00,069 --> 00:02:02,028
+maybe an overly fancy term for,
+可能是一个复杂的结构
+
+59
+00:02:02,028 --> 00:02:04,342
+you know, a number or a real number.
+或者只是一个简单的数字 或者说实数
+
+60
+00:02:04,760 --> 00:02:07,075
+Alright, this means real number.
+标量在这里指的就是实数
+
+61
+00:02:07,076 --> 00:02:10,280
+So let's take the number 3 and multiply it by this matrix.
+如果我们用数字3来和这个矩阵相乘
+
+62
+00:02:10,280 --> 00:02:13,182
+And if you do that, the result is pretty much what you'll expect.
+那么结果是显而易见的
+
+63
+00:02:13,182 --> 00:02:14,926
+You just take your elements
+你只需要将矩阵中的所有元素
+
+64
+00:02:14,926 --> 00:02:16,184
+of the matrix and multiply
+都和3相乘
+
+65
+00:02:16,184 --> 00:02:18,114
+them by 3, one at a time.
+每一个都逐一与3相乘
+
+66
+00:02:18,114 --> 00:02:19,428
+So, you know, one
+因此 1和3相乘
+
+67
+00:02:19,428 --> 00:02:21,708
+times three is three.
+结果是3
+
+68
+00:02:21,708 --> 00:02:24,011
+What, two times three is
+2和3相乘
+
+69
+00:02:24,011 --> 00:02:25,988
+six, 3 times 3
+结果是6
+
+70
+00:02:25,988 --> 00:02:28,181
+is 9, and let's see, I'm
+最后3乘以3得9
+
+71
+00:02:28,181 --> 00:02:30,152
+going to stop changing colors again.
+我再换一下颜色
+
+72
+00:02:30,157 --> 00:02:31,654
+Zero times 3 is zero.
+0乘以3得0
+
+73
+00:02:31,654 --> 00:02:35,992
+Three times 5 is 15, and 3 times 1 is three.
+3乘以5得15 最后3乘以1得3
+
+74
+00:02:35,992 --> 00:02:37,849
+And so this matrix is the
+这样得到的这个矩阵
+
+75
+00:02:37,849 --> 00:02:40,702
+result of multiplying that matrix on the left by 3.
+就是左边这个矩阵和3相乘的结果
+
+76
+00:02:40,702 --> 00:02:42,173
+And you notice, again,
+我们再次注意到
+
+77
+00:02:42,173 --> 00:02:43,443
+this is a 3 by 2
+这是一个3行2列的矩阵
+
+78
+00:02:43,443 --> 00:02:44,903
+matrix and the result is
+得到的结果矩阵
+
+79
+00:02:44,903 --> 00:02:47,505
+a matrix of the same dimension.
+维度也是相同的
+
+80
+00:02:47,505 --> 00:02:48,634
+This is a 3 by
+也就是说这两个矩阵
+
+81
+00:02:48,634 --> 00:02:49,920
+2, both of these are
+都是3行2列
+
+82
+00:02:49,920 --> 00:02:52,607
+3 by 2 dimensional matrices.
+这也是3行2列
+
+83
+00:02:52,634 --> 00:02:54,334
+And by the way,
+顺便说一下
+
+84
+00:02:54,334 --> 00:02:57,050
+you can write multiplication, you know, either way.
+你也可以写成另一种方式
+
+85
+00:02:57,050 --> 00:02:59,491
+So, I have three times this matrix.
+这里是3和这个矩阵相乘
+
+86
+00:02:59,491 --> 00:03:01,468
+I could also have written this
+你也可以把这个矩阵写在前面
+
+87
+00:03:01,470 --> 00:03:05,256
+matrix, 1, 0, 2, 5, 3, 1, right.
+1 0 2 5 3 1
+
+88
+00:03:05,256 --> 00:03:07,672
+I just copied this matrix over to the right.
+把左边这个矩阵照抄过来
+
+89
+00:03:07,672 --> 00:03:11,228
+I can also take this matrix and multiply this by three.
+我们也可以用这个矩阵乘以3
+
+90
+00:03:11,228 --> 00:03:12,040
+So whether it's you know, 3
+也就是说
+
+91
+00:03:12,060 --> 00:03:13,388
+times the matrix or the
+3乘以这个矩阵
+
+92
+00:03:13,388 --> 00:03:14,983
+matrix times three is
+和这个矩阵乘以3
+
+93
+00:03:14,983 --> 00:03:18,771
+the same thing and this thing here in the middle is the result.
+结果都是一回事 都是中间的这个矩阵
+
+94
+00:03:19,380 --> 00:03:22,869
+You can also take a matrix and divide it by a number.
+你也可以用矩阵除以一个数
+
+95
+00:03:22,869 --> 00:03:24,275
+So, turns out taking
+那么 我们可以看到
+
+96
+00:03:24,275 --> 00:03:25,716
+this matrix and dividing it by
+用这个矩阵除以4
+
+97
+00:03:25,716 --> 00:03:27,140
+four, this is actually the
+实际上就是
+
+98
+00:03:27,172 --> 00:03:29,055
+same as taking the number
+用四分之一
+
+99
+00:03:29,055 --> 00:03:32,819
+one quarter, and multiplying it by this matrix.
+来和这个矩阵相乘
+
+100
+00:03:32,819 --> 00:03:35,318
+4, 0, 6, 3 and
+4 0 6 3
+
+101
+00:03:35,318 --> 00:03:36,803
+so, you can figure
+不难发现
+
+102
+00:03:36,820 --> 00:03:38,593
+the answer, the result of
+相乘的结果是
+
+103
+00:03:38,593 --> 00:03:40,365
+this product is, one quarter
+1/4和4相乘为1
+
+104
+00:03:40,365 --> 00:03:43,274
+times four is one, one quarter times zero is zero.
+1/4和0相乘得0
+
+105
+00:03:43,282 --> 00:03:46,570
+One quarter times six is,
+1/4乘以6
+
+106
+00:03:46,590 --> 00:03:49,353
+what, three halves, or six over
+结果是3/2
+
+107
+00:03:49,353 --> 00:03:50,369
+four is three halves, and
+6/4也就是3/2
+
+108
+00:03:50,369 --> 00:03:53,862
+one quarter times three is three quarters.
+最后1/4乘以3得3/4
+
+109
+00:03:54,410 --> 00:03:55,880
+And so that's the results
+这样我们就得到了
+
+110
+00:03:55,920 --> 00:03:59,207
+of computing this matrix divided by four.
+这个矩阵除以4的结果
+
+111
+00:03:59,207 --> 00:04:01,677
+And that gives you the result.
+结果就是右边这个矩阵
+
+112
+00:04:01,697 --> 00:04:03,805
+Finally, for a slightly
+最后
+
+113
+00:04:03,805 --> 00:04:05,714
+more complicated example, you can
+我们来看一个稍微复杂一点的例子
+
+114
+00:04:05,714 --> 00:04:09,460
+also take these operations and combine them together.
+我们可以把所有这些运算结合起来
+
+115
+00:04:09,513 --> 00:04:11,448
+So in this calculation, I
+在这个运算中
+
+116
+00:04:11,448 --> 00:04:12,801
+have three times a vector
+需要用3来乘以这个向量
+
+117
+00:04:12,801 --> 00:04:16,370
+plus a vector minus another vector divided by three.
+然后加上一个向量 再减去另一个向量除以3的结果
+
+118
+00:04:16,370 --> 00:04:18,344
+So just make sure we know where these are, right.
+让我们先来整理一下这几项运算
+
+119
+00:04:18,344 --> 00:04:20,031
+This multiplication.
+首先第一个运算
+
+120
+00:04:20,031 --> 00:04:23,648
+This is an example of
+很明显这是标量乘法的例子
+
+121
+00:04:23,680 --> 00:04:27,986
+scalar multiplication because I am taking three and multiplying it.
+因为这里是用3来乘以一个矩阵
+
+122
+00:04:27,986 --> 00:04:30,240
+And this is, you know, another
+然后这一项
+
+123
+00:04:30,240 --> 00:04:32,067
+scalar multiplication.
+很显然这是另一个标量乘法
+
+124
+00:04:32,067 --> 00:04:34,182
+Or more like scalar division, I guess.
+或者可以叫标量除法
+
+125
+00:04:34,182 --> 00:04:36,503
+It really just means one third times this.
+其实也就是1/3乘以这个矩阵
+
+126
+00:04:36,503 --> 00:04:39,445
+And so if we evaluate
+因此
+
+127
+00:04:39,509 --> 00:04:43,044
+these two operations first, then
+如果我们先考虑这两项运算
+
+128
+00:04:43,044 --> 00:04:44,612
+what we get is this thing
+那么我们将得到的是
+
+129
+00:04:44,612 --> 00:04:47,127
+is equal to, let's see,
+我们看一下
+
+130
+00:04:47,127 --> 00:04:49,902
+so three times that vector is three,
+3乘以这个矩阵
+
+131
+00:04:49,912 --> 00:04:53,200
+twelve, six, plus
+结果是3 12 6
+
+132
+00:04:53,200 --> 00:04:55,088
+my vector in the middle which
+然后和中间的矩阵相加
+
+133
+00:04:55,088 --> 00:04:58,552
+is a 005 minus
+也就是0 0 5
+
+134
+00:04:59,850 --> 00:05:03,733
+one, zero, two-thirds, right?
+最后再减去1 0 2/3
+
+135
+00:05:03,740 --> 00:05:05,318
+And again, just to make
+同样地 为了便于理解
+
+136
+00:05:05,318 --> 00:05:07,064
+sure we understand what is going on here,
+我们再来梳理一下这几项
+
+137
+00:05:07,064 --> 00:05:11,504
+this plus symbol, that is
+这里的这个加号
+
+138
+00:05:11,520 --> 00:05:15,690
+matrix addition, right?
+表明这是一个矩阵加法 对吧?
+
+139
+00:05:15,690 --> 00:05:16,973
+I really, since these are
+当然这里是向量
+
+140
+00:05:16,973 --> 00:05:20,204
+vectors, remember, vectors are special cases of matrices, right?
+别忘了 向量是特殊的矩阵 对吧?
+
+141
+00:05:20,204 --> 00:05:21,538
+This, you can also call
+或者你也可以称之为
+
+142
+00:05:21,538 --> 00:05:25,106
+this vector addition This
+向量加法运算
+
+143
+00:05:25,110 --> 00:05:27,148
+minus sign here, this is
+同样 这里的减号表明
+
+144
+00:05:27,160 --> 00:05:30,162
+again a matrix subtraction,
+这是一个矩阵减法运算
+
+145
+00:05:30,162 --> 00:05:32,249
+but because this is an
+但由于这是一个n行1列的矩阵
+
+146
+00:05:32,249 --> 00:05:33,432
+n by 1, really a three
+实际上是3行1列
+
+147
+00:05:33,432 --> 00:05:35,547
+by one matrix, that this
+因此这个矩阵
+
+148
+00:05:35,547 --> 00:05:36,494
+is actually a vector, so this is
+实际上也是一个向量
+
+149
+00:05:36,494 --> 00:05:39,822
+also a vector, this column.
+一个列向量
+
+150
+00:05:39,850 --> 00:05:43,677
+We can call this a vector subtraction, as well.
+因此也可以把它称作向量的减法运算
+
+151
+00:05:43,677 --> 00:05:44,392
+OK?
+好了!
+
+152
+00:05:44,392 --> 00:05:46,073
+And finally to wrap this up.
+最后再整理一下
+
+153
+00:05:46,110 --> 00:05:48,103
+This therefore gives me a
+最终的结果依然是一个向量
+
+154
+00:05:48,118 --> 00:05:49,952
+vector, whose first element is
+向量的第一个元素
+
+155
+00:05:49,952 --> 00:05:53,632
+going to be 3+0-1,
+是3+0-1
+
+156
+00:05:53,632 --> 00:05:56,150
+so that's 3-1, which is 2.
+就是3-1 也就是2
+
+157
+00:05:56,150 --> 00:06:01,204
+The second element is 12+0-0, which is 12.
+第二个元素是12+0-0 也就是12
+
+158
+00:06:01,214 --> 00:06:03,970
+And the third element
+最后第三个元素
+
+159
+00:06:03,970 --> 00:06:07,222
+of this is, what, 6+5-(2/3),
+6+5-(2/3)
+
+160
+00:06:07,222 --> 00:06:10,678
+which is 11-(2/3), so
+也就是11-(2/3)
+
+161
+00:06:10,678 --> 00:06:14,021
+that's 10 and one-third
+结果是10又三分之一
+
+162
+00:06:14,021 --> 00:06:16,029
+and, let's see, we close this square bracket.
+关闭右括号
+
+163
+00:06:16,029 --> 00:06:17,983
+And so this gives me a
+我们得到了最终的结果
+
+164
+00:06:17,983 --> 00:06:21,671
+3 by 1 matrix, which is
+这是一个3行1列的矩阵
+
+165
+00:06:21,671 --> 00:06:23,901
+also just called a 3
+或者也可以说是
+
+166
+00:06:23,901 --> 00:06:29,005
+dimensional vector, which
+一个维度为3的向量
+
+167
+00:06:29,030 --> 00:06:32,847
+is the outcome of this calculation over here.
+这就是这个运算式的计算结果
+
+168
+00:06:32,847 --> 00:06:34,984
+So that's how you
+所以
+
+169
+00:06:34,984 --> 00:06:36,698
+add and subtract matrices and
+你学会了矩阵或向量的加减运算
+
+170
+00:06:36,698 --> 00:06:41,488
+vectors and multiply them by scalars, or by real numbers.
+以及矩阵或向量跟标量 或者说实数 的乘法运算
+
+171
+00:06:41,488 --> 00:06:42,767
+So far I have only talked
+到目前为止
+
+172
+00:06:42,767 --> 00:06:44,718
+about how to multiply matrices and
+我只介绍了如何进行
+
+173
+00:06:44,718 --> 00:06:46,994
+vectors by scalars, by real numbers.
+矩阵或向量与数的乘法运算
+
+174
+00:06:46,994 --> 00:06:48,128
+In the next video we will
+在下一讲中
+
+175
+00:06:48,128 --> 00:06:49,418
+talk about a much more
+我们将讨论一个更有趣的话题
+
+176
+00:06:49,418 --> 00:06:51,035
+interesting step, of taking
+那就是如何进行
+
+177
+00:06:51,035 --> 00:06:54,112
+2 matrices and multiplying 2 matrices together.
+两个矩阵的乘法运算
+
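Because matrix addition and scalar multiplication are element-wise, the examples in this video are easy to reproduce; below is a minimal numpy sketch using the same numbers the subtitles walk through.

import numpy as np

# Scalar multiplication: 3 times a 3 x 2 matrix (the same as the matrix times 3)
A = np.array([[1, 0],
              [2, 5],
              [3, 1]])
print(3 * A)        # [[3 0] [6 15] [9 3]]

# Dividing a matrix by a number is multiplying it by the reciprocal
B = np.array([[4, 0],
              [6, 3]])
print(B / 4)        # [[1. 0.] [1.5 0.75]]

# The combined example: 3 * [1 4 2] + [0 0 5] - [3 0 2] / 3
v = 3 * np.array([1, 4, 2]) + np.array([0, 0, 5]) - np.array([3, 0, 2]) / 3
print(v)            # [ 2.  12.  10.3333...]
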
diff --git a/srt/3 - 3 - Matrix Vector Multiplication (14 min).srt b/srt/3 - 3 - Matrix Vector Multiplication (14 min).srt
new file mode 100644
index 00000000..ff173fa2
--- /dev/null
+++ b/srt/3 - 3 - Matrix Vector Multiplication (14 min).srt
@@ -0,0 +1,1824 @@
+1
+00:00:00,230 --> 00:00:01,364
+In this video, I'd like
+在这段视频中 我想
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,364 --> 00:00:02,699
+to start talking about how to
+讨论如何
+
+3
+00:00:02,699 --> 00:00:05,020
+multiply together two matrices.
+将两个矩阵相乘
+
+4
+00:00:05,020 --> 00:00:06,618
+We'll start with a special case
+我们将从矩阵相乘的
+
+5
+00:00:06,618 --> 00:00:08,347
+of that, of matrix vector
+特例 向量相乘开始
+
+6
+00:00:08,350 --> 00:00:12,530
+multiplication - multiplying a matrix together with a vector.
+即 一个矩阵与一个向量相乘
+
+7
+00:00:12,530 --> 00:00:13,975
+Let's start with an example.
+让我们从一个示例开始
+
+8
+00:00:13,975 --> 00:00:15,722
+Here is a matrix,
+左边是一个矩阵
+
+9
+00:00:15,722 --> 00:00:17,283
+and here is a vector, and
+右边是一个向量
+
+10
+00:00:17,283 --> 00:00:18,351
+let's say we want to
+假设我们要
+
+11
+00:00:18,351 --> 00:00:21,281
+multiply together this matrix
+将这个矩阵
+
+12
+00:00:21,281 --> 00:00:24,202
+with this vector, what's the result?
+与这个向量相乘 结果会怎样呢?
+
+13
+00:00:24,202 --> 00:00:25,209
+Let me just work through this
+我先快速计算出结果
+
+14
+00:00:25,210 --> 00:00:27,058
+example and then we
+然后我们再
+
+15
+00:00:27,058 --> 00:00:29,886
+can step back and look at just what the steps were.
+退回去 查看每一个步骤
+
+16
+00:00:29,886 --> 00:00:31,104
+It turns out the result of
+很明显 相乘的结果
+
+17
+00:00:31,104 --> 00:00:32,912
+this multiplication process is going
+将是
+
+18
+00:00:32,912 --> 00:00:34,554
+to be, itself, a vector.
+一个向量
+
+19
+00:00:34,560 --> 00:00:35,931
+And I'm just going work
+我先将这部分完成
+
+20
+00:00:35,931 --> 00:00:37,108
+with this first and later we'll
+然后再来解释
+
+21
+00:00:37,108 --> 00:00:39,650
+come back and see just what I did here.
+我刚刚是怎么做的
+
+22
+00:00:39,652 --> 00:00:41,228
+To get the first element of
+要计算出结果向量的第一个元素
+
+23
+00:00:41,228 --> 00:00:42,445
+this vector I am going
+我将会
+
+24
+00:00:42,445 --> 00:00:44,840
+to take these two numbers
+取这两个数字
+
+25
+00:00:44,849 --> 00:00:47,682
+and multiply them with
+并把他们
+
+26
+00:00:47,682 --> 00:00:49,463
+the first row of the
+与矩阵的第一行相乘
+
+27
+00:00:49,463 --> 00:00:51,884
+matrix and add up the corresponding numbers.
+然后把对应相乘的结果加起来
+
+28
+00:00:51,884 --> 00:00:54,223
+Take one multiplied by
+取1乘以1
+
+29
+00:00:54,223 --> 00:00:57,430
+one, and take
+同时取3
+
+30
+00:00:57,430 --> 00:00:58,616
+three and multiply it by
+乘以
+
+31
+00:00:58,616 --> 00:01:01,557
+five, and that's
+
+32
+00:01:01,580 --> 00:01:04,542
+what, that's one plus fifteen so that gives me sixteen.
+计算得到1和15 相加得16
+
+33
+00:01:04,542 --> 00:01:06,879
+I'm going to write sixteen here.
+我将在这儿写上16
+
+34
+00:01:06,880 --> 00:01:09,926
+then for the second row,
+要计算第二行
+
+35
+00:01:09,926 --> 00:01:12,555
+second element, I am
+的第二个元素
+
+36
+00:01:12,555 --> 00:01:14,022
+going to take the second row
+我需要将第二行
+
+37
+00:01:14,022 --> 00:01:15,255
+and multiply it by this vector,
+与这个向量相乘
+
+38
+00:01:15,255 --> 00:01:17,762
+so I have four
+所以我得到
+
+39
+00:01:17,800 --> 00:01:20,554
+times one, plus zero
+4乘以1
+
+40
+00:01:20,554 --> 00:01:21,894
+times five, which is
+加上0乘以5
+
+41
+00:01:21,894 --> 00:01:25,625
+equal to four, so you'll have four there.
+结果等于4 因此在这里写上4
+
+42
+00:01:25,625 --> 00:01:28,168
+And finally for the last
+对于最后一个元素
+
+43
+00:01:28,168 --> 00:01:30,015
+one I have two one times
+我需要计算(2, 1) 乘以 (1, 5)
+
+44
+00:01:30,015 --> 00:01:31,540
+one five, so two
+所以先计算2乘以1
+
+45
+00:01:31,540 --> 00:01:33,791
+by one, plus one
+再加上
+
+46
+00:01:33,791 --> 00:01:36,361
+by 5, which is equal
+1乘以5
+
+47
+00:01:36,361 --> 00:01:39,422
+to a 7, and
+最后结果为7
+
+48
+00:01:39,422 --> 00:01:43,145
+so I get a 7 over there.
+所以我在这儿写上7
+
+49
+00:01:43,810 --> 00:01:45,464
+It turns out that the
+事实证明
+
+50
+00:01:45,464 --> 00:01:48,102
+results of multiplying that's
+3x2的矩阵
+
+51
+00:01:48,102 --> 00:01:50,750
+a 3x2 matrix by a
+和一个2x1的矩阵
+
+52
+00:01:51,030 --> 00:01:53,498
+2x1 matrix, which is also
+即一个二维向量
+
+53
+00:01:53,498 --> 00:01:55,504
+just a two-dimensional vector.
+相乘的结果
+
+54
+00:01:55,504 --> 00:01:57,034
+The result of this is
+我们得到的
+
+55
+00:01:57,040 --> 00:02:01,975
+going to be a 3x1
+将是一个3x1
+
+56
+00:02:01,980 --> 00:02:03,945
+matrix, so that's why
+的矩阵
+
+57
+00:02:03,960 --> 00:02:05,737
+three by one 3x1
+这个3x1的矩阵
+
+58
+00:02:05,750 --> 00:02:07,534
+matrix, in other words
+就是这么得来的
+
+59
+00:02:07,550 --> 00:02:13,141
+a 3x1 matrix is just a three dimensional vector.
+也就是一个三维向量
+
+60
+00:02:13,170 --> 00:02:14,359
+So I realize that I
+我想
+
+61
+00:02:14,359 --> 00:02:16,072
+did that pretty quickly, and you're
+我可能计算时做得很快
+
+62
+00:02:16,072 --> 00:02:17,078
+probably not sure that you can
+你们并不一定能够
+
+63
+00:02:17,078 --> 00:02:18,530
+repeat this process yourself, but
+自己重复这个过程
+
+64
+00:02:18,530 --> 00:02:20,196
+let's look in more detail
+下面让我们更加仔细的看一下
+
+65
+00:02:20,196 --> 00:02:22,019
+at what just happened and what
+刚刚我做了些什么以及一个向量
+
+66
+00:02:22,020 --> 00:02:26,618
+this process of multiplying a matrix by a vector looks like.
+和一个矩阵相乘的计算过程是怎样的
+
+67
+00:02:26,618 --> 00:02:28,478
+Here's the details of how to
+下面详细介绍了如何
+
+68
+00:02:28,478 --> 00:02:30,532
+multiply a matrix by a vector.
+计算一个矩阵与一个向量相乘
+
+69
+00:02:30,540 --> 00:02:32,014
+Let's say I have a matrix A
+假设这是一个矩阵A
+
+70
+00:02:32,014 --> 00:02:33,355
+and want to multiply it by
+我希望将它乘以
+
+71
+00:02:33,355 --> 00:02:35,637
+a vector x. The
+一个向量x
+
+72
+00:02:35,637 --> 00:02:37,220
+result is going to be some
+结果记为
+
+73
+00:02:37,220 --> 00:02:39,569
+vector y. So the
+向量y 所以
+
+74
+00:02:39,569 --> 00:02:41,334
+matrix A is a m
+矩阵A是一个
+
+75
+00:02:41,334 --> 00:02:43,388
+by n dimensional matrix, so
+mxn维矩阵
+
+76
+00:02:43,388 --> 00:02:45,062
+m rows and n columns and
+有m行和n列
+
+77
+00:02:45,062 --> 00:02:46,570
+we are going to multiply that by a
+我们让它与一个
+
+78
+00:02:46,570 --> 00:02:49,651
+n by 1 matrix, in other words an n dimensional vector.
+nx1的矩阵相乘 换言之 一个n维向量
+
+79
+00:02:49,651 --> 00:02:51,203
+It turns out this
+明显地
+
+80
+00:02:51,203 --> 00:02:54,694
+"n" here has to match this "n" here.
+这里的两个n是相等的
+
+81
+00:02:54,694 --> 00:02:55,933
+In other words, the number of
+也就是说
+
+82
+00:02:55,933 --> 00:02:58,560
+columns in this matrix, so
+这个矩阵的列数
+
+83
+00:02:58,580 --> 00:03:01,821
+it's the number of n columns.
+有n列
+
+84
+00:03:01,821 --> 00:03:03,457
+The number of columns here has
+必须要与
+
+85
+00:03:03,457 --> 00:03:06,442
+to match the number of rows here.
+另一个相乘矩阵的行数相同
+
+86
+00:03:06,442 --> 00:03:09,274
+It has to match the dimension of this vector.
+即必须匹配这个向量的维数。
+
+87
+00:03:09,280 --> 00:03:10,645
+And the result of this product
+这样相乘的结果
+
+88
+00:03:10,645 --> 00:03:15,681
+is going to be an m-dimensional
+将会是一个m维
+
+89
+00:03:15,761 --> 00:03:19,858
+vector y. The number of rows here,
+向量y
+
+90
+00:03:19,858 --> 00:03:23,009
+"M" is going
+“M”将与
+
+91
+00:03:23,010 --> 00:03:24,972
+to be equal to the number
+矩阵A的行数
+
+92
+00:03:24,972 --> 00:03:28,237
+of rows in this matrix "A".
+相同
+
+93
+00:03:28,250 --> 00:03:31,082
+So how do you actually compute this vector "Y"?
+那么如何计算这个向量“Y”呢?
+
+94
+00:03:31,082 --> 00:03:32,110
+Well it turns out to compute
+事实上
+
+95
+00:03:32,110 --> 00:03:34,280
+this vector "Y", the process
+计算“Y”的过程可以分解为
+
+96
+00:03:34,280 --> 00:03:36,860
+is to get "Y""I", multiply "A's"
+计算“Y”“I”的值
+
+97
+00:03:37,200 --> 00:03:38,799
+"I'th" row with the
+让“A”的第I行元素
+
+98
+00:03:38,799 --> 00:03:40,218
+elements of the vector "X"
+分别乘以向量“X”中的元素
+
+99
+00:03:40,218 --> 00:03:41,623
+and add them up.
+并且相加
+
+100
+00:03:41,625 --> 00:03:42,464
+So here's what I mean.
+就是这样子
+
+101
+00:03:42,470 --> 00:03:45,035
+In order to get the
+为了得到
+
+102
+00:03:45,060 --> 00:03:47,847
+first element of "Y",
+“Y”的第一个元素
+
+103
+00:03:47,847 --> 00:03:49,980
+that first number--whatever that turns
+无论是多少
+
+104
+00:03:49,980 --> 00:03:51,424
+out to be--we're gonna take
+我们将会
+
+105
+00:03:51,424 --> 00:03:53,012
+the first row of the
+把矩阵“A”的
+
+106
+00:03:53,020 --> 00:03:55,486
+matrix "A" and multiply
+第一行元素
+
+107
+00:03:55,486 --> 00:03:57,680
+them one at a time
+每次同一个向量“X”的元素
+
+108
+00:03:57,680 --> 00:03:59,842
+with the elements of this vector "X".
+相乘
+
+109
+00:03:59,842 --> 00:04:01,755
+So I take this first number
+我取第一个数
+
+110
+00:04:01,760 --> 00:04:03,912
+multiply it by this first number.
+与第一个数相乘
+
+111
+00:04:03,912 --> 00:04:07,331
+Then take the second number multiply it by this second number.
+然后取第二个数同第二个数相乘
+
+112
+00:04:07,331 --> 00:04:09,264
+Take this third number whatever
+取第三个数
+
+113
+00:04:09,264 --> 00:04:10,603
+that is, multiply it the third number
+与第三个数相乘
+
+114
+00:04:10,603 --> 00:04:12,871
+and so on until you get to the end.
+直到全部乘完
+
+115
+00:04:13,320 --> 00:04:14,578
+And I'm gonna add up the
+最后 将这些相乘的结果
+
+116
+00:04:14,578 --> 00:04:16,289
+results of these products and the
+加起来
+
+117
+00:04:16,300 --> 00:04:19,918
+result of paying that out is going to give us this first element of "Y".
+这样我们就得到了“Y”的第一个元素
+
+118
+00:04:19,922 --> 00:04:21,690
+Then when we want to get
+然后我们
+
+119
+00:04:21,690 --> 00:04:25,334
+the second element of "Y", let's say this element.
+来计算“Y”的第二个元素
+
+120
+00:04:25,340 --> 00:04:26,735
+The way we do that is we
+接下来我们
+
+121
+00:04:26,735 --> 00:04:28,688
+take the second row of
+取A的第二行
+
+122
+00:04:28,688 --> 00:04:30,078
+A and we repeat the whole thing.
+然后重复整个过程
+
+123
+00:04:30,078 --> 00:04:31,265
+So we take the second row
+现在 我们取A的第二行
+
+124
+00:04:31,265 --> 00:04:32,994
+of A, and multiply it
+将它
+
+125
+00:04:32,994 --> 00:04:34,407
+element-wise, so the elements
+与其他元素相乘
+
+126
+00:04:34,407 --> 00:04:35,814
+of X and add
+也就是X的元素
+
+127
+00:04:35,830 --> 00:04:37,460
+up the results of the products
+将结果相加
+
+128
+00:04:37,460 --> 00:04:38,402
+and that would give me the
+这样我们就得到了
+
+129
+00:04:38,402 --> 00:04:40,107
+second element of Y. And
+Y的第二个元素
+
+130
+00:04:40,107 --> 00:04:41,598
+you keep going, and we're
+依次计算下去
+
+131
+00:04:41,600 --> 00:04:42,839
+going to take the third row
+我们取A的第三行
+
+132
+00:04:42,850 --> 00:04:44,720
+of A, multiply it element-wise
+将它按元素与
+
+133
+00:04:44,720 --> 00:04:47,558
+with the vector x,
+向量x相乘
+
+134
+00:04:47,560 --> 00:04:48,682
+sum up the results and then
+将结果加起来
+
+135
+00:04:48,682 --> 00:04:50,246
+I get the third element and so
+然后得到第三个元素
+
+136
+00:04:50,260 --> 00:04:51,600
+on, until I get down
+以此类推
+
+137
+00:04:51,600 --> 00:04:55,139
+to the last row like so, okay?
+直到最后一行
+
+138
+00:04:55,676 --> 00:04:57,930
+So that's the procedure.
+所以 上述就是具体步骤
+
+139
+00:04:58,340 --> 00:05:00,685
+Let's do one more example.
+让我们再举一个例子
+
+140
+00:05:00,685 --> 00:05:05,240
+Here's the example: So let's look at the dimensions.
+在这个例子中 我们先看一下矩阵的维度
+
+141
+00:05:05,240 --> 00:05:08,428
+Here, this is a three
+左边是一个
+
+142
+00:05:08,428 --> 00:05:11,086
+by four dimensional matrix.
+3x4矩阵
+
+143
+00:05:11,086 --> 00:05:13,280
+This is a four-dimensional vector,
+右边是一个四维向量
+
+144
+00:05:13,280 --> 00:05:15,292
+or a 4 x 1 matrix, and
+也就是4x1矩阵
+
+145
+00:05:15,292 --> 00:05:16,825
+so the result of this, the
+所以这样相乘的结果
+
+146
+00:05:16,825 --> 00:05:18,210
+result of this product is going
+将是
+
+147
+00:05:18,220 --> 00:05:20,881
+to be a three-dimensional vector.
+一个三维向量
+
+148
+00:05:20,890 --> 00:05:23,169
+Write, you know, the vector,
+我们在写的时候要给这个向量
+
+149
+00:05:23,180 --> 00:05:26,531
+with room for three elements.
+留三个元素的空间
+
+150
+00:05:26,531 --> 00:05:30,256
+Let's do the, let's carry out the products.
+现在让我们一起来算一下
+
+151
+00:05:30,256 --> 00:05:32,915
+So for the first element, I'm
+首先是第一个元素
+
+152
+00:05:32,915 --> 00:05:35,068
+going to take these four numbers
+我将会取这四个数
+
+153
+00:05:35,068 --> 00:05:36,272
+and multiply them with the
+并将它们与向量X相乘
+
+154
+00:05:36,272 --> 00:05:38,873
+vector X. So I have
+所以我需要计算
+
+155
+00:05:38,873 --> 00:05:42,227
+1x1, plus 2x3,
+1x1 加上2X3
+
+156
+00:05:42,568 --> 00:05:47,301
+plus 1x2, plus 5x1, which
+加1X2 加5X1
+
+157
+00:05:47,301 --> 00:05:49,994
+is equal to - that's
+等于
+
+158
+00:05:50,050 --> 00:05:55,602
+1+6, plus 2+5, which gives me 14.
+1+6 再加上2+5 也就是14。
+
+159
+00:05:55,630 --> 00:05:58,156
+And then for the
+然后计算
+
+160
+00:05:58,156 --> 00:05:59,754
+second element, I'm going
+第二个元素 我要
+
+161
+00:05:59,754 --> 00:06:01,422
+to take this row now and
+取这一行
+
+162
+00:06:01,422 --> 00:06:04,604
+multiply it with this vector (0x1)+3.
+然后与向量 (0X1)+3相乘
+
+163
+00:06:04,604 --> 00:06:06,196
+All right, so
+我们将得到
+
+164
+00:06:06,243 --> 00:06:12,764
+0x1+ 3x3 plus
+0x1 + 3x3
+
+165
+00:06:12,764 --> 00:06:19,958
+0x2 plus 4x1,
+0X2 + 4X1
+
+166
+00:06:20,840 --> 00:06:22,974
+which is equal to, let's
+等于
+
+167
+00:06:22,974 --> 00:06:26,105
+see that's 9+4, which is 13.
+9 +4 也就是13。
+
+168
+00:06:26,105 --> 00:06:28,093
+And finally, for the last
+最后
+
+169
+00:06:28,093 --> 00:06:29,455
+element, I'm going to take
+对最后一个元素
+
+170
+00:06:29,455 --> 00:06:30,847
+this last row, so I
+我将取最后一行
+
+171
+00:06:30,847 --> 00:06:33,978
+have minus one times one.
+所以我得到了-1x1
+
+172
+00:06:34,110 --> 00:06:38,068
+You have minus two, or really there's a plus next to a two I guess.
+-2x3
+
+173
+00:06:38,080 --> 00:06:40,656
+Times three plus zero
+加上0x2
+
+174
+00:06:40,656 --> 00:06:42,441
+times two plus zero times
+加上0x1
+
+175
+00:06:42,441 --> 00:06:44,047
+one, and so that's
+所以
+
+176
+00:06:44,047 --> 00:06:45,496
+going to be minus one minus
+我们将得到-1和-6
+
+177
+00:06:45,496 --> 00:06:46,474
+six, which is going to make
+相加得
+
+178
+00:06:46,474 --> 00:06:49,636
+this minus seven, and so that's minus seven.
+也就是-7
+
+179
+00:06:49,636 --> 00:06:50,136
+Okay?
+明白了?
+
+180
+00:06:50,136 --> 00:06:51,097
+So my final answer is this
+所以我最后的答案是
+
+181
+00:06:51,097 --> 00:06:54,033
+vector fourteen, just to
+一个向量 其中的元素为 14
+
+182
+00:06:54,033 --> 00:06:56,117
+write that out without the colors: fourteen,
+不用颜色再写一遍 也就是14
+
+183
+00:06:56,117 --> 00:06:59,843
+thirteen, negative seven.
+13 -7
+
+184
+00:07:01,190 --> 00:07:03,567
+And as promised, the
+如前面说的
+
+185
+00:07:03,567 --> 00:07:07,775
+result here is a three by one matrix.
+计算结果是一个3X1的矩阵
+
+186
+00:07:07,775 --> 00:07:11,147
+So that's how you multiply a matrix and a vector.
+上述就是矩阵和向量相乘的方法
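+
+(The worked example above can be checked with a short NumPy sketch; the matrix and the
+vector are read off the numbers used in the calculation, and the names are illustrative.)
+
+import numpy as np
+A = np.array([[ 1,  2, 1, 5],
+              [ 0,  3, 0, 4],
+              [-1, -2, 0, 0]])   # the 3 x 4 matrix from the example
+x = np.array([1, 3, 2, 1])       # the 4-dimensional vector
+print(A @ x)                     # prints [14 13 -7]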
+
+187
+00:07:11,170 --> 00:07:12,309
+I know that a lot just
+我知道
+
+188
+00:07:12,309 --> 00:07:13,710
+happened on this slide, so
+这张幻灯片上内容很多
+
+189
+00:07:13,710 --> 00:07:14,662
+if you're not quite sure where all
+如果你在看的过程中
+
+190
+00:07:14,680 --> 00:07:16,228
+these numbers went, you know,
+不是很确定这些数字怎么来的
+
+191
+00:07:16,228 --> 00:07:17,260
+feel free to pause the video
+你可以随时暂停视频
+
+192
+00:07:17,280 --> 00:07:18,345
+you know, and so take a
+慢慢地
+
+193
+00:07:18,345 --> 00:07:19,980
+slow careful look at this
+仔细琢磨
+
+194
+00:07:19,980 --> 00:07:21,195
+big calculation that we just
+整个计算过程
+
+195
+00:07:21,195 --> 00:07:22,318
+did and try to make
+尽量
+
+196
+00:07:22,318 --> 00:07:23,755
+sure that you understand the steps
+确保自己理解了
+
+197
+00:07:23,760 --> 00:07:25,144
+of what just happened to get
+得到14 13 -7
+
+198
+00:07:25,144 --> 00:07:29,570
+us these numbers: fourteen, thirteen and negative seven.
+这些结果的每一个步骤
+
+199
+00:07:29,650 --> 00:07:31,959
+Finally, let me show you a neat trick.
+最后 我将教你们一个小技巧。
+
+200
+00:07:31,959 --> 00:07:33,939
+Let's say we have
+假设我
+
+201
+00:07:33,940 --> 00:07:36,462
+a set of four houses so 4
+有四间房子
+
+202
+00:07:36,462 --> 00:07:38,650
+houses with 4 sizes like these.
+这些房子有四种大小
+
+203
+00:07:38,650 --> 00:07:39,908
+And let's say I have a
+我有一个
+
+204
+00:07:39,908 --> 00:07:41,418
+hypotheses for predicting what is
+假设函数
+
+205
+00:07:41,420 --> 00:07:43,885
+the price of a house, and
+用于预测房子的价格
+
+206
+00:07:43,890 --> 00:07:45,861
+let's say I want to compute,
+我需要计算
+
+207
+00:07:45,861 --> 00:07:49,347
+you know, H of X for each of my 4 houses here.
+四间房子的大小作为X值时 H的大小(即预测的房价)
+
+208
+00:07:49,347 --> 00:07:51,039
+It turns out there's neat way
+这里有一种简单的方法
+
+209
+00:07:51,039 --> 00:07:52,979
+of posing this, applying this
+可以同时
+
+210
+00:07:52,980 --> 00:07:56,780
+hypothesis to all of my houses at the same time.
+计算四间房子的预测价格
+
+211
+00:07:56,780 --> 00:07:57,795
+It turns out there's a neat
+我可以将它简单地
+
+212
+00:07:57,795 --> 00:07:59,509
+way to pose this as a
+利用
+
+213
+00:07:59,509 --> 00:08:01,798
+Matrix Vector multiplication.
+矩阵向量相乘的思想来计算
+
+214
+00:08:02,240 --> 00:08:03,672
+So, here's how I'm going to do it.
+所以 对于这个问题我会这么计算
+
+215
+00:08:03,672 --> 00:08:06,717
+I am going to construct a matrix as follows.
+首先我要构建一个
+
+216
+00:08:06,717 --> 00:08:08,122
+My matrix is going to be
+如下所示的矩阵
+
+217
+00:08:08,122 --> 00:08:11,892
+1, 1, 1, 1, and I'm
+第一列元素是1 1 1 1
+
+218
+00:08:11,892 --> 00:08:15,495
+going to write down the sizes
+然后我把四个房子的大小
+
+219
+00:08:15,510 --> 00:08:19,935
+of my four houses here and
+写在这儿
+
+220
+00:08:19,935 --> 00:08:21,249
+I'm going to construct a vector
+我还需要构造一个向量
+
+221
+00:08:21,249 --> 00:08:23,354
+as well, and my
+我的向量
+
+222
+00:08:23,354 --> 00:08:25,609
+vector is going to be this
+它将是一个
+
+223
+00:08:25,609 --> 00:08:30,072
+vector of two elements, that's
+二维向量
+
+224
+00:08:30,072 --> 00:08:32,182
+minus 40 and 0.25.
+即 -40 和 0.25
+
+225
+00:08:32,182 --> 00:08:34,607
+That's these two co-efficients;
+这是预测函数的两个系数
+
+226
+00:08:34,607 --> 00:08:35,432
+theta 0 and theta 1.
+theta0 和 theta1
+
+227
+00:08:35,432 --> 00:08:36,835
+And what I am going
+接下来
+
+228
+00:08:36,835 --> 00:08:38,048
+to do is to take matrix
+我要做的就是
+
+229
+00:08:38,060 --> 00:08:39,708
+and that vector and multiply them
+将我构造好的矩阵和向量相乘
+
+230
+00:08:39,708 --> 00:08:42,465
+together, that times is that multiplication symbol.
+这是相乘符号
+
+231
+00:08:42,465 --> 00:08:43,288
+So what do I get?
+我将得到什么结果呢?
+
+232
+00:08:43,288 --> 00:08:46,412
+Well this is a
+左边是一个
+
+233
+00:08:46,420 --> 00:08:48,228
+four by two matrix.
+4X2 矩阵。
+
+234
+00:08:48,228 --> 00:08:52,005
+This is a two by one matrix.
+右边是一个 2X1 矩阵
+
+235
+00:08:52,005 --> 00:08:53,952
+So the outcome is going
+所以结果
+
+236
+00:08:53,952 --> 00:08:55,355
+to be a four by one
+将是一个4X1向量
+
+237
+00:08:55,355 --> 00:08:59,506
+vector, all right.
+对吧
+
+238
+00:08:59,520 --> 00:09:02,860
+So, let me,
+所以 让我在幻灯片上写上
+
+239
+00:09:02,870 --> 00:09:05,334
+so this is
+结果将是
+
+240
+00:09:05,334 --> 00:09:06,188
+going to be a 4 by
+一个4X1的矩阵
+
+241
+00:09:06,188 --> 00:09:06,957
+1 matrix is the outcome or
+输出结果也就是
+
+242
+00:09:06,957 --> 00:09:10,035
+really a four dimensional vector,
+一个四维向量
+
+243
+00:09:10,035 --> 00:09:11,562
+so let me write it as
+让我来把它写出来
+
+244
+00:09:11,562 --> 00:09:15,991
+one of my four elements in my four real numbers here.
+用四个实数表示我的四个元素
+
+245
+00:09:16,010 --> 00:09:17,202
+Now it turns out and so
+事实上
+
+246
+00:09:17,202 --> 00:09:18,952
+this first element of this
+结果的第一个元素
+
+247
+00:09:18,952 --> 00:09:20,497
+result, the way I
+我的计算方式
+
+248
+00:09:20,497 --> 00:09:21,505
+am going to get that is, I
+是
+
+249
+00:09:21,505 --> 00:09:25,526
+am going to take this and multiply it by the vector.
+将这一行同我的向量相乘
+
+250
+00:09:25,526 --> 00:09:29,381
+And so this is going to
+结果将是
+
+251
+00:09:29,381 --> 00:09:33,053
+be -40 x
+-40x1
+
+252
+00:09:33,053 --> 00:09:37,645
+1 + 0.25 x 2104.
++ 0.25×2104。
+
+253
+00:09:37,645 --> 00:09:38,998
+By the way, on
+顺便说一下
+
+254
+00:09:38,998 --> 00:09:40,915
+the earlier slides I was
+在先前的幻灯片中
+
+255
+00:09:40,915 --> 00:09:42,257
+writing 1 x -40 and
+我写的是 1x-40
+
+256
+00:09:42,260 --> 00:09:44,405
+2104 x 0.25, but
++ 2104x0.25
+
+257
+00:09:44,405 --> 00:09:46,570
+the order doesn't matter, right?
+但是顺序无关紧要 对吧?
+
+258
+00:09:46,580 --> 00:09:49,637
+-40 x 1 is the same as 1 x -40.
+-40×1 和 1×-40是一样的
+
+259
+00:09:49,637 --> 00:09:52,115
+And this first element, of course,
+这第一个元素
+
+260
+00:09:52,115 --> 00:09:55,288
+is "H" applied to 2104.
+就是当x为2104时的H值
+
+261
+00:09:55,288 --> 00:09:57,395
+So it's really the
+因此
+
+262
+00:09:57,395 --> 00:09:59,969
+predicted price of my first house.
+这是我的第一个房子的预测价格
+
+263
+00:09:59,969 --> 00:10:02,351
+Well, how about the second element?
+那么 第二个元素呢?
+
+264
+00:10:02,390 --> 00:10:04,089
+Hope you can see
+你应该已经想到了
+
+265
+00:10:04,089 --> 00:10:07,912
+where I am going to get the second element.
+我要怎么计算第二个元素了
+
+266
+00:10:07,912 --> 00:10:08,750
+Right?
+对吗?
+
+267
+00:10:08,750 --> 00:10:11,052
+I'm gonna take this and multiply it by my vector.
+我要把这个乘以我的向量
+
+268
+00:10:11,052 --> 00:10:13,154
+And so that's gonna be
+所以就是
+
+269
+00:10:13,180 --> 00:10:15,038
+-40 x 1 + 0.25 x 1416.
+-40×1 + 0.25×1416
+
+270
+00:10:15,038 --> 00:10:23,037
+And so this is going be "H" of 1416.
+这就是x为1416的“H”值。
+
+271
+00:10:23,110 --> 00:10:23,110
+Right?
+理解了吗?
+
+272
+00:10:25,810 --> 00:10:27,024
+And so on for the
+这是第三个
+
+273
+00:10:27,024 --> 00:10:30,720
+third and the fourth
+和第四个
+
+274
+00:10:30,760 --> 00:10:33,797
+elements of this 4 x 1 vector.
+后面就依次计算这个4X1矩阵的第三和第四个元素
+
+275
+00:10:33,800 --> 00:10:37,142
+And just there, right?
+得出结果
+
+276
+00:10:37,142 --> 00:10:39,239
+This thing here that I
+这里
+
+277
+00:10:39,239 --> 00:10:41,131
+just drew the green box around,
+我画了绿色边框的部分
+
+278
+00:10:41,131 --> 00:10:42,752
+that's a real number, OK?
+是一个实数 对吧?
+
+279
+00:10:42,752 --> 00:10:44,169
+That's a single real number,
+它是一个实数
+
+280
+00:10:44,180 --> 00:10:45,673
+and this thing here that
+这里
+
+281
+00:10:45,680 --> 00:10:47,812
+I drew the magenta box around--the
+我画了洋红色边框的部分
+
+282
+00:10:47,812 --> 00:10:49,826
+purple, magenta color box
+紫色 洋红色 边框
+
+283
+00:10:49,850 --> 00:10:50,908
+around--that's a real number, right?
+是一个实数 对吧?
+
+284
+00:10:50,920 --> 00:10:52,683
+And so this thing on
+所以右边
+
+285
+00:10:52,683 --> 00:10:54,104
+the right--this thing on the
+最右边
+
+286
+00:10:54,104 --> 00:10:55,200
+right overall, this is a
+ 就是一个
+
+287
+00:10:55,220 --> 00:10:59,288
+4 by 1 dimensional matrix, or really a 4 dimensional vector.
+4X1矩阵 是一个4维向量
+
+288
+00:10:59,288 --> 00:11:00,728
+And, the neat thing about
+这个例子的一个小技巧是
+
+289
+00:11:00,728 --> 00:11:02,128
+this is that when you're
+当你
+
+290
+00:11:02,130 --> 00:11:04,613
+actually implementing this in software--so
+在程序中实现这个过程的时候
+
+291
+00:11:04,613 --> 00:11:06,344
+when you have four houses and
+当你有四间房子
+
+292
+00:11:06,350 --> 00:11:08,525
+when you want to use your hypothesis
+你想使用自己的预测函数
+
+293
+00:11:08,525 --> 00:11:12,308
+to predict the prices, predict the price "Y" of all of these four houses.
+来预测房子的价格
+
+294
+00:11:12,308 --> 00:11:13,553
+What this means is that, you
+完成这些工作
+
+295
+00:11:13,553 --> 00:11:16,130
+know, you can write this in one line of code.
+你可以用一行代码搞定
+
+296
+00:11:16,140 --> 00:11:17,878
+When we talk about octave and
+我们后面会谈到Octave
+
+297
+00:11:17,878 --> 00:11:19,782
+program languages later, you can
+以及编程语言
+
+298
+00:11:19,790 --> 00:11:22,120
+actually, you'll actually write this in one line of code.
+你可以只写一行代码就完成整个过程
+
+299
+00:11:22,120 --> 00:11:24,879
+You write prediction equals my,
+你可以这样写
+
+300
+00:11:24,879 --> 00:11:29,697
+you know, data matrix times
+prediction = data matrix * parameters
+
+301
+00:11:30,582 --> 00:11:33,888
+parameters, right?
+对吧
+
+302
+00:11:33,890 --> 00:11:36,994
+Where data matrix is
+数据矩阵是这一部分
+
+303
+00:11:36,994 --> 00:11:38,661
+this thing here, and parameters
+参数
+
+304
+00:11:38,661 --> 00:11:40,447
+is this thing here, and this
+是这一部分
+
+305
+00:11:40,447 --> 00:11:44,138
+times is a matrix vector multiplication.
+这就是一个矩阵向量乘法
+
+306
+00:11:44,138 --> 00:11:45,834
+And if you just do this then
+如果你这么做了
+
+307
+00:11:45,834 --> 00:11:47,579
+this variable prediction - sorry
+这个变量prediction
+
+308
+00:11:47,579 --> 00:11:49,270
+for my bad handwriting - then
+抱歉 我的字写得很潦草
+
+309
+00:11:49,270 --> 00:11:50,942
+just implement this one
+只需要实现
+
+310
+00:11:50,942 --> 00:11:52,357
+line of code assuming you have
+这一行代码
+
+311
+00:11:52,357 --> 00:11:55,328
+an appropriate library to do matrix vector multiplication.
+如果你有一个做矩阵向量相乘的函数库的话
+
+312
+00:11:55,328 --> 00:11:56,518
+If you just do this,
+如果你这样做的话
+
+313
+00:11:56,518 --> 00:11:58,965
+then prediction becomes this
+右侧的prediction变量就会形成
+
+314
+00:11:58,965 --> 00:12:00,714
+4 by 1 dimensional vector, on
+一个4维向量
+
+315
+00:12:00,714 --> 00:12:04,655
+the right, that just gives you all the predicted prices.
+给你所有的预测价格。
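+
+(A minimal NumPy sketch of the one-line prediction described above; only the sizes 2104
+and 1416 are read out in the lecture, so the other two sizes here are placeholders.)
+
+import numpy as np
+sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])    # last two values are placeholders
+data_matrix = np.column_stack([np.ones(4), sizes])   # 4 x 2: a 1 and a size per house
+parameters = np.array([-40.0, 0.25])                 # theta 0 and theta 1
+prediction = data_matrix @ parameters                # 4-dimensional vector of prices
+print(prediction[0])                                 # -40 + 0.25 * 2104 = 486.0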
+
+316
+00:12:04,655 --> 00:12:07,163
+And your alternative to doing
+另一种计算方式是
+
+317
+00:12:07,163 --> 00:12:09,286
+this as a matrix vector multiplication
+作为一种矩阵向量相乘的方式
+
+318
+00:12:09,310 --> 00:12:11,241
+would be to write something like
+实际上就是一种
+
+319
+00:12:11,241 --> 00:12:13,542
+, you know, for I equals 1 to 4, right?
+通过for循环 for 1 to 4 对吧?
+
+320
+00:12:13,542 --> 00:12:15,150
+And you have say a thousand houses
+如果说你有一千间房子
+
+321
+00:12:15,160 --> 00:12:17,451
+it would be for I equals 1 to a thousand or whatever.
+就将是 for 1 to 1000 或者别的任何数
+
+322
+00:12:17,451 --> 00:12:18,772
+And then you have to write a
+然后你还得写出
+
+323
+00:12:18,772 --> 00:12:21,898
+prediction of i equals, you know, ...
+prediction(i) 等于……这样的代码
+
+324
+00:12:21,910 --> 00:12:23,123
+and then do a bunch
+然后需要做
+
+325
+00:12:23,130 --> 00:12:25,645
+more work over there and it
+比矩阵向量相乘多得多的工作
+
+326
+00:12:25,645 --> 00:12:27,188
+turns out that When you
+当你有
+
+327
+00:12:27,188 --> 00:12:28,549
+have a large number of houses,
+大量的房子的时候
+
+328
+00:12:28,549 --> 00:12:29,928
+if you're trying to predict the prices
+如果你试图预测
+
+329
+00:12:29,930 --> 00:12:31,033
+of not just four but maybe
+不只是四座
+
+330
+00:12:31,033 --> 00:12:33,230
+of a thousand houses then
+或许是一千座房子的时候
+
+331
+00:12:33,410 --> 00:12:35,175
+it turns out that when
+事实证明
+
+332
+00:12:35,175 --> 00:12:36,118
+you implement this in the
+当你使用矩阵向量相乘的方法时
+
+333
+00:12:36,118 --> 00:12:40,087
+computer, implementing it like this, in any of the various languages.
+在计算机中 使用任何语言
+
+334
+00:12:40,087 --> 00:12:41,535
+This is not only true for
+不仅仅是Octave
+
+335
+00:12:41,535 --> 00:12:43,022
+Octave, but for C++,
+还有C++ Java Python
+
+336
+00:12:43,030 --> 00:12:46,252
+Java or Python, and other high-level languages as well.
+等高级语言 以及其他语言 都可以很快的实现
+
+337
+00:12:46,252 --> 00:12:48,045
+It turns out, that, by writing
+事实证明
+
+338
+00:12:48,045 --> 00:12:49,811
+code in this style on the
+像左边这样子写代码
+
+339
+00:12:49,811 --> 00:12:51,552
+left, it allows you to
+不仅可以
+
+340
+00:12:51,552 --> 00:12:53,283
+not only simplify the
+简化你的代码
+
+341
+00:12:53,283 --> 00:12:54,677
+code, because, now, you're just
+现在你只需要
+
+342
+00:12:54,677 --> 00:12:55,857
+writing one line of code
+写一行代码
+
+343
+00:12:55,870 --> 00:12:58,427
+rather than the form of a bunch of things inside.
+而不是一堆代码
+
+344
+00:12:58,450 --> 00:12:59,727
+But, for subtle reasons, that we
+而且 还有一个微妙的好处
+
+345
+00:12:59,730 --> 00:13:01,398
+will see later, it turns
+我们后面将会了解到
+
+346
+00:13:01,400 --> 00:13:03,392
+out to be much more computationally
+就是基于你所有的房子
+
+347
+00:13:03,392 --> 00:13:05,617
+efficient to make predictions
+这样做计算效率将会更高
+
+348
+00:13:05,617 --> 00:13:06,583
+on all of the prices of
+比你像右边那样
+
+349
+00:13:06,583 --> 00:13:08,348
+all of your houses doing it
+用代码实现公式
+
+350
+00:13:08,360 --> 00:13:09,693
+the way on the left than the
+的方式效率
+
+351
+00:13:09,693 --> 00:13:13,334
+way on the right than if you were to write your own formula.
+将会高很多
+
+352
+00:13:13,334 --> 00:13:14,596
+I'll say more about this
+我后面在讨论向量化的时候
+
+353
+00:13:14,596 --> 00:13:15,978
+later when we talk about
+会详细地
+
+354
+00:13:15,978 --> 00:13:17,684
+vectorization, but, so, by
+讨论这个问题
+
+355
+00:13:17,684 --> 00:13:19,145
+posing a prediction this way, you
+所以 通过这种方式计算预测值
+
+356
+00:13:19,145 --> 00:13:20,511
+get not only a simpler piece
+不仅代码更加简洁
+
+357
+00:13:20,511 --> 00:13:23,200
+of code, but a more efficient one.
+而且效率更高
+
+358
+00:13:23,200 --> 00:13:25,151
+So, that's it for
+以上就是矩阵向量相乘的全部内容
+
+359
+00:13:25,151 --> 00:13:27,063
+matrix vector multiplication and we'll
+我们在后面
+
+360
+00:13:27,063 --> 00:13:28,432
+make good use of these sorts
+在其他模型中
+
+361
+00:13:28,432 --> 00:13:30,352
+of operations as we develop
+研究线性回归以及其他模型的时候
+
+362
+00:13:30,370 --> 00:13:32,888
+linear regression and other models further.
+将会有效地利用到这一讲的内容
+
+363
+00:13:32,910 --> 00:13:34,259
+But, in the next video we're
+在接下来的视频中
+
+364
+00:13:34,259 --> 00:13:36,150
+going to take this and generalize this
+我将会从特殊到一般
+
+365
+00:13:36,150 --> 00:13:39,527
+to the case of matrix matrix multiplication.
+讲讲矩阵与矩阵相乘的情况
+
diff --git a/srt/3 - 4 - Matrix Matrix Multiplication (11 min).srt b/srt/3 - 4 - Matrix Matrix Multiplication (11 min).srt
new file mode 100644
index 00000000..2d10b4f9
--- /dev/null
+++ b/srt/3 - 4 - Matrix Matrix Multiplication (11 min).srt
@@ -0,0 +1,1570 @@
+1
+00:00:00,190 --> 00:00:01,558
+In this video we talk about
+在这段视频中我们将会讨论
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,558 --> 00:00:03,577
+matrix, matrix multiplication or
+矩阵 矩阵的乘法以及
+
+3
+00:00:03,580 --> 00:00:06,262
+how to multiply two matrices together.
+如何将两个矩阵相乘
+
+4
+00:00:06,590 --> 00:00:07,935
+When we talk about the method
+我们会使用这样一种方法
+
+5
+00:00:07,935 --> 00:00:09,412
+in linear regression for how
+在线性回归中用以解决
+
+6
+00:00:09,412 --> 00:00:11,251
+to solve for the parameters,
+参数计算的问题
+
+7
+00:00:11,251 --> 00:00:13,195
+theta zero and theta one, all in one shot.
+这种方法会把θ0、θ1等参数都放在一起来计算
+
+8
+00:00:13,195 --> 00:00:16,601
+So, without needing an iterative algorithm like gradient descent.
+也就是说 我们不需要一个迭代的梯度下降算法
+
+9
+00:00:16,601 --> 00:00:18,005
+When we talk about that algorithm,
+当我们谈到这个算法的时候
+
+10
+00:00:18,005 --> 00:00:19,982
+it turns out that matrix, matrix
+就会发现矩阵以及矩阵间的乘法运算
+
+11
+00:00:19,982 --> 00:00:23,086
+multiplication is one of the key steps that you need to know.
+是你必须理解的关键步骤之一
+
+12
+00:00:24,050 --> 00:00:27,885
+So, let's, as usual, start with an example.
+所以让我们像往常那样 从一个例子开始
+
+13
+00:00:28,790 --> 00:00:30,558
+Let's say I have two matrices
+比方说 我有两个矩阵
+
+14
+00:00:30,558 --> 00:00:33,060
+and I want to multiply them together.
+我想将它们相乘
+
+15
+00:00:33,060 --> 00:00:34,343
+Let me again just reference this
+让我先只是按照这个例子做一遍(乘法)
+
+16
+00:00:34,343 --> 00:00:37,441
+example and then I'll tell you in a little bit what happens.
+然后告诉你这其中运算的细节
+
+17
+00:00:38,000 --> 00:00:39,154
+So, the first thing
+那么 我要做的第一件事是
+
+18
+00:00:39,160 --> 00:00:40,589
+I'm gonna do is, I'm going
+我先把
+
+19
+00:00:40,589 --> 00:00:43,154
+to pull out the first
+右边这个矩阵的第一列
+
+20
+00:00:43,170 --> 00:00:45,545
+column of this matrix on the right.
+提取出来
+
+21
+00:00:46,340 --> 00:00:48,135
+And I'm going to take this
+然后我将会把
+
+22
+00:00:48,135 --> 00:00:49,163
+matrix on the left and
+左边的这个矩阵和
+
+23
+00:00:49,170 --> 00:00:52,385
+multiply it by, you know, a vector.
+之前取出来的这一列(前面提过的,向量)相乘
+
+24
+00:00:52,385 --> 00:00:55,188
+That's just this first column, OK?
+这只是第一列 是吧?
+
+25
+00:00:55,188 --> 00:00:56,385
+And it turns out if I
+然后我们可以看到 如果我
+
+26
+00:00:56,385 --> 00:00:59,067
+do that I am going to get the vector 11, 9.
+这么做 我就会得到向量(11,9)
+
+27
+00:00:59,070 --> 00:01:02,068
+So, this is the same matrix
+所以这是与上个视频的矩阵
+
+28
+00:01:02,068 --> 00:01:05,932
+vector multiplication as you saw in the last videos.
+和向量的乘法是一样的
+
+29
+00:01:05,950 --> 00:01:08,934
+I worked this out in advance so, I know it's 11, 9.
+我已经提前算出了这个结果 是(11,9)
+
+30
+00:01:08,934 --> 00:01:10,519
+And, then, the second thing
+那么 之后的第二件事
+
+31
+00:01:10,519 --> 00:01:12,811
+I'm going to do is, I'm going
+我要做的就是
+
+32
+00:01:12,811 --> 00:01:14,752
+to pull out the second column,
+我将把第二列再单独提出出来
+
+33
+00:01:14,752 --> 00:01:16,537
+this matrix on the right and
+右边这个矩阵的第二列
+
+34
+00:01:16,537 --> 00:01:18,575
+I am then going to
+然后我将要把它和
+
+35
+00:01:18,575 --> 00:01:20,174
+take this matrix on the left,
+左边这个矩阵相乘
+
+36
+00:01:20,174 --> 00:01:21,398
+right, so, it will be that matrix,
+是的吧 所以 这就是那个矩阵
+
+37
+00:01:21,410 --> 00:01:23,476
+and multiply it by
+用右边的第二列
+
+38
+00:01:23,480 --> 00:01:24,902
+that second column on the right.
+来乘以这个矩阵
+
+39
+00:01:24,902 --> 00:01:26,360
+So, again, this is a matrix
+因此 同样的 这是一个矩阵和
+
+40
+00:01:27,060 --> 00:01:28,960
+vector multiplication set, which
+向量的乘法运算 这
+
+41
+00:01:28,960 --> 00:01:30,643
+you saw from the previous video, and
+就是你从上一个视频所学到的
+
+42
+00:01:30,643 --> 00:01:31,623
+it turns out that if you
+如果你这么做
+
+43
+00:01:31,623 --> 00:01:32,768
+multiply this matrix and this
+把这个矩阵和这个向量相乘
+
+44
+00:01:32,780 --> 00:01:34,250
+vector, you get 10,
+你会得到
+
+45
+00:01:34,250 --> 00:01:36,214
+14 and by
+(10,14)这个结果
+
+46
+00:01:36,214 --> 00:01:37,472
+the way, if you want to practice
+顺便说一下 如果你想练习
+
+47
+00:01:37,472 --> 00:01:39,776
+your matrix vector multiplication, feel
+矩阵和向量的乘法运算
+
+48
+00:01:39,776 --> 00:01:42,810
+free to pause the video and check this product yourself.
+那么就先暂停下视频 自己算一算结果对不对
+
+49
+00:01:43,170 --> 00:01:44,248
+Then, I'm just going
+好吧 现在我仅仅需要
+
+50
+00:01:44,248 --> 00:01:45,743
+to take these two results and
+将得到的这两个结果放在一起
+
+51
+00:01:45,743 --> 00:01:48,398
+put them together, and that will be my answer.
+那么这就是我的答案了
+
+52
+00:01:48,400 --> 00:01:49,962
+So, turns out the
+那么 我们可以看到
+
+53
+00:01:49,962 --> 00:01:51,350
+outcome of this product is going
+计算结果是
+
+54
+00:01:51,350 --> 00:01:53,449
+to be a 2 by 2 matrix, and
+一个2 x 2的矩阵
+
+55
+00:01:53,449 --> 00:01:54,467
+The way I am going to fill
+我用来填充这个矩阵的方法
+
+56
+00:01:54,467 --> 00:01:56,294
+in this matrix is just by
+就是
+
+57
+00:01:56,294 --> 00:01:57,914
+taking my elements 11,
+把我的(11,9)
+
+58
+00:01:57,914 --> 00:02:00,137
+9 and plugging them here, and
+填在这里
+
+59
+00:02:00,140 --> 00:02:03,753
+taking 10, 14 and plugging
+把(10,14)填在
+
+60
+00:02:03,753 --> 00:02:06,386
+them into the second column.
+第二列
+
+61
+00:02:06,720 --> 00:02:06,720
+Okay?
+是的吧?
+
+62
+00:02:07,430 --> 00:02:08,824
+So, that was the mechanics of
+所以 这就是如何
+
+63
+00:02:08,824 --> 00:02:11,086
+how to multiply a matrix by
+将两个矩阵相乘的
+
+64
+00:02:11,086 --> 00:02:12,248
+another matrix.
+详细方法与过程
+
+65
+00:02:12,265 --> 00:02:14,094
+You basically look at the
+每次你只需要看
+
+66
+00:02:14,094 --> 00:02:17,045
+second matrix one column at a time, and you assemble the answers.
+第二个矩阵的一列 然后把你的答案拼凑起来
+
+67
+00:02:17,070 --> 00:02:18,199
+And again, we will step
+再次强调下 我们将一步步的来计算
+
+68
+00:02:18,199 --> 00:02:19,455
+through this much more carefully in
+稍后会更仔细地一步步讲解
+
+69
+00:02:19,455 --> 00:02:20,754
+a second, but I just
+但我也要指出
+
+70
+00:02:20,754 --> 00:02:22,852
+want to point out also, this
+我也要指出的是
+
+71
+00:02:22,852 --> 00:02:26,301
+first example is a 2x3 matrix.
+第一个例子是一个2X3矩阵
+
+72
+00:02:26,301 --> 00:02:28,548
+Multiplying that by a
+乘以一个
+
+73
+00:02:28,550 --> 00:02:30,649
+3x2 matrix, and the
+3x2的矩阵 他们相乘
+
+74
+00:02:30,649 --> 00:02:32,497
+outcome of this product, it
+得到的结果
+
+75
+00:02:32,497 --> 00:02:35,518
+turns out to be a 2x2
+是一个2x2的
+
+76
+00:02:35,518 --> 00:02:36,802
+matrix.
+矩阵
+
+77
+00:02:36,802 --> 00:02:39,121
+And again, we'll see in a second why this was the case.
+我们将很快知道为什么是这个结果
+
+78
+00:02:39,122 --> 00:02:40,484
+All right.
+好的
+
+79
+00:02:40,790 --> 00:02:42,637
+That was the mechanics of the calculation.
+这是计算的技巧
+
+80
+00:02:42,637 --> 00:02:43,745
+Let's actually look at the
+让我们再看看
+
+81
+00:02:43,745 --> 00:02:44,953
+details and look at what
+这其中的细节
+
+82
+00:02:44,960 --> 00:02:46,305
+exactly happened.
+看看究竟发生了什么
+
+83
+00:02:46,305 --> 00:02:48,082
+Here are details.
+下面就是详细的过程
+
+84
+00:02:48,082 --> 00:02:49,471
+I have a matrix A and
+我有一个矩阵A
+
+85
+00:02:49,471 --> 00:02:51,325
+I want to multiply that
+我要把它乘以
+
+86
+00:02:51,350 --> 00:02:53,088
+with a matrix B, and the result
+矩阵B 其结果
+
+87
+00:02:53,088 --> 00:02:56,143
+will be some new matrix C. And
+会是一个新的矩阵C
+
+88
+00:02:56,143 --> 00:02:57,168
+it turns out you can only
+并且你会发现你只能
+
+89
+00:02:57,168 --> 00:02:59,238
+multiply together matrices whose
+相乘那些维度
+
+90
+00:02:59,238 --> 00:03:00,714
+dimensions match so A
+匹配的矩阵
+
+91
+00:03:00,714 --> 00:03:02,239
+is an m by n matrix,
+因此如果A是一个m×n的矩阵
+
+92
+00:03:02,240 --> 00:03:04,468
+so m rows, n columns and
+就是说m行n列
+
+93
+00:03:04,468 --> 00:03:05,394
+I am going to multiply
+我将要用它与
+
+94
+00:03:05,394 --> 00:03:06,480
+that with an n by o
+一个n×o的矩阵相乘
+
+95
+00:03:06,500 --> 00:03:08,232
+and it turns out this n
+并且实际上这里的n
+
+96
+00:03:08,232 --> 00:03:10,306
+here must match this n
+必须匹配这里的这个n
+
+97
+00:03:10,330 --> 00:03:11,978
+here, so the number of columns
+所以第一个矩阵的列的数目
+
+98
+00:03:11,978 --> 00:03:16,778
+in first matrix must equal to the number of rows in second matrix.
+必须等于第二矩阵中的行的数目
+
+99
+00:03:16,800 --> 00:03:18,035
+And the result of this
+并且相乘得到的结果
+
+100
+00:03:18,035 --> 00:03:20,639
+product will be an M
+结果会是一个m×o的矩阵
+
+101
+00:03:20,639 --> 00:03:25,204
+by O matrix, like the matrix C here.
+就像这个矩阵C这样
+
+102
+00:03:25,390 --> 00:03:26,822
+And, in the previous
+并且 在前面的视频中
+
+103
+00:03:26,830 --> 00:03:28,743
+video, everything we did corresponded
+我们所做的一切都符合这个规则
+
+104
+00:03:28,770 --> 00:03:31,380
+to this special case of o being
+这是一种当矩阵B的o值
+
+105
+00:03:31,380 --> 00:03:32,588
+equal to 1.
+等于1的特殊情况(指的是矩阵和向量相乘)
+
+106
+00:03:32,588 --> 00:03:33,150
+Okay?
+明白了吗?
+
+107
+00:03:33,150 --> 00:03:35,469
+That was, that was in case of B being a vector.
+这是在B是一个向量的情况下
+
+108
+00:03:35,480 --> 00:03:36,522
+But now, we are going to
+但是现在 我们要处理
+
+109
+00:03:36,530 --> 00:03:39,805
+view of the case of values of O larger than 1.
+O的值大于1的情况
+
+110
+00:03:39,805 --> 00:03:41,533
+So, here's how you
+所以 这里就是你怎样
+
+111
+00:03:41,540 --> 00:03:44,564
+multiply together the two matrices.
+把两个矩阵相乘
+
+112
+00:03:44,564 --> 00:03:46,349
+In order to get, what
+为了得到结果
+
+113
+00:03:46,349 --> 00:03:47,775
+I am going to do is
+我要做的就是
+
+114
+00:03:47,775 --> 00:03:49,180
+I am going to take the
+我将要取
+
+115
+00:03:49,270 --> 00:03:52,025
+first column of B
+B矩阵的第一列
+
+116
+00:03:52,025 --> 00:03:53,782
+and treat that as a vector,
+把取出的这列看成一个向量
+
+117
+00:03:53,782 --> 00:03:56,098
+and multiply the matrix A,
+并乘以矩阵A
+
+118
+00:03:56,120 --> 00:03:57,909
+with the first column of B,
+用B矩阵的第一列
+
+119
+00:03:57,930 --> 00:03:59,632
+and the result of that will
+这个计算结果将是
+
+120
+00:03:59,632 --> 00:04:00,370
+be a M by 1 vector,
+m×1的矩阵(也就是一个向量)
+
+121
+00:04:00,400 --> 00:04:04,726
+and we're going to put that over here.
+我们把结果先放在这里
+
+122
+00:04:05,070 --> 00:04:06,481
+Then, I'm going to
+然后 我将要取
+
+123
+00:04:06,481 --> 00:04:09,048
+take the second column
+B矩阵的
+
+124
+00:04:09,048 --> 00:04:11,920
+of B, right, so,
+第二列
+
+125
+00:04:12,010 --> 00:04:13,775
+this is another n by
+那么我会又得到一个n×1的向量
+
+126
+00:04:13,790 --> 00:04:15,501
+one vector, so, this column
+也就是 这里的这一列
+
+127
+00:04:15,501 --> 00:04:16,690
+here, this is right, n
+这是正确的
+
+128
+00:04:16,690 --> 00:04:17,910
+by one, those are n dimensional
+n×1的矩阵 也就是n维的向量
+
+129
+00:04:17,910 --> 00:04:19,782
+vector, gonna multiply this
+我将要把这个矩阵
+
+130
+00:04:19,782 --> 00:04:21,678
+matrix with this n by one vector.
+和这些n乘1的向量相乘
+
+131
+00:04:21,678 --> 00:04:23,775
+The result will be
+其结果将是
+
+132
+00:04:23,775 --> 00:04:26,018
+a M dimensional vector,
+一个m维的向量
+
+133
+00:04:26,450 --> 00:04:28,035
+which we'll put there.
+然后我会把结果先放在那里
+
+134
+00:04:28,035 --> 00:04:29,273
+And, so on.
+依此类推
+
+135
+00:04:29,273 --> 00:04:30,035
+Okay?
+对吧?
+
+136
+00:04:30,035 --> 00:04:31,135
+And, so, you know, and then
+那么 你知道的
+
+137
+00:04:31,135 --> 00:04:32,099
+I'm going to take the third
+我开始取第三列
+
+138
+00:04:32,099 --> 00:04:33,475
+column, multiply it by
+把它和这个矩阵相乘
+
+139
+00:04:33,475 --> 00:04:37,507
+this matrix, I get a M dimensional vector.
+我又得到了一个M维向量
+
+140
+00:04:37,510 --> 00:04:39,368
+And so on, until you get
+依此类推 直到你计算到了
+
+141
+00:04:39,368 --> 00:04:40,610
+to the last column times,
+最后一列
+
+142
+00:04:40,610 --> 00:04:41,870
+the matrix times the
+矩阵乘以
+
+143
+00:04:41,950 --> 00:04:43,420
+last column gives you
+你取到的最后一列
+
+144
+00:04:43,530 --> 00:04:45,757
+the last column of C.
+就是C的最后一列
+
+145
+00:04:46,460 --> 00:04:48,808
+Just to say that again.
+再说一遍
+
+146
+00:04:49,310 --> 00:04:51,510
+The ith column of the
+矩阵C的第i列
+
+147
+00:04:51,600 --> 00:04:53,777
+matrix C is attained
+是根据把
+
+148
+00:04:53,810 --> 00:04:56,108
+by taking the matrix A and
+矩阵A与
+
+149
+00:04:56,110 --> 00:04:57,641
+multiplying the matrix A with
+矩阵B的第i列
+
+150
+00:04:57,660 --> 00:04:59,638
+the ith column of the
+相乘得到的
+
+151
+00:04:59,638 --> 00:05:01,539
+matrix B for the values
+结果
+
+152
+00:05:01,560 --> 00:05:03,387
+of I equals 1, 2
+其中 i 的取值为1 2
+
+153
+00:05:03,387 --> 00:05:04,936
+up through O. Okay ?
+一直到o 对吧?
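+
+(A minimal NumPy sketch of the column-at-a-time rule just described; the matrices are
+illustrative, not the ones on the slide.)
+
+import numpy as np
+A = np.array([[1., 3.], [2., 5.]])       # m x n
+B = np.array([[0., 1.], [3., 2.]])       # n x o
+C = np.zeros((A.shape[0], B.shape[1]))   # m x o result
+for i in range(B.shape[1]):
+    C[:, i] = A @ B[:, i]                # i-th column of C = A times i-th column of B
+print(np.allclose(C, A @ B))             # True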
+
+154
+00:05:04,950 --> 00:05:06,752
+So, this is just a summary
+那么 我们在这里做一个总结
+
+155
+00:05:06,760 --> 00:05:08,765
+of what we did up there
+我们总结了我们为了
+
+156
+00:05:08,765 --> 00:05:10,163
+in order to compute the matrix
+计算矩阵C所做的步骤
+
+157
+00:05:10,163 --> 00:05:12,909
+C. Let's look at just one more example.
+让我们再看一个例子
+
+158
+00:05:12,940 --> 00:05:17,235
+Let 's say, I want to multiply together these two matrices.
+比方说我想把这两个矩阵相乘
+
+159
+00:05:17,235 --> 00:05:18,208
+So, what I'm going to
+那么我首先要做的是
+
+160
+00:05:18,208 --> 00:05:20,178
+do is, first pull
+先取出
+
+161
+00:05:20,178 --> 00:05:22,535
+out the first column
+我的第二个矩阵的
+
+162
+00:05:22,535 --> 00:05:24,370
+of my second matrix, that
+第一列
+
+163
+00:05:24,370 --> 00:05:26,185
+was matrix B, that was
+就是这个矩阵B 这就是
+
+164
+00:05:26,185 --> 00:05:29,133
+my matrix B on the previous slide.
+上一张幻灯片上出现的矩阵B
+
+165
+00:05:29,160 --> 00:05:30,660
+And, I therefore, have this
+因此 我就这么
+
+166
+00:05:30,660 --> 00:05:32,917
+matrix times my vector and
+用矩阵和我取的向量相乘
+
+167
+00:05:32,920 --> 00:05:35,350
+so, oh, let's do this calculation quickly.
+所以 让我们快速的计算这个结果
+
+168
+00:05:35,350 --> 00:05:37,518
+There's going to be equal to,
+这等于
+
+169
+00:05:37,518 --> 00:05:39,048
+right, 1, 3 times 0,
+没错 (1,3)乘以(0,3)
+
+170
+00:05:39,048 --> 00:05:41,238
+3 so that gives 1
+所以 就是
+
+171
+00:05:41,270 --> 00:05:45,930
+times 0, plus 3 times 3.
+1x0+3x3(9)
+
+172
+00:05:45,930 --> 00:05:48,322
+And, the second element
+此外 第二元素
+
+173
+00:05:48,322 --> 00:05:49,530
+is going to be 2,
+就是(2,5)
+
+174
+00:05:49,530 --> 00:05:51,678
+5 times 0, 3 so, that's going to
+乘以(0,3)
+
+175
+00:05:51,678 --> 00:05:52,739
+be two times 0 plus 5
+就是0x2+5x3(15)
+
+176
+00:05:52,740 --> 00:05:57,276
+times 3 and that is
+那么结果出来了
+
+177
+00:05:57,276 --> 00:06:02,242
+9,15, actually didn't
+(9,15)实际上该用绿色的颜色标记
+
+178
+00:06:02,242 --> 00:06:03,672
+write that in green, so this
+所以这就是(9,15)
+
+179
+00:06:03,672 --> 00:06:09,365
+is nine fifteen, and then next,
+那么
+
+180
+00:06:09,365 --> 00:06:12,061
+I am going to pull out
+我将同样的取出
+
+181
+00:06:12,090 --> 00:06:14,451
+the second column of this,
+这个的第二列
+
+182
+00:06:14,451 --> 00:06:16,174
+and do the corresponding
+做相同的计算
+
+183
+00:06:16,190 --> 00:06:18,170
+calculation so there's this
+所以
+
+184
+00:06:18,200 --> 00:06:20,477
+matrix times this vector 1, 2.
+这是这个矩阵乘以(1,2)
+
+185
+00:06:20,477 --> 00:06:22,288
+Let's also do this
+让我们快点算吧
+
+186
+00:06:22,290 --> 00:06:23,814
+quickly, so that's one times
+所以这是
+
+187
+00:06:23,814 --> 00:06:27,362
+one plus three times two.
+1x1 + 3x2
+
+188
+00:06:27,362 --> 00:06:28,973
+So that deals with that
+那么这就处理了这一行
+
+189
+00:06:28,973 --> 00:06:30,868
+row, let's do the
+让我们计算另一行
+
+190
+00:06:30,868 --> 00:06:34,223
+other one, so let's see,
+让我们来看看
+
+191
+00:06:34,223 --> 00:06:37,510
+that gives me two times
+这次是
+
+192
+00:06:37,510 --> 00:06:41,926
+one plus five times two,
+2x1 + 5x2
+
+193
+00:06:41,926 --> 00:06:43,493
+so that is going to
+因此这就等于
+
+194
+00:06:43,493 --> 00:06:46,176
+be equal to, let's see,
+我们看一下
+
+195
+00:06:46,176 --> 00:06:47,464
+one times one plus three times
+1x1 + 3x1结果是4
+
+196
+00:06:47,464 --> 00:06:50,378
+one is four and two
+2x1 + 5x2
+
+197
+00:06:50,378 --> 00:06:52,282
+times one plus five times two
+结果是
+
+198
+00:06:52,282 --> 00:06:53,923
+is twelve.
+
+199
+00:06:55,570 --> 00:06:56,660
+So now I have these two
+所以现在我有两个这个了
+
+200
+00:06:56,660 --> 00:06:58,448
+you, and so my
+因此我得到的
+
+201
+00:06:58,448 --> 00:07:00,343
+outcome, so the product
+这两个矩阵
+
+202
+00:07:00,343 --> 00:07:01,714
+of these two matrices is going
+相乘的结果就是
+
+203
+00:07:01,714 --> 00:07:03,831
+to be, this goes
+这个在这儿
+
+204
+00:07:03,831 --> 00:07:07,232
+here and this
+那个放那边
+
+205
+00:07:07,232 --> 00:07:09,828
+goes here, so I
+所以我得到了
+
+206
+00:07:09,828 --> 00:07:14,632
+get nine fifteen and
+9,15和
+
+207
+00:07:14,660 --> 00:07:17,831
+four twelve and you
+4,12
+
+208
+00:07:17,831 --> 00:07:19,657
+may notice also that the result
+你也可能会注意到这个结果
+
+209
+00:07:19,670 --> 00:07:21,616
+of multiplying a 2x2 matrix
+一个2×2的矩阵
+
+210
+00:07:21,616 --> 00:07:23,687
+with another 2x2 matrix.
+乘以另一个2x2的矩阵
+
+211
+00:07:23,687 --> 00:07:25,215
+The resulting dimension is going
+这个维度会是
+
+212
+00:07:25,215 --> 00:07:26,609
+to be that first two times
+第一个矩阵的2乘以第二的矩阵的2
+
+213
+00:07:26,609 --> 00:07:28,415
+that second two, so the result
+所以这个结果本身
+
+214
+00:07:28,430 --> 00:07:31,460
+is itself also a two by two matrix.
+也是一个2x2的矩阵
+
+215
+00:07:35,000 --> 00:07:36,304
+Finally let me show you
+最后让我告诉你
+
+216
+00:07:36,304 --> 00:07:37,795
+one more neat trick you can
+一个更加具体的技巧 你可以
+
+217
+00:07:37,795 --> 00:07:40,699
+do with matrix matrix multiplication.
+在矩阵和矩阵的乘法中使用
+
+218
+00:07:40,980 --> 00:07:42,455
+Let's say as before that we
+比方说 在这之前 我们
+
+219
+00:07:42,455 --> 00:07:45,823
+have four houses whose
+有四间房子
+
+220
+00:07:45,823 --> 00:07:47,970
+prices we want to predict,
+我们要预测其价格
+
+221
+00:07:48,410 --> 00:07:49,825
+only now we have three
+但是现在我们有三个
+
+222
+00:07:49,825 --> 00:07:51,967
+competing hypotheses shown here
+不同的竞争假设集在这儿
+
+223
+00:07:51,970 --> 00:07:54,145
+on the right, so if
+在右侧 因此 如果
+
+224
+00:07:54,145 --> 00:07:55,720
+you want to apply all
+你想要去把
+
+225
+00:07:55,720 --> 00:07:57,745
+3 competing hypotheses to
+这三个竞争假设集用来
+
+226
+00:07:57,745 --> 00:07:58,951
+all four of the houses, it
+适应这4个房屋的数据
+
+227
+00:07:58,951 --> 00:07:59,926
+turns out you can do that
+那么你可以这样做
+
+228
+00:07:59,926 --> 00:08:01,718
+very efficiently using a
+这将非常高效
+
+229
+00:08:01,718 --> 00:08:05,080
+matrix matrix multiplication so here
+我们使用这里的矩阵乘法来计算
+
+230
+00:08:05,110 --> 00:08:07,347
+on the left is my usual
+左边是我通常使用的矩阵
+
+231
+00:08:07,370 --> 00:08:08,626
+matrix, same as from the
+这与我上个视频
+
+232
+00:08:08,626 --> 00:08:11,063
+last video where these values
+一样 这些值就是
+
+233
+00:08:11,063 --> 00:08:15,012
+are my housing prices and I put ones there on the left as well.
+我的住房价格 我把这些值也放在左边
+
+234
+00:08:15,012 --> 00:08:16,626
+And, what I'm going to
+那么 我要去做的就是
+
+235
+00:08:16,626 --> 00:08:19,029
+do is construct another matrix, where
+构造另一个矩阵
+
+236
+00:08:19,110 --> 00:08:21,693
+here these, the first
+这个矩阵的第一列
+
+237
+00:08:21,700 --> 00:08:23,477
+column, is this minus
+是-40
+
+238
+00:08:23,480 --> 00:08:26,062
+40 and 0.25, and
+0.25
+
+239
+00:08:26,070 --> 00:08:28,372
+the second column is this two
+第二列是
+
+240
+00:08:28,372 --> 00:08:30,945
+hundred and 0.1, and so
+(200, 0.1)
+
+241
+00:08:31,460 --> 00:08:34,278
+on and it
+以此类推
+
+242
+00:08:34,278 --> 00:08:35,925
+turns out that if you
+事实证明 如果你
+
+243
+00:08:35,925 --> 00:08:37,893
+multiply these two matrices
+把这两个矩阵相乘
+
+244
+00:08:37,910 --> 00:08:40,448
+what you find is that, this
+你就会发现得到了结果的
+
+245
+00:08:40,448 --> 00:08:43,467
+first column, you know,
+第一列 你知道的
+
+246
+00:08:43,467 --> 00:08:46,340
+oh, well how do you get this first column, right?
+那么你怎么得到这个第一列呢?
+
+247
+00:08:46,400 --> 00:08:48,850
+A procedure from matrix
+这就要用到我们讲过的
+
+248
+00:08:48,850 --> 00:08:50,565
+matrix multiplication is the way
+矩阵和矩阵相乘的过程
+
+249
+00:08:50,565 --> 00:08:51,945
+you get this first column, is
+你得到的这个矩阵的第一列
+
+250
+00:08:51,960 --> 00:08:53,360
+you take this matrix and you
+通过你用这个矩阵
+
+251
+00:08:53,420 --> 00:08:54,816
+multiply it by this
+乘以
+
+252
+00:08:54,840 --> 00:08:56,724
+first column, and we
+这个矩阵的第一列
+
+253
+00:08:56,724 --> 00:08:58,540
+saw in the previous video that this
+这是我们从之前的视频中看到过的
+
+254
+00:08:58,540 --> 00:09:00,472
+is exactly the predicted
+这就是从第一个假设
+
+255
+00:09:00,490 --> 00:09:02,050
+housing prices of the
+预测出的
+
+256
+00:09:02,150 --> 00:09:05,701
+first hypothesis, right?
+住房价格 对吗?
+
+257
+00:09:05,701 --> 00:09:08,775
+Of this first hypothesis here.
+就是这里的这个假设集
+
+258
+00:09:08,790 --> 00:09:10,794
+And, how about a second column?
+那么 第二列是什么呢?
+
+259
+00:09:10,794 --> 00:09:12,955
+Well, how do setup the second column?
+那么 我们应该怎么计算第二列呢?
+
+260
+00:09:12,990 --> 00:09:14,332
+The way you get the second column
+用来得到第二列的方法
+
+261
+00:09:14,332 --> 00:09:15,548
+is, well, you take this
+就是
+
+262
+00:09:15,590 --> 00:09:19,270
+matrix and you multiply by this second column.
+用这个矩阵乘以这个矩阵的第二列
+
+263
+00:09:19,270 --> 00:09:21,293
+And so this second column turns
+那么得到的第二列就是
+
+264
+00:09:21,293 --> 00:09:24,651
+out to be the predictions of
+基于第二个假设
+
+265
+00:09:24,651 --> 00:09:27,728
+the second hypothesis of
+做出的预测结果
+
+266
+00:09:27,750 --> 00:09:30,228
+the second hypothesis up there,
+第二个假设集是在那里
+
+267
+00:09:30,228 --> 00:09:34,450
+and similarly for the third column.
+对于第三列 我们也能得到类似的结果
+
+268
+00:09:34,450 --> 00:09:35,809
+And so, I didn't step
+那么 我并没有
+
+269
+00:09:35,810 --> 00:09:38,058
+through all the details but hopefully
+把详细的细节列出
+
+270
+00:09:38,058 --> 00:09:39,139
+you just, feel free to
+不过 我还是希望你们能够把
+
+271
+00:09:39,140 --> 00:09:40,448
+pause the video and check
+视频暂停下 自己算一算
+
+272
+00:09:40,448 --> 00:09:41,786
+the math yourself and check
+检查下结果对不对
+
+273
+00:09:41,786 --> 00:09:43,972
+that what I just claimed really is true.
+检验下我刚才计算的结果的正确性
+
+274
+00:09:43,990 --> 00:09:45,611
+But it turns out that by
+那么 实际上通过
+
+275
+00:09:45,611 --> 00:09:47,454
+constructing these two matrices, what
+构建这两个矩阵
+
+276
+00:09:47,454 --> 00:09:48,937
+you can therefore do is very
+你就可以
+
+277
+00:09:48,940 --> 00:09:51,180
+quickly apply all three
+快速的把这三个假设集
+
+278
+00:09:51,180 --> 00:09:52,602
+hypotheses to all four
+应用到所有四个
+
+279
+00:09:52,602 --> 00:09:54,455
+house sizes to get,
+房子的尺寸中来计算价格了
+
+280
+00:09:54,455 --> 00:09:56,452
+you know, all twelve predicted
+你看 所有的12种预测到的价格是
+
+281
+00:09:56,452 --> 00:09:57,721
+prices output by your
+通过你的假设集
+
+282
+00:09:57,721 --> 00:10:00,928
+three hypotheses on your four houses.
+以及你的四个房屋数据集得到的
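+
+(A minimal NumPy sketch of this trick; the first two parameter columns are the ones read
+out in the lecture, the third column and the house sizes are illustrative placeholders.)
+
+import numpy as np
+sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
+X = np.column_stack([np.ones(4), sizes])    # 4 x 2 data matrix: a 1 and a size per house
+Theta = np.array([[-40.0, 200.0, -150.0],   # theta 0 for each of the three hypotheses
+                  [ 0.25,   0.1,   0.4]])   # theta 1 for each of the three hypotheses
+predictions = X @ Theta                     # 4 x 3: one column of prices per hypothesis
+print(predictions.shape)                    # (4, 3) -- twelve predictions at once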
+
+283
+00:10:00,928 --> 00:10:03,366
+So with one matrix multiplication
+所以 一次矩阵乘法操作
+
+284
+00:10:03,366 --> 00:10:05,072
+you manage to make 12
+就使你做出了12种预测
+
+285
+00:10:05,080 --> 00:10:07,130
+predictions and, even
+更好的是
+
+286
+00:10:07,130 --> 00:10:08,446
+better, it turns out that
+事实证明
+
+287
+00:10:08,446 --> 00:10:09,937
+in order to do that matrix
+为了做到
+
+288
+00:10:09,937 --> 00:10:11,408
+multiplication and there are
+矩阵间的乘法
+
+289
+00:10:11,408 --> 00:10:13,130
+lots of good linear algebra libraries
+有很多很好的线性代数库函数
+
+290
+00:10:13,150 --> 00:10:14,767
+in order to do this
+都是为了做到这一点
+
+291
+00:10:14,767 --> 00:10:16,676
+multiplication step for you,
+为你实现矩阵乘法
+
+292
+00:10:16,676 --> 00:10:18,250
+and no matter so pretty
+而且不管你用的是
+
+293
+00:10:18,250 --> 00:10:22,025
+much any reasonable programming language that you might be using.
+多么合理的编程语言
+
+294
+00:10:22,025 --> 00:10:24,005
+Certainly all the top ten
+当然 当下最流行的
+
+295
+00:10:24,005 --> 00:10:27,898
+most popular programming languages will have great linear algebra libraries.
+编程语言中的前十名都有很棒的线性代数函数库
+
+296
+00:10:27,898 --> 00:10:29,554
+And the good thing is they're
+这是很好的事情
+
+297
+00:10:29,554 --> 00:10:31,463
+highly optimized in order
+我们能够在高度优化下
+
+298
+00:10:31,463 --> 00:10:33,415
+to do that, matrix matrix
+做到矩阵
+
+299
+00:10:33,440 --> 00:10:36,531
+multiplication very efficiently, including
+和矩阵间高效的乘法 包括
+
+300
+00:10:36,531 --> 00:10:38,501
+taking, taking advantage of
+采取了一些优化处理的方式
+
+301
+00:10:38,501 --> 00:10:41,119
+any parallel computation that
+如并行计算
+
+302
+00:10:41,130 --> 00:10:42,886
+your computer may be capable
+如果你的电脑支持的话
+
+303
+00:10:42,886 --> 00:10:46,297
+of, when your computer has multiple
+当你的计算机有多个
+
+304
+00:10:46,330 --> 00:10:48,016
+cores or lots of
+多个核心或者
+
+305
+00:10:48,016 --> 00:10:49,866
+multiple processors, within a processor sometimes
+多个处理器 一个处理器内有时
+
+306
+00:10:49,866 --> 00:10:53,285
+there's parallelism as well, called SIMD parallelism.
+存在并行的计算 我们称之为SIMD Parallelism
+
+307
+00:10:53,285 --> 00:10:55,242
+The computer takes care of that for you.
+这些计算机会帮你处理好
+
+308
+00:10:55,242 --> 00:10:56,727
+And, you know, there are
+你应该有
+
+309
+00:10:56,730 --> 00:10:58,826
+very good free libraries
+非常不错的免费类库
+
+310
+00:10:58,826 --> 00:11:00,146
+that you can use to do
+你可以用来做
+
+311
+00:11:00,146 --> 00:11:02,326
+this matrix matrix multiplication very
+高效的矩阵间的乘法计算
+
+312
+00:11:02,326 --> 00:11:04,104
+efficiently so that you
+因此你就能
+
+313
+00:11:04,110 --> 00:11:05,908
+can very efficiently, you
+你知道的
+
+314
+00:11:05,930 --> 00:11:08,738
+know, makes lots of predictions of lots of hypotheses.
+方便地计算有很多假设集时的预测数据
+
diff --git a/srt/3 - 5 - Matrix Multiplication Properties (9 min).srt b/srt/3 - 5 - Matrix Multiplication Properties (9 min).srt
new file mode 100644
index 00000000..4672f3ad
--- /dev/null
+++ b/srt/3 - 5 - Matrix Multiplication Properties (9 min).srt
@@ -0,0 +1,1276 @@
+1
+00:00:00,060 --> 00:00:01,920
+Matrix multiplication is really
+矩阵乘法运算非常实用
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,920 --> 00:00:03,302
+useful since you can pack
+因为你可以通过矩阵乘法
+
+3
+00:00:03,302 --> 00:00:05,494
+a lot of computation into just
+将大量运算打包到
+
+4
+00:00:05,494 --> 00:00:08,092
+one matrix multiplication operation.
+一次矩阵的乘法运算中
+
+5
+00:00:08,110 --> 00:00:10,829
+But you should be careful of how you use them.
+但是怎样使用这个方法你要提起注意
+
+6
+00:00:10,829 --> 00:00:12,103
+In this video I want to
+在这个视频中 我想
+
+7
+00:00:12,103 --> 00:00:16,974
+tell you about a few properties of matrix multiplication.
+介绍一些矩阵乘法的特性
+
+8
+00:00:18,328 --> 00:00:19,678
+When working with just real
+运算时 如果仅仅使用实数
+
+9
+00:00:19,680 --> 00:00:21,653
+numbers or when working with
+或者说是标量
+
+10
+00:00:21,653 --> 00:00:25,797
+scalars, multiplication is commutative.
+乘法是可以交换的
+
+11
+00:00:25,797 --> 00:00:27,459
+And what I mean by that is
+我的意思是说
+
+12
+00:00:27,459 --> 00:00:29,272
+if you take three times
+如果你用3乘以5
+
+13
+00:00:29,272 --> 00:00:30,873
+five, that is equal
+那么这个运算等同于
+
+14
+00:00:30,873 --> 00:00:32,368
+to five times three and
+5乘以3
+
+15
+00:00:32,380 --> 00:00:35,371
+the ordering of this multiplication doesn't matter.
+这个乘法运算中两个量的顺序不重要
+
+16
+00:00:35,371 --> 00:00:38,271
+And this is called the commutative
+这就是实数乘法
+
+17
+00:00:38,271 --> 00:00:41,952
+property of multiplication of real numbers.
+的交换律
+
+18
+00:00:41,952 --> 00:00:43,765
+It turns out this property that
+通过这个定律
+
+19
+00:00:43,770 --> 00:00:45,299
+you can, you know, reverse
+你能够颠倒
+
+20
+00:00:45,310 --> 00:00:46,317
+the order in which you
+乘法运算中变量的顺序
+
+21
+00:00:46,317 --> 00:00:50,217
+multiply things, this is not true for matrix multiplication. So
+但是这不能应用在矩阵乘法中
+
+22
+00:00:50,260 --> 00:00:52,294
+concretely, if A and
+所以 具体来讲 如果变量A和
+
+23
+00:00:52,294 --> 00:00:53,423
+B are matrices, then in
+B是矩阵 通常情况下
+
+24
+00:00:53,423 --> 00:00:55,120
+general, A times B is
+A乘以B
+
+25
+00:00:55,120 --> 00:00:56,653
+not equal to B times
+不等于B乘以A
+
+26
+00:00:56,653 --> 00:00:58,220
+A. So just be careful of that.
+因此应该注意到这一点
+
+27
+00:00:58,220 --> 00:01:00,530
+It's not okay to arbitrarily reverse
+随意颠倒矩阵
+
+28
+00:01:00,550 --> 00:01:02,545
+the order in which you are multiplying matrices.
+乘法的顺序是不可行的
+
+29
+00:01:02,545 --> 00:01:04,892
+So, we say that matrix multiplication
+因此 我们说 矩阵乘法
+
+30
+00:01:04,892 --> 00:01:06,420
+is not commutative, it's a fancy
+是不可交换的,这是一个奇特
+
+31
+00:01:06,420 --> 00:01:08,480
+way of saying it.
+的说法
+
+32
+00:01:08,560 --> 00:01:11,028
+As a concrete example, here
+举一个具体的例子
+
+33
+00:01:11,028 --> 00:01:13,156
+are two matrices, the matrix [1 1; 0 0]
+有两个矩阵 矩阵[1 1; 0 0]
+
+34
+00:01:13,156 --> 00:01:14,302
+times [0 0; 2 0], and if you multiply
+乘以[0 0; 2 0] 如果你将
+
+35
+00:01:14,302 --> 00:01:17,018
+these two matrices, you get this result on the right.
+这两个矩阵相乘 你会得到右边的结果
+
+36
+00:01:17,020 --> 00:01:20,428
+Now, let's swap around the order of these two matrices.
+现在 让我们左右交换这两个矩阵的顺序
+
+37
+00:01:20,460 --> 00:01:21,857
+So, I'm going to take these
+我们对这两个
+
+38
+00:01:21,857 --> 00:01:24,244
+two matrices and just reverse them.
+矩阵 仅仅交换了一下位置
+
+39
+00:01:24,250 --> 00:01:25,511
+It turns out if you multiply
+结果呢 你对这两个
+
+40
+00:01:25,511 --> 00:01:27,629
+these two matrices, you get
+矩阵相乘的时候 你在右边得出了
+
+41
+00:01:27,630 --> 00:01:29,525
+the second answer on the
+第二种答案
+
+42
+00:01:29,525 --> 00:01:31,423
+right and, you know, real
+你看 很明显
+
+43
+00:01:31,423 --> 00:01:33,652
+clearly, these two matrices are
+这个矩阵不同于
+
+44
+00:01:33,652 --> 00:01:36,099
+not equal to each other.
+之前的结果
+
+45
+00:01:36,730 --> 00:01:38,159
+So, in fact, in
+所以 事实上
+
+46
+00:01:38,159 --> 00:01:39,120
+general, if you have
+在通常情况下 如果你
+
+47
+00:01:39,120 --> 00:01:41,585
+a matrix operation like
+做一个矩阵运算
+
+48
+00:01:41,585 --> 00:01:44,793
+A times B. If A
+比如A乘以B 如果
+
+49
+00:01:44,793 --> 00:01:47,301
+is an m by n matrix
+A是一个m×n的矩阵
+
+50
+00:01:47,301 --> 00:01:49,188
+and B is an n by
+B是一个n×m的矩阵
+
+51
+00:01:49,210 --> 00:01:52,415
+M matrix, just as an example.
+以此举例
+
+52
+00:01:52,430 --> 00:01:53,974
+Then, it turns out
+这样 结果表明
+
+53
+00:01:53,980 --> 00:01:56,735
+that the matrix A times
+矩阵A乘以B
+
+54
+00:01:56,735 --> 00:01:59,042
+B right, is going
+会得到
+
+55
+00:01:59,042 --> 00:02:01,258
+to be an m by
+一个m×m
+
+56
+00:02:01,280 --> 00:02:03,792
+m matrix, where as
+的矩阵 当情况是
+
+57
+00:02:03,792 --> 00:02:06,410
+the matrix b x a
+矩阵 BXA
+
+58
+00:02:06,460 --> 00:02:08,390
+is going to be an n
+的时候会得到一个n×n的矩阵
+
+59
+00:02:08,450 --> 00:02:09,928
+by n matrix so the
+所以
+
+60
+00:02:09,928 --> 00:02:11,406
+dimensions don't even match, right,
+结果中即使是矩阵的维度都不相同 好吧
+
+61
+00:02:11,410 --> 00:02:13,283
+so A times B and
+所以矩阵A乘以B和
+
+62
+00:02:13,290 --> 00:02:16,647
+B times A may not even be the same dimension.
+B乘以A可能得到不相同的维度
+
+63
+00:02:16,647 --> 00:02:17,762
+In the example on the left,
+在左边的例子中
+
+64
+00:02:17,762 --> 00:02:19,265
+I have all two by two matrices,
+我有使用的都是2×2的矩阵
+
+65
+00:02:19,265 --> 00:02:20,342
+so the dimensions were the same,
+这样的维度是相同的
+
+66
+00:02:20,342 --> 00:02:22,688
+but in general reversing the
+但是通常情况
+
+67
+00:02:22,688 --> 00:02:25,285
+order of the matrices
+交换矩阵顺序
+
+68
+00:02:25,320 --> 00:02:27,301
+can even change the dimension
+会改变结果维度
+
+69
+00:02:27,301 --> 00:02:30,030
+of the outcome so
+因此
+
+70
+00:02:30,030 --> 00:02:33,291
+matrix multiplication is not commutative.
+矩阵乘法是不服从交换律的
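+
+(A quick NumPy check of this point, reading the two matrices in the example above as
+[1 1; 0 0] and [0 0; 2 0].)
+
+import numpy as np
+A = np.array([[1, 1], [0, 0]])
+B = np.array([[0, 0], [2, 0]])
+print(A @ B)   # [[2 0], [0 0]]
+print(B @ A)   # [[0 0], [2 2]] -- a different matrix, so A*B != B*A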
+
+71
+00:02:34,310 --> 00:02:36,302
+Here's the next I want to talk about.
+这是我接下来要讲的
+
+72
+00:02:36,302 --> 00:02:37,663
+So, when talking about real
+所以当谈到实数
+
+73
+00:02:37,680 --> 00:02:39,731
+numbers, or scalars, let's
+或标量的时候 请看
+
+74
+00:02:39,731 --> 00:02:42,859
+see, I have 3 times 5 times 2.
+我用3×5×2.
+
+75
+00:02:42,860 --> 00:02:45,848
+I can either multiply 5
+我也可以先用5
+
+76
+00:02:45,848 --> 00:02:47,625
+times 2 first, and
+乘以2
+
+77
+00:02:47,625 --> 00:02:50,394
+I can compute this as 3 times 10.
+然后用3乘以10
+
+78
+00:02:50,430 --> 00:02:52,936
+Or, I can multiply
+再或者 我可以将
+
+79
+00:02:52,936 --> 00:02:54,635
+three times five for us and
+3乘以5
+
+80
+00:02:54,635 --> 00:02:55,804
+I can compute this as, you
+我可以这样计算
+
+81
+00:02:55,804 --> 00:02:58,029
+know fifteen times two and
+再用15乘以2
+
+82
+00:02:58,029 --> 00:02:59,885
+both of these give you the same answer, right?
+这两种算法得到了相同的答案 对吧?
+
+83
+00:02:59,885 --> 00:03:01,007
+Each, both of these is equal
+这两个方法 都等于30
+
+84
+00:03:01,060 --> 00:03:03,895
+to thirty so Whether I
+所以 无论
+
+85
+00:03:03,910 --> 00:03:06,433
+multiply five times
+先用5乘以2
+
+86
+00:03:06,433 --> 00:03:08,185
+two first or whether I
+或者
+
+87
+00:03:08,185 --> 00:03:09,746
+multiply three times five
+先用3乘以5
+
+88
+00:03:09,746 --> 00:03:12,663
+first because well, three
+都是因为
+
+89
+00:03:12,663 --> 00:03:14,670
+times five times two
+3乘以5再乘以2
+
+90
+00:03:14,670 --> 00:03:16,389
+is equal to three times
+都等于3乘以
+
+91
+00:03:16,389 --> 00:03:18,894
+five times two.
+5再乘以2
+
+92
+00:03:18,894 --> 00:03:20,445
+And this is called the
+这就是 所谓的
+
+93
+00:03:20,445 --> 00:03:27,022
+associative property of real number multiplication.
+数乘的结合律
+
+94
+00:03:27,022 --> 00:03:30,695
+It turns out that matrix multiplication is associative.
+事实证明 矩阵乘法也符合结合律
+
+95
+00:03:30,695 --> 00:03:32,335
+So concretely, let's say
+因此 具体来说
+
+96
+00:03:32,335 --> 00:03:33,452
+I have a product of three
+做三个矩阵的乘积
+
+97
+00:03:33,452 --> 00:03:34,762
+matrices, A times B times
+矩阵A乘以矩阵B再乘以
+
+98
+00:03:34,762 --> 00:03:36,189
+C. Then I can
+矩阵C.然后我就可以
+
+99
+00:03:36,189 --> 00:03:37,818
+compute this either as A
+A乘以
+
+100
+00:03:37,840 --> 00:03:41,412
+times, B times C
+B×C
+
+101
+00:03:41,460 --> 00:03:42,838
+or I can compute this as
+或者可以计算
+
+102
+00:03:42,838 --> 00:03:45,310
+A times B, times C
+A×B 再乘以C
+
+103
+00:03:45,710 --> 00:03:48,125
+and these will actually give me the same answer.
+这将给出相同的答案
+
+104
+00:03:48,125 --> 00:03:49,310
+I'm not going to prove this, but
+我不做这个运算的证明 但是
+
+105
+00:03:49,310 --> 00:03:51,556
+you can just take my word for it, I guess.
+我想你可以把我的话记下来
+
+106
+00:03:51,556 --> 00:03:52,692
+So just be clear what I mean by
+所以只要通过这两种情况明白我的意思
+
+107
+00:03:52,692 --> 00:03:54,340
+these two cases, let's look
+就行了 让我们来看看
+
+108
+00:03:54,340 --> 00:03:56,263
+at first one first case.
+第一种情况
+
+109
+00:03:56,270 --> 00:03:57,345
+What I mean by that is
+我是说
+
+110
+00:03:57,345 --> 00:03:58,405
+if you actually want to compute
+如果你要计算
+
+111
+00:03:58,405 --> 00:03:59,925
+A times B times C, what
+A×B×C
+
+112
+00:03:59,925 --> 00:04:01,410
+you can do is you can
+你可以通过
+
+113
+00:04:01,410 --> 00:04:03,078
+first compute B times C.
+先计算B矩阵和C矩阵的积
+
+114
+00:04:03,100 --> 00:04:04,423
+So set D equals B times
+这样D等于B×C
+
+115
+00:04:04,423 --> 00:04:05,848
+C, then compute A times
+这时候计算A×D
+
+116
+00:04:05,848 --> 00:04:07,178
+D. And so this is really
+事实上这就是
+
+117
+00:04:07,200 --> 00:04:09,605
+computing a times B
+A×B×C
+
+118
+00:04:09,605 --> 00:04:12,406
+times C. Or, for
+或者是
+
+119
+00:04:12,440 --> 00:04:14,895
+this second case, You can
+第二种情况下
+
+120
+00:04:14,895 --> 00:04:16,065
+compute this as, you can
+你可以这样计算
+
+121
+00:04:16,112 --> 00:04:17,673
+set E equals A
+设定E等于A×B
+
+122
+00:04:17,680 --> 00:04:19,142
+times B. Then compute E
+然后计算
+
+123
+00:04:19,142 --> 00:04:20,750
+times C. And this
+ExC
+
+124
+00:04:20,750 --> 00:04:22,912
+is then the same as a
+这就等同于
+
+125
+00:04:22,920 --> 00:04:25,526
+times B times C
+A×B×C
+
+126
+00:04:25,530 --> 00:04:27,322
+and it turns out that both
+结果表明
+
+127
+00:04:27,322 --> 00:04:30,115
+of these options will give
+这两种选择
+
+128
+00:04:30,115 --> 00:04:33,702
+you, is guaranteed to give you the same answer.
+都能保证给出相同的答案
+
+129
+00:04:33,702 --> 00:04:35,115
+And so we say that matrix
+所以说矩阵乘法
+
+130
+00:04:35,115 --> 00:04:39,692
+multiplication does enjoy the associative property.
+能够服从结合律
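+
+(A quick NumPy check of associativity on three small random matrices; the shapes are
+illustrative.)
+
+import numpy as np
+rng = np.random.default_rng(0)
+A, B, C = rng.random((2, 3)), rng.random((3, 4)), rng.random((4, 2))
+print(np.allclose((A @ B) @ C, A @ (B @ C)))   # True, up to floating-point rounding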
+
+131
+00:04:39,722 --> 00:04:40,592
+Okay?
+好吧?
+
+132
+00:04:40,592 --> 00:04:42,752
+And don't worry about the terminology
+同时不用担心术语
+
+133
+00:04:42,752 --> 00:04:44,609
+associative and commutative that's
+结合律和交换律
+
+134
+00:04:44,625 --> 00:04:46,083
+because we're not really going to use
+因为在这个课程后面的讲解中
+
+135
+00:04:46,083 --> 00:04:47,586
+this terminology later in this
+不会使用这些术语
+
+136
+00:04:47,586 --> 00:04:50,608
+class, so don't worry about memorizing those terms.
+不必担心这些条条框框
+
+137
+00:04:50,608 --> 00:04:52,841
+Finally, I want to
+最后 我想告诉大家的
+
+138
+00:04:52,841 --> 00:04:54,552
+tell you about the identity
+是单位矩阵
+
+139
+00:04:54,552 --> 00:04:56,676
+matrix, which is special matrix.
+一种特殊的矩阵
+
+140
+00:04:56,676 --> 00:04:58,202
+So let's again make the
+因此 我们再次类比
+
+141
+00:04:58,210 --> 00:04:59,296
+analogy to what we know
+之前用到的实数的情况
+
+142
+00:04:59,296 --> 00:05:01,342
+of real numbers, so when dealing
+当处理
+
+143
+00:05:01,342 --> 00:05:02,842
+with real numbers or scalar
+纯数字或标量的时候
+
+144
+00:05:02,842 --> 00:05:04,562
+numbers, the number one,
+数值1时
+
+145
+00:05:04,562 --> 00:05:06,131
+is you can think
+你可以认为
+
+146
+00:05:06,131 --> 00:05:09,756
+of it as the identity of multiplication,
+它是一个乘法单位
+
+147
+00:05:09,810 --> 00:05:10,853
+and what I mean by that
+我用1的意思是
+
+148
+00:05:10,853 --> 00:05:12,885
+is for any number
+对于任何实数Z
+
+149
+00:05:12,885 --> 00:05:14,942
+Z, the number 1
+数字1
+
+150
+00:05:14,950 --> 00:05:16,803
+times z is equal
+乘以Z
+
+151
+00:05:16,810 --> 00:05:19,754
+to z times one, and
+等于Z乘以1
+
+152
+00:05:19,754 --> 00:05:21,550
+that's just equal to
+它们的结果
+
+153
+00:05:21,550 --> 00:05:24,548
+the number z, right, for any real number
+对于任意实数都是右面的Z
+
+154
+00:05:24,548 --> 00:05:26,128
+Z. So 1 is
+因此1是
+
+155
+00:05:26,128 --> 00:05:29,891
+the identity operation and so it satisfies this equation.
+一个单位操作 而且它满足这个等式
+
+156
+00:05:29,900 --> 00:05:31,755
+So it turns out that
+事实就是
+
+157
+00:05:31,755 --> 00:05:33,297
+in the space of matrices as
+在矩阵空间
+
+158
+00:05:33,297 --> 00:05:35,453
+an identity matrix as well.
+它就是一个单位矩阵
+
+159
+00:05:35,453 --> 00:05:38,375
+And it's unusually denoted i,
+通常它记作I
+
+160
+00:05:38,380 --> 00:05:39,573
+or sometimes we write it
+有时我们把它记作
+
+161
+00:05:39,573 --> 00:05:40,945
+as i of n by
+n×n的矩阵i
+
+162
+00:05:40,970 --> 00:05:43,079
+n if we want to make explicit the dimensions.
+以明确标出矩阵的维度
+
+163
+00:05:43,079 --> 00:05:44,355
+So I subscript n by n
+所以下标写作n×n的时候
+
+164
+00:05:44,355 --> 00:05:47,391
+is the n by n identity matrix.
+是指n行n列的单位矩阵
+
+165
+00:05:47,391 --> 00:05:49,339
+And so there's a different identity
+这是不同单位矩阵
+
+166
+00:05:49,339 --> 00:05:53,375
+matrix for each dimension n and are a few examples.
+在不同维度n中是不一样的 这里有一些例子
+
+167
+00:05:53,410 --> 00:05:54,912
+Here's the two by two identity
+这是一个2×2的单位矩阵
+
+168
+00:05:54,912 --> 00:05:56,447
+matrix, here's the three
+这个是3×3的
+
+169
+00:05:56,447 --> 00:05:59,882
+by three identity matrix, here's the four by four identity matrix.
+这是4×4的
+
+170
+00:05:59,882 --> 00:06:01,858
+So the identity matrix, has the
+所以单位矩阵
+
+171
+00:06:01,858 --> 00:06:03,602
+property that it has
+的特性就是
+
+172
+00:06:03,602 --> 00:06:06,348
+ones along the diagonals,
+沿对角线都是1
+
+173
+00:06:07,620 --> 00:06:10,325
+right, and so on and
+等等
+
+174
+00:06:10,325 --> 00:06:12,915
+is zero everywhere else, and
+其他位置都是0
+
+175
+00:06:12,915 --> 00:06:14,012
+so, by the way the
+所以顺便提一下
+
+176
+00:06:14,012 --> 00:06:17,425
+one by one identity matrix is just a number one.
+1×1单位矩阵就是个数字1
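+
+Octave, which the course uses later for demos, builds these identity matrices with its eye function; a quick sketch:
+
+eye(2)    % the 2 x 2 identity matrix: ones on the diagonal, zeros everywhere else
+eye(3)    % the 3 x 3 identity matrix
+eye(1)    % the 1 x 1 identity matrix is just the number 1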
+
+177
+00:06:17,425 --> 00:06:18,740
+This is one by one matrix
+这是一个1×1矩阵
+
+178
+00:06:18,740 --> 00:06:20,090
+just and it's not a very
+它不是非常有意思的单位矩阵
+
+179
+00:06:20,090 --> 00:06:23,242
+interesting identity matrix and informally
+非正式地说
+
+180
+00:06:23,285 --> 00:06:24,593
+when I or others are being
+当我或者其他人书写不那么严谨的时候
+
+181
+00:06:24,610 --> 00:06:26,438
+sloppy, very often, we will
+更多情况下
+
+182
+00:06:26,438 --> 00:06:28,878
+write the identity matrix using fine notation.
+用这样的记法来书写单位矩阵
+
+183
+00:06:28,880 --> 00:06:30,574
+I draw, you know, let's
+我划出来
+
+184
+00:06:30,574 --> 00:06:31,675
+go back to it and just write 1111,
+回到这里 写1111
+
+185
+00:06:31,675 --> 00:06:33,565
+dot, dot, dot, 1
+点 点 点 1
+
+186
+00:06:33,565 --> 00:06:34,928
+and then we'll, maybe, somewhat
+这样也许一定程度上
+
+187
+00:06:34,940 --> 00:06:37,650
+sloppily write a bunch of zeros there.
+拖泥带水写了一堆零在这里
+
+188
+00:06:37,660 --> 00:06:40,750
+And these zeros, this
+这些零
+
+189
+00:06:40,750 --> 00:06:42,474
+big zero, this big zero
+大零 大零
+
+190
+00:06:42,474 --> 00:06:44,262
+that's meant to denote that this
+这意思是表示
+
+191
+00:06:44,262 --> 00:06:46,174
+matrix is zero everywhere except for
+这个矩阵到处是零
+
+192
+00:06:46,174 --> 00:06:47,367
+the diagonals, so this is just
+除了对角线上
+
+193
+00:06:47,367 --> 00:06:49,680
+how I might sloppily write
+这就是比较拖沓的写法
+
+194
+00:06:49,680 --> 00:06:52,251
+this identity matrix
+写这个单位矩阵
+
+195
+00:06:52,251 --> 00:06:55,138
+And it has the property that for
+它具有这样的性质
+
+196
+00:06:55,138 --> 00:06:57,493
+any matrix A, A times
+对于任何矩阵A A乘以
+
+197
+00:06:57,493 --> 00:06:59,635
+the identity equals I times A
+单位矩阵 等于I乘以A
+
+198
+00:06:59,660 --> 00:07:00,892
+equals A. So that's a lot
+等于A
+
+199
+00:07:00,892 --> 00:07:04,782
+like this equation that we have up here.
+这就跟举得这个等式非常相像
+
+200
+00:07:04,782 --> 00:07:06,502
+One times z equals z times
+1×Z等于Z×1
+
+201
+00:07:06,502 --> 00:07:08,427
+one, equals z itself so
+结果是Z本身
+
+202
+00:07:08,430 --> 00:07:09,972
+I times A equals A
+I×A等于
+
+203
+00:07:09,972 --> 00:07:12,566
+times I equals A. Just
+A×I
+
+204
+00:07:12,570 --> 00:07:14,095
+make sure we have the dimensions right, so
+只要确保它们有相同的维度
+
+205
+00:07:14,095 --> 00:07:15,721
+if A is a n
+所以如果A是一个n×n
+
+206
+00:07:15,721 --> 00:07:18,065
+by n matrix, then this
+的矩阵
+
+207
+00:07:18,080 --> 00:07:19,952
+identity matrix that's an
+单位矩阵就应该是
+
+208
+00:07:19,952 --> 00:07:22,797
+n by n identity matrix.
+n×n维度
+
+209
+00:07:23,260 --> 00:07:24,573
+And if A is m by
+如果A是m×n矩阵
+
+210
+00:07:24,573 --> 00:07:26,595
+n then this identity
+那么单位矩阵
+
+211
+00:07:26,595 --> 00:07:28,766
+matrix, right, for matrix
+使矩阵乘法
+
+212
+00:07:28,766 --> 00:07:30,270
+multiplication make sense that has a
+有意义 就需要一个
+
+213
+00:07:30,290 --> 00:07:33,008
+m by m matrix because
+m×m矩阵
+
+214
+00:07:33,008 --> 00:07:34,305
+this m has to match
+因为使这个m匹配
+
+215
+00:07:34,305 --> 00:07:36,948
+up with that m. And
+上一个m
+
+216
+00:07:36,948 --> 00:07:38,619
+in either case the outcome
+在任何情况下
+
+217
+00:07:38,619 --> 00:07:40,042
+of this process is you
+这个过程的结果是
+
+218
+00:07:40,042 --> 00:07:42,025
+get back to Matrix A, which
+你得到的结果是矩阵A
+
+219
+00:07:42,030 --> 00:07:44,501
+is m by n.
+维度是m×n
+
+220
+00:07:44,530 --> 00:07:46,068
+So whenever we write
+任何时候我们写
+
+221
+00:07:46,068 --> 00:07:47,728
+the identity matrix I, you
+单位矩阵I
+
+222
+00:07:47,728 --> 00:07:50,798
+know, very often the dimension, right, will
+它所暗含的维度
+
+223
+00:07:50,810 --> 00:07:52,473
+be implicit from the context.
+是从上下文中得到的
+
+224
+00:07:52,473 --> 00:07:53,665
+So these two I's they' re
+对于这两个矩阵
+
+225
+00:07:53,665 --> 00:07:55,645
+actually different dimension matrices, one
+它们有不一样的维度
+
+226
+00:07:55,645 --> 00:07:56,789
+may be N by N, the other
+一个是n×n
+
+227
+00:07:56,789 --> 00:07:58,985
+is M by M But when
+另一个是m×m
+
+228
+00:07:58,985 --> 00:08:00,505
+we want to make the dimension
+我们想要
+
+229
+00:08:00,510 --> 00:08:02,831
+of the matrix explicit, then sometimes
+明确矩阵维度 有时候
+
+230
+00:08:02,840 --> 00:08:04,468
+we'll write to this I subscript
+我们需要在下标中写出
+
+231
+00:08:04,480 --> 00:08:06,470
+N by N, kind of like we have up here.
+n×n 跟前面的一样
+
+232
+00:08:06,470 --> 00:08:09,854
+But very often the dimension will be implicit.
+但很多时候维度是隐含的
+
+233
+00:08:10,040 --> 00:08:11,513
+Finally, just want to point
+最后要指出的是
+
+234
+00:08:11,513 --> 00:08:14,606
+out that earlier I
+之前我提到
+
+235
+00:08:14,606 --> 00:08:16,458
+said that A times B
+A乘以B
+
+236
+00:08:16,458 --> 00:08:19,069
+is not in general equal
+通常并不等于
+
+237
+00:08:19,069 --> 00:08:22,595
+to B times A, right?
+B乘以A 对不对?
+
+238
+00:08:22,595 --> 00:08:25,687
+That for most matrices A and B, this is not true.
+对于大多数矩阵A和B 这是不成立的
+
+239
+00:08:25,690 --> 00:08:29,558
+But when B is the identity matrix, this does hold true.
+但当B是单位矩阵的时候 这个等式是成立的
+
+240
+00:08:29,580 --> 00:08:30,840
+That A times the identity
+A乘以单位矩阵
+
+241
+00:08:30,870 --> 00:08:33,390
+matrix does indeed equal to
+等于
+
+242
+00:08:33,390 --> 00:08:34,523
+identity times A, it's
+单位矩阵乘以A
+
+243
+00:08:34,523 --> 00:08:35,858
+just that this is not true
+这对于其他矩阵
+
+244
+00:08:35,858 --> 00:08:39,124
+for other matrices, B in general.
+B 一般是不成立的
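+
+A short Octave sketch of these two facts, again with made-up matrices rather than the ones on the slides:
+
+A = [1 2; 3 4];
+B = [0 1; 1 0];
+disp(A * B)            % in general this ...
+disp(B * A)            % ... differs from this, so A*B is not B*A
+I = eye(2);
+isequal(A * I, A)      % 1 (true): multiplying by the identity gives back A
+isequal(I * A, A)      % 1 (true): on either side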
+
+245
+00:08:39,900 --> 00:08:41,645
+So that's it for the
+这是
+
+246
+00:08:41,645 --> 00:08:43,994
+properties of matrix multiplication.
+矩阵乘法的全部性质
+
+247
+00:08:43,994 --> 00:08:45,416
+And the special matrices, like the
+特殊矩阵 诸如
+
+248
+00:08:45,416 --> 00:08:46,618
+identity matrix I want to
+单位矩阵 等
+
+249
+00:08:46,618 --> 00:08:48,505
+tell you about, in the next
+我会在接下来线性代数综述
+
+250
+00:08:48,505 --> 00:08:51,690
+and final video on linear algebra review.
+的视频中讲到
+
+251
+00:08:51,690 --> 00:08:53,337
+I am going to quickly tell you
+我会尽快讲解
+
+252
+00:08:53,350 --> 00:08:55,895
+about a couple of special
+一些特殊
+
+253
+00:08:55,895 --> 00:08:58,190
+matrix operations, and after
+矩阵运算
+
+254
+00:08:58,190 --> 00:08:59,328
+that you know everything you need
+之后你将了解到
+
+255
+00:08:59,328 --> 00:09:01,830
+to know about linear algebra for this course
+这门课程中你需要具备的解线性代数知识
+
diff --git a/srt/3 - 6 - Inverse and Transpose (11 min).srt b/srt/3 - 6 - Inverse and Transpose (11 min).srt
new file mode 100644
index 00000000..fdb60ef2
--- /dev/null
+++ b/srt/3 - 6 - Inverse and Transpose (11 min).srt
@@ -0,0 +1,1601 @@
+1
+00:00:00,310 --> 00:00:01,540
+In this video, I want to
+在这一讲中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,590 --> 00:00:02,885
+tell you about a couple of special
+我将介绍一些特殊的矩阵运算
+
+3
+00:00:02,885 --> 00:00:04,848
+matrix operations, called the
+也就是矩阵的逆运算
+
+4
+00:00:04,848 --> 00:00:07,430
+matrix inverse and the matrix transpose operation.
+以及矩阵的转置运算
+
+5
+00:00:08,740 --> 00:00:10,312
+Let's start by talking about matrix
+首先我们从逆矩阵开始
+
+6
+00:00:10,312 --> 00:00:12,928
+inverse, and as
+同往常一样
+
+7
+00:00:12,940 --> 00:00:14,516
+usual we'll start by thinking about
+我们依然先思考一下
+
+8
+00:00:14,516 --> 00:00:17,248
+how it relates to real numbers.
+矩阵运算和实数运算的关系
+
+9
+00:00:17,280 --> 00:00:18,803
+In the last video, I said
+在上一段视频中
+
+10
+00:00:18,803 --> 00:00:20,625
+that the number one plays the
+我讲过 在实数空间中
+
+11
+00:00:20,625 --> 00:00:24,570
+role of the identity in
+数字1扮演了单位矩阵的角色
+
+12
+00:00:24,590 --> 00:00:26,059
+the space of real numbers because
+因为1和任何数相乘
+
+13
+00:00:26,070 --> 00:00:28,851
+one times anything is equal to itself.
+其结果都是那个数本身
+
+14
+00:00:28,860 --> 00:00:30,270
+It turns out that real numbers
+我们都知道
+
+15
+00:00:30,270 --> 00:00:31,642
+have this property, that is,
+实数有这样的一个性质
+
+16
+00:00:31,642 --> 00:00:33,093
+that
+那就是每一个实数
+
+17
+00:00:33,120 --> 00:00:34,635
+each number has an inverse,
+都有一个倒数
+
+18
+00:00:34,635 --> 00:00:36,637
+for example, given the number
+举个例子
+
+19
+00:00:36,660 --> 00:00:38,552
+three, there exists some
+对于数字3
+
+20
+00:00:38,552 --> 00:00:40,132
+number, which happens to
+一定存在某个数
+
+21
+00:00:40,132 --> 00:00:41,544
+be three inverse so that
+是3的倒数
+
+22
+00:00:41,544 --> 00:00:43,798
+that number times three gives you
+这个倒数乘以3
+
+23
+00:00:43,798 --> 00:00:46,458
+back the identity element one.
+其乘积将得到单位元1
+
+24
+00:00:46,480 --> 00:00:50,727
+And so three inverse, of course, is just one third.
+当然 这里的逆也就是三分之一
+
+25
+00:00:50,727 --> 00:00:53,236
+And given some other number,
+而对于另一个数
+
+26
+00:00:53,236 --> 00:00:55,360
+maybe twelve there is
+比如说12
+
+27
+00:00:55,360 --> 00:00:57,312
+some number which is the
+那么一定有某个数
+
+28
+00:00:57,340 --> 00:00:59,464
+inverse of twelve written as
+是12的倒数
+
+29
+00:00:59,464 --> 00:01:01,600
+twelve to the minus one, or
+也可以写作12的-1次方
+
+30
+00:01:01,600 --> 00:01:03,582
+really this is just one twelfth.
+其实也就是1/12
+
+31
+00:01:03,582 --> 00:01:07,092
+So that when you multiply these two things together.
+因此当你将这两个数相乘的时候
+
+32
+00:01:07,092 --> 00:01:09,292
+the product is equal to
+其结果依然是等于1
+
+33
+00:01:09,292 --> 00:01:12,367
+the identity element one again.
+或者可以称为单位元1
+
+34
+00:01:12,370 --> 00:01:13,838
+Now it turns out that in
+然而事实上 在实数空间内
+
+35
+00:01:13,860 --> 00:01:17,154
+the space of real numbers, not everything has an inverse.
+并非所有实数都有倒数
+
+36
+00:01:17,154 --> 00:01:19,148
+For example the number zero
+比如说数字0就没有倒数
+
+37
+00:01:19,160 --> 00:01:20,981
+does not have an inverse, right?
+是吧?
+
+38
+00:01:20,981 --> 00:01:25,410
+Because zero's a zero inverse, one over zero that's undefined.
+因为0的倒数 也就是1/0 是没有意义的
+
+39
+00:01:25,460 --> 00:01:29,862
+Like this one over zero is not well defined.
+像这里 1除以0是未被定义的
+
+40
+00:01:30,100 --> 00:01:31,419
+And what we want to
+在这一节剩下的内容中
+
+41
+00:01:31,450 --> 00:01:32,453
+do, in the rest of this
+我们将要解决的一个问题
+
+42
+00:01:32,453 --> 00:01:33,835
+slide, is figure out what does
+是求解一个矩阵的逆
+
+43
+00:01:33,835 --> 00:01:38,341
+it mean to compute the inverse of a matrix.
+是什么意思
+
+44
+00:01:39,253 --> 00:01:41,718
+Here's the idea: If
+概念如下
+
+45
+00:01:41,750 --> 00:01:43,198
+A is a n by
+如果A是一个矩阵
+
+46
+00:01:43,200 --> 00:01:45,078
+n matrix, and it
+其维度为m行m列
+
+47
+00:01:45,078 --> 00:01:46,320
+has an inverse, I will say
+那么A矩阵有其逆矩阵
+
+48
+00:01:46,350 --> 00:01:48,487
+a bit more about that later, then
+后面我还会详细介绍这一点
+
+49
+00:01:48,487 --> 00:01:49,927
+the inverse is going to
+那么这个逆矩阵
+
+50
+00:01:49,927 --> 00:01:51,668
+be written A to the
+可以写成
+
+51
+00:01:51,668 --> 00:01:54,186
+minus one and A
+矩阵A的-1次方
+
+52
+00:01:54,186 --> 00:01:55,798
+times this inverse, A to
+同时 矩阵A乘以它的逆矩阵
+
+53
+00:01:55,798 --> 00:01:57,045
+the minus one, is going to
+A的-1次方
+
+54
+00:01:57,050 --> 00:01:59,395
+equal to A inverse times
+也等于A的-1次方乘以A
+
+55
+00:01:59,395 --> 00:02:00,741
+A, is going to
+其结果将等于
+
+56
+00:02:00,741 --> 00:02:04,088
+give us back the identity matrix.
+单位矩阵
+
+57
+00:02:04,088 --> 00:02:04,958
+Okay?
+对吧?
+
+58
+00:02:04,960 --> 00:02:07,037
+Only matrices that are
+只有维度是m行m列的矩阵
+
+59
+00:02:07,060 --> 00:02:09,848
+m by m, for some m, have this property of having an inverse.
+才有其逆矩阵
+
+60
+00:02:09,870 --> 00:02:11,692
+So, a matrix is
+因此
+
+61
+00:02:11,692 --> 00:02:13,010
+M by M, this is also
+如果一个矩阵的维度是m行m列
+
+62
+00:02:13,040 --> 00:02:16,055
+called a square matrix and
+那么这个矩阵也可以称之为方阵
+
+63
+00:02:16,055 --> 00:02:18,222
+it's called square because
+称其为方阵是因为
+
+64
+00:02:18,222 --> 00:02:24,852
+the number of rows is equal to the number of columns.
+这类矩阵的行数和列数相等
+
+65
+00:02:24,852 --> 00:02:26,516
+Right and it turns out
+好的
+
+66
+00:02:26,530 --> 00:02:29,518
+only square matrices have inverses,
+实际上只有方阵才有逆矩阵
+
+67
+00:02:29,520 --> 00:02:31,148
+so A is a square
+所以 如果A是一个方阵
+
+68
+00:02:31,148 --> 00:02:32,973
+matrix, is m by m,
+其维度是m行m列
+
+69
+00:02:33,020 --> 00:02:37,198
+and if it has an inverse, it satisfies this equation over here.
+那么它将满足这个等式
+
+70
+00:02:37,340 --> 00:02:39,568
+Let's look at a concrete example,
+接下来我们看一个具体的例子
+
+71
+00:02:39,568 --> 00:02:41,530
+so let's say I
+假设说
+
+72
+00:02:41,580 --> 00:02:45,090
+have a matrix, three, four,
+我们有这样一个矩阵 3 4
+
+73
+00:02:45,120 --> 00:02:48,080
+two, sixteen.
+2 16
+
+74
+00:02:48,080 --> 00:02:49,535
+So this is a two by
+这是一个2行2列的矩阵
+
+75
+00:02:49,535 --> 00:02:51,788
+two matrix, so it's
+因此这个矩阵
+
+76
+00:02:51,810 --> 00:02:53,159
+a square matrix and so this
+是一个方阵
+
+77
+00:02:53,160 --> 00:02:55,442
+may have an inverse, and
+因此这个矩阵可以有它的逆矩阵
+
+78
+00:02:55,480 --> 00:02:57,733
+it turns out that I
+假如说
+
+79
+00:02:57,750 --> 00:02:59,308
+happen to know the inverse
+这个矩阵的逆矩阵是
+
+80
+00:02:59,310 --> 00:03:00,810
+of this matrix is zero point
+0.4
+
+81
+00:03:00,840 --> 00:03:02,675
+four, minus zero point
+-0.1
+
+82
+00:03:02,675 --> 00:03:04,485
+one, minus zero point
+-0.05
+
+83
+00:03:04,520 --> 00:03:08,687
+zero five, zero point zero seven five.
+0.075
+
+84
+00:03:08,750 --> 00:03:10,267
+And if I take this matrix
+那么如果我用这个逆矩阵
+
+85
+00:03:10,267 --> 00:03:12,273
+and multiply these together it
+和原来的矩阵相乘
+
+86
+00:03:12,273 --> 00:03:13,598
+turns out what I get
+那么
+
+87
+00:03:13,620 --> 00:03:15,595
+is the two by
+我们将得到的结果
+
+88
+00:03:15,595 --> 00:03:18,324
+two identity matrix, I,
+是一个2行2列的单位矩阵I
+
+89
+00:03:18,350 --> 00:03:20,542
+this is I two by two.
+这就是矩阵I 维度是2行2列
+
+90
+00:03:20,558 --> 00:03:21,365
+Okay?
+没问题吧?
+
+91
+00:03:21,365 --> 00:03:22,308
+And so on this slide,
+在这张幻灯片上
+
+92
+00:03:22,308 --> 00:03:24,416
+you know this matrix is
+这个矩阵就是矩阵A
+
+93
+00:03:24,416 --> 00:03:27,199
+the matrix A, and this matrix is the matrix A-inverse.
+这个矩阵就是A的逆矩阵
+
+94
+00:03:27,199 --> 00:03:28,622
+And it turns out
+结果是
+
+95
+00:03:28,622 --> 00:03:29,835
+if that you are computing A
+如果你要计算A乘以A的逆矩阵
+
+96
+00:03:29,835 --> 00:03:31,385
+times A-inverse, it turns out
+或者说
+
+97
+00:03:31,410 --> 00:03:32,742
+if you compute A-inverse times
+A的逆矩阵乘以A
+
+98
+00:03:32,750 --> 00:03:36,821
+A you also get back the identity matrix.
+你将得到一个单位矩阵
+
+99
+00:03:36,852 --> 00:03:38,640
+So how did I
+那么
+
+100
+00:03:38,640 --> 00:03:39,760
+find this inverse or how
+怎样得到这个逆矩阵呢
+
+101
+00:03:39,920 --> 00:03:42,698
+did I come up with this inverse over here?
+或者说我是怎么知道这个逆矩阵的呢?
+
+102
+00:03:42,730 --> 00:03:45,048
+It turns out that sometimes
+实际上有时候你可以
+
+103
+00:03:45,060 --> 00:03:46,731
+you can compute inverses by hand
+自己用笔算出来
+
+104
+00:03:46,760 --> 00:03:48,745
+but almost no one does that these days.
+但可能现在没多少人这么求逆矩阵了
+
+105
+00:03:48,780 --> 00:03:49,888
+And it turns out there is
+实际上我们有很多很好的软件
+
+106
+00:03:49,888 --> 00:03:52,198
+very good numerical software for
+可以用来进行数学运算
+
+107
+00:03:52,240 --> 00:03:55,447
+taking a matrix and computing its inverse.
+能很容易地对矩阵进行求逆运算
+
+108
+00:03:55,447 --> 00:03:56,369
+So again, this is one of
+因此 同样地
+
+109
+00:03:56,369 --> 00:03:57,310
+those things where there are lots
+对于这个问题
+
+110
+00:03:57,310 --> 00:03:59,450
+of open source libraries that
+你可以在很多主流的编程环境中实现
+
+111
+00:03:59,450 --> 00:04:00,748
+you can link to from any
+这些环境一般都有很多开源库
+
+112
+00:04:00,748 --> 00:04:04,973
+of the popular programming languages to compute inverses of matrices.
+你可以直接运用来求解逆矩阵
+
+113
+00:04:04,990 --> 00:04:06,892
+Let me show you a quick example.
+这里我举一个简单的例子
+
+114
+00:04:06,900 --> 00:04:08,935
+How I actually computed this inverse,
+来说明一下怎样求逆矩阵
+
+115
+00:04:08,940 --> 00:04:13,132
+and what I did was I used software called Octave.
+我将使用一个叫Octave的软件
+
+116
+00:04:13,170 --> 00:04:14,437
+So let me bring that up.
+打开这个软件
+
+117
+00:04:14,437 --> 00:04:17,186
+We will see a lot about Octave later.
+之后我们还会更多地用到Octave
+
+118
+00:04:17,186 --> 00:04:18,903
+Let me just quickly show you an example.
+这里我只是很快地举一个例子
+
+119
+00:04:18,910 --> 00:04:21,078
+Set my matrix A to
+定义一个矩阵A
+
+120
+00:04:21,078 --> 00:04:22,274
+be equal to that matrix on
+对它赋值为左边幻灯片上的矩阵
+
+121
+00:04:22,274 --> 00:04:24,456
+the left, type three four
+键入3 4 2 16
+
+122
+00:04:24,456 --> 00:04:28,080
+two sixteen, so that's my matrix A right.
+这就是我的A矩阵了
+
+123
+00:04:28,080 --> 00:04:29,882
+This is matrix 34,
+这就是矩阵 3 4
+
+124
+00:04:29,882 --> 00:04:31,141
+216 that I have down
+2 16
+
+125
+00:04:31,160 --> 00:04:32,773
+here on the left.
+也就是左边那个矩阵
+
+126
+00:04:32,773 --> 00:04:34,543
+And, the software lets me compute
+使用这个软件
+
+127
+00:04:34,543 --> 00:04:36,243
+the inverse of A very easily.
+我能够很容易得到A的逆矩阵
+
+128
+00:04:36,250 --> 00:04:39,110
+I type pinv(A), just like this.
+就像这样直接键入pinv(A)
+
+129
+00:04:39,170 --> 00:04:40,765
+And so, this is right,
+这样
+
+130
+00:04:40,765 --> 00:04:41,935
+this matrix here, zero point
+我们得到了结果
+
+131
+00:04:41,935 --> 00:04:43,715
+four, minus zero point one, and so on.
+0.4 -0.1 等等
+
+132
+00:04:43,715 --> 00:04:45,308
+This given the numerical
+这里得到的是
+
+133
+00:04:45,350 --> 00:04:46,794
+solution to what is the
+A的逆矩阵的近似解
+
+134
+00:04:46,794 --> 00:04:48,350
+inverse of A. So let me
+我可以这样写
+
+135
+00:04:48,350 --> 00:04:50,538
+just write, inverse of A
+定义一个变量inverseOFA
+
+136
+00:04:50,540 --> 00:04:52,558
+equals pinv of
+其值等于 pinv(A)
+
+137
+00:04:52,580 --> 00:04:55,232
+A, like that, so that I
+那么inverseOFA的值就是这样
+
+138
+00:04:55,232 --> 00:04:57,200
+can now just verify that A
+现在我们可以证明一下
+
+139
+00:04:57,200 --> 00:04:58,597
+times A inverse the identity
+A乘以A的逆矩阵结果是单位矩阵
+
+140
+00:04:58,597 --> 00:05:00,644
+is, type A times the
+键入 A 乘以
+
+141
+00:05:00,644 --> 00:05:03,390
+inverse of A and
+A的逆矩阵
+
+142
+00:05:03,420 --> 00:05:04,740
+the result of that is
+这样得到的结果
+
+143
+00:05:04,750 --> 00:05:06,513
+this matrix and this is
+是这样的一个矩阵
+
+144
+00:05:06,513 --> 00:05:08,708
+one one on the diagonal
+主对角线是1和1
+
+145
+00:05:08,740 --> 00:05:10,453
+and essentially ten to
+副对角线不是1
+
+146
+00:05:10,453 --> 00:05:11,582
+the minus seventeen, ten to the
+但是基本上10的-17次方
+
+147
+00:05:11,582 --> 00:05:13,324
+minus sixteen, so Up to
+10的-16次方
+
+148
+00:05:13,324 --> 00:05:14,961
+numerical precision, up to
+这里由于计算精度的问题
+
+149
+00:05:14,961 --> 00:05:16,012
+a little bit of round off
+由于计算机在寻找最佳结果时
+
+150
+00:05:16,012 --> 00:05:17,562
+error that my computer
+由于进行了四舍五入圆整
+
+151
+00:05:17,562 --> 00:05:21,123
+had in finding optimal matrices
+所以产生了一点点误差
+
+152
+00:05:21,123 --> 00:05:22,623
+and these numbers off the
+所以这些副对角线上的数字
+
+153
+00:05:22,623 --> 00:05:24,948
+diagonals are essentially zero
+实际上也近似等于0
+
+154
+00:05:24,970 --> 00:05:29,078
+so A times the inverse is essentially the identity matrix.
+因此矩阵A和其逆矩阵相乘的结果就是单位矩阵
+
+155
+00:05:29,100 --> 00:05:30,907
+Can also verify the inverse of
+我们也可以证明
+
+156
+00:05:30,907 --> 00:05:33,215
+A times A is also
+A的逆矩阵乘以A
+
+157
+00:05:33,215 --> 00:05:35,795
+equal to the identity,
+其结果也是单位矩阵
+
+158
+00:05:35,795 --> 00:05:38,183
+ones on the diagonals and values
+主对角线上都是1
+
+159
+00:05:38,183 --> 00:05:39,938
+that are essentially zero except
+副对角线上的数有小数
+
+160
+00:05:39,938 --> 00:05:40,856
+for a little bit of round
+但四舍五入圆整一下
+
+161
+00:05:40,856 --> 00:05:44,752
+off error on the off diagonals.
+实际上其值也等于0
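+
+A minimal Octave sketch of the demo just described; pinv is the function typed in the video, and inverseOfA is only an illustrative variable name:
+
+A = [3 4; 2 16];        % the 2 x 2 matrix from the example
+inverseOfA = pinv(A)    % roughly [0.4 -0.1; -0.05 0.075]
+A * inverseOfA          % the identity, up to tiny round-off on the off-diagonal entries
+inverseOfA * A          % likewise the identity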
+
+162
+00:05:45,780 --> 00:05:47,428
+So in the definition of what the inverse
+关于逆矩阵的定义
+
+163
+00:05:47,428 --> 00:05:48,636
+of a matrix is, I had
+我需要强调一点
+
+164
+00:05:48,636 --> 00:05:50,333
+this caveat first it must
+首先
+
+165
+00:05:50,333 --> 00:05:52,367
+always be a square matrix, it
+矩阵必须是方阵
+
+166
+00:05:52,410 --> 00:05:54,219
+had this caveat, that if
+注意这里
+
+167
+00:05:54,219 --> 00:05:57,237
+A has an inverse, exactly what
+如果A有其逆矩阵
+
+168
+00:05:57,237 --> 00:05:58,855
+matrices have an inverse
+究竟什么矩阵有其逆矩阵的问题
+
+169
+00:05:58,855 --> 00:06:00,176
+is beyond the scope of this
+已经超出了这节线性代数复习课
+
+170
+00:06:00,200 --> 00:06:02,056
+linear algebra review, but one
+所讨论的范畴
+
+171
+00:06:02,056 --> 00:06:03,942
+intuition you might take away
+但有一点
+
+172
+00:06:03,942 --> 00:06:05,245
+that just as the
+你能理解的是
+
+173
+00:06:05,260 --> 00:06:06,588
+number zero doesn't have an
+数字0没有倒数
+
+174
+00:06:06,588 --> 00:06:08,429
+inverse, it turns out
+因此
+
+175
+00:06:08,440 --> 00:06:10,188
+that if A is say the
+如果矩阵A中
+
+176
+00:06:10,188 --> 00:06:13,457
+matrix of all zeros, then
+所有元素都为0
+
+177
+00:06:13,457 --> 00:06:14,791
+this matrix A also does
+那么这个矩阵A
+
+178
+00:06:14,791 --> 00:06:16,432
+not have an inverse because there's
+依然没有逆矩阵
+
+179
+00:06:16,432 --> 00:06:18,033
+no matrix there's no A
+因为没有这样的A矩阵
+
+180
+00:06:18,040 --> 00:06:19,821
+inverse matrix so that this
+能使得这个矩阵
+
+181
+00:06:19,821 --> 00:06:21,174
+matrix times some other
+乘以另一个矩阵
+
+182
+00:06:21,174 --> 00:06:22,225
+matrix will give you the
+其值能得到单位矩阵
+
+183
+00:06:22,225 --> 00:06:23,778
+identity matrix so this matrix of
+所以
+
+184
+00:06:23,778 --> 00:06:25,322
+all zeros, and there
+所有元素都为0的矩阵
+
+185
+00:06:25,322 --> 00:06:27,660
+are a few other matrices with properties similar to this.
+以及一些其他类似这样的矩阵
+
+186
+00:06:27,660 --> 00:06:30,843
+That also don't have an inverse.
+它们都没有逆矩阵
+
+187
+00:06:30,843 --> 00:06:32,492
+But it turns out that
+但是实际上
+
+188
+00:06:32,492 --> 00:06:33,600
+in this review I don't
+在这节复习课中
+
+189
+00:06:33,600 --> 00:06:35,436
+want to go too deeply into what
+我不想太深入地介绍
+
+190
+00:06:35,436 --> 00:06:37,108
+it means matrix have an
+矩阵的逆矩阵有何意义等问题
+
+191
+00:06:37,108 --> 00:06:38,765
+inverse but it turns
+但是实际上
+
+192
+00:06:38,765 --> 00:06:40,006
+out for our machine learning
+对于我们机器学习的应用来讲
+
+193
+00:06:40,006 --> 00:06:41,807
+application this shouldn't be
+这一点不应成为问题
+
+194
+00:06:41,830 --> 00:06:44,260
+an issue or more precisely
+具体来讲
+
+195
+00:06:44,280 --> 00:06:46,389
+for the learning algorithms where
+对于某种机器学习算法
+
+196
+00:06:46,389 --> 00:06:47,943
+this may be an issue, namely
+可能会碰到这种问题的讨论
+
+197
+00:06:47,970 --> 00:06:49,252
+whether or not an inverse matrix
+是否存在逆矩阵这样的问题
+
+198
+00:06:49,252 --> 00:06:50,968
+appears and I will tell when
+在我们碰到这种学习算法时
+
+199
+00:06:50,968 --> 00:06:51,952
+we get to those learning algorithms
+我再告诉你
+
+200
+00:06:51,952 --> 00:06:53,690
+just what it means for an
+一种算法到底有没有逆矩阵
+
+201
+00:06:53,760 --> 00:06:54,850
+algorithm to have or not
+究竟应该怎样理解
+
+202
+00:06:55,150 --> 00:06:56,572
+have an inverse and how to fix it in case we are
+以及应该怎样解决具体的问题
+
+203
+00:06:56,572 --> 00:06:59,200
+working with matrices that don't
+比如怎样处理
+
+204
+00:06:59,200 --> 00:07:00,458
+have inverses.
+矩阵的逆矩阵不存在的问题
+
+205
+00:07:00,458 --> 00:07:02,680
+But the intuition if you
+不过你至少可以这样理解
+
+206
+00:07:02,711 --> 00:07:04,275
+want is that you can
+在某种意义上
+
+207
+00:07:04,275 --> 00:07:05,808
+think of matrices as not
+你可以把那些
+
+208
+00:07:05,808 --> 00:07:07,242
+have an inverse that is somehow
+没有逆矩阵的矩阵
+
+209
+00:07:07,242 --> 00:07:10,331
+too close to zero in some sense.
+想成是非常近似为0
+
+210
+00:07:10,353 --> 00:07:12,602
+So, just to wrap
+最后再讲一点
+
+211
+00:07:12,670 --> 00:07:14,900
+up the terminology, matrix that
+逆矩阵不存在的矩阵
+
+212
+00:07:14,900 --> 00:07:16,938
+don't have an inverse are sometimes called
+它的专有名词是
+
+213
+00:07:16,940 --> 00:07:18,835
+a singular matrix or degenerate
+奇异矩阵
+
+214
+00:07:18,835 --> 00:07:20,960
+matrix and so this
+或者叫退化矩阵
+
+215
+00:07:20,970 --> 00:07:22,560
+matrix over here is an
+因此这里这个矩阵
+
+216
+00:07:22,630 --> 00:07:24,701
+example zero zero zero matrix.
+这个零矩阵
+
+217
+00:07:24,701 --> 00:07:29,491
+is an example of a matrix that is singular, or a matrix that is degenerate.
+就是一个奇异矩阵 或者说退化矩阵的例子
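+
+A minimal Octave sketch of why such matrices have no inverse; det is a standard Octave function, and a zero determinant is one way to recognize a singular (degenerate) matrix:
+
+A = zeros(2);      % the all-zeros 2 x 2 matrix
+det(A)             % 0, so A is singular; inv(A) would only warn and return Inf entries
+B = [1 2; 2 4];    % made-up example: the second row is twice the first
+det(B)             % also 0, so B is singular / degenerate as well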
+
+218
+00:07:29,537 --> 00:07:31,348
+Finally, the last special
+最后 还有一种矩阵运算
+
+219
+00:07:31,370 --> 00:07:32,652
+matrix operation I want to
+我想给大家介绍的
+
+220
+00:07:32,652 --> 00:07:34,520
+tell you about is to do matrix transpose.
+是矩阵的转置运算
+
+221
+00:07:34,530 --> 00:07:36,369
+So suppose I have
+假设我们有矩阵A
+
+222
+00:07:36,400 --> 00:07:38,160
+matrix A, if I compute
+那么A的转置矩阵
+
+223
+00:07:38,210 --> 00:07:41,220
+the transpose of A, that's what I get here on the right.
+可以写成右边这个矩阵
+
+224
+00:07:41,232 --> 00:07:43,156
+This is a transpose which is
+这就是矩阵A的转置矩阵
+
+225
+00:07:43,156 --> 00:07:46,275
+written and A superscript T,
+写法是A加上一个上标的T
+
+226
+00:07:46,278 --> 00:07:47,363
+and the way you compute
+计算转置矩阵的方法
+
+227
+00:07:47,410 --> 00:07:49,531
+the transpose of a matrix is as follows.
+如下所示
+
+228
+00:07:49,531 --> 00:07:50,628
+To get a transpose I am going
+要得到转置矩阵
+
+229
+00:07:50,628 --> 00:07:52,276
+to first take the first
+首先我取出A矩阵的第一行
+
+230
+00:07:52,300 --> 00:07:55,079
+row of A, one two zero.
+1 2 0
+
+231
+00:07:55,080 --> 00:07:58,791
+That becomes this first column of this transpose.
+它们将成为转置矩阵的第一列
+
+232
+00:07:58,840 --> 00:07:59,762
+And then I'm going to take
+接下来
+
+233
+00:07:59,762 --> 00:08:01,050
+the second row of A,
+我再取出矩阵A的第二行
+
+234
+00:08:01,050 --> 00:08:04,610
+3 5 9, and that becomes the second column.
+3 5 9
+
+235
+00:08:04,610 --> 00:08:06,838
+of the matrix A transpose.
+它们将成为转置矩阵的第二列
+
+236
+00:08:06,850 --> 00:08:08,131
+And another way of
+你也可以这样想
+
+237
+00:08:08,131 --> 00:08:10,296
+thinking about how to compute the transpose
+求解转置矩阵
+
+238
+00:08:10,296 --> 00:08:11,569
+is as if you're taking this
+实际上可以看作
+
+239
+00:08:11,570 --> 00:08:14,671
+sort of 45 degree axis
+画一条45度的斜线
+
+240
+00:08:14,671 --> 00:08:16,349
+and you are mirroring or you
+然后你以这条线求镜像
+
+241
+00:08:16,349 --> 00:08:21,698
+are flipping the matrix along that 45 degree axis.
+或者以这条45度线为轴进行翻转
+
+242
+00:08:21,698 --> 00:08:23,488
+so here's the more formal
+因此我们可以得到
+
+243
+00:08:23,500 --> 00:08:26,522
+definition of a matrix transpose.
+关于矩阵转置的正式定义
+
+244
+00:08:26,522 --> 00:08:30,688
+Let's say A is a m by n matrix.
+假设矩阵A是一个m行n列的矩阵
+
+245
+00:08:31,300 --> 00:08:32,727
+And let's let B equal A
+并且假设矩阵B等于A的转置
+
+246
+00:08:32,727 --> 00:08:36,371
+transpose, and so B equals A transpose, like so.
+就像这样
+
+247
+00:08:36,386 --> 00:08:37,563
+Then B is going to
+那么B将是一个
+
+248
+00:08:37,563 --> 00:08:39,637
+be a n by m matrix
+n行m列的矩阵
+
+249
+00:08:39,637 --> 00:08:42,752
+with the dimensions reversed so
+两个矩阵的维度是相反的
+
+250
+00:08:42,830 --> 00:08:46,308
+here we have a 2x3 matrix.
+这里我们的A矩阵是2行3列的
+
+251
+00:08:46,370 --> 00:08:48,050
+And so the transpose becomes a
+那么A的转置矩阵B
+
+252
+00:08:48,190 --> 00:08:51,196
+3x2 matrix, and moreover,
+将是一个3行2列的矩阵
+
+253
+00:08:51,210 --> 00:08:54,585
+the BIJ is equal to AJI.
+此外 Bij等于Aji
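+
+A short Octave sketch of this definition, using the 2 x 3 matrix as it is read out above; the apostrophe operator computes the transpose:
+
+A = [1 2 0; 3 5 9];   % the 2 x 3 matrix A from the example
+B = A'                % its 3 x 2 transpose: row i of A becomes column i of B
+B(1, 2)               % equals A(2, 1), as in B_ij = A_ji
+B(3, 2)               % equals A(2, 3)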
+
+254
+00:08:54,610 --> 00:08:56,030
+So the IJ element of this
+也就是说
+
+255
+00:08:56,220 --> 00:08:57,390
+matrix B is going to be
+矩阵B的ij元素
+
+256
+00:08:57,530 --> 00:08:59,913
+the JI element of that
+是等于A矩阵的ji元素
+
+257
+00:08:59,913 --> 00:09:02,310
+earlier matrix A. So for
+我们来举个例子
+
+258
+00:09:02,310 --> 00:09:04,212
+example, B 1 2
+B12这个元素
+
+259
+00:09:04,212 --> 00:09:06,997
+is going to be equal
+将等于
+
+260
+00:09:06,997 --> 00:09:08,756
+to, look at this
+请看这个矩阵
+
+261
+00:09:08,756 --> 00:09:10,537
+matrix, B 1 2 is going to be equal to
+B12元素将等于
+
+262
+00:09:10,537 --> 00:09:13,775
+this element 3 1st row, 2nd column.
+A矩阵中 第一行第二列的元素3
+
+263
+00:09:13,800 --> 00:09:16,008
+And that equal to this, which
+那么这个元素
+
+264
+00:09:16,010 --> 00:09:18,199
+is a two one, second
+也就是等于A21
+
+265
+00:09:18,220 --> 00:09:21,412
+row first column, right, which
+A矩阵中第二行第一列的元素
+
+266
+00:09:21,420 --> 00:09:23,421
+is equal to two and some
+也就是2
+
+267
+00:09:23,440 --> 00:09:25,860
+of the example B 3
+再举个例子
+
+268
+00:09:25,860 --> 00:09:28,561
+2, right, that's B
+B32这个元素
+
+269
+00:09:28,561 --> 00:09:30,922
+3 2 is this element 9,
+也就是9
+
+270
+00:09:30,930 --> 00:09:33,282
+and that's equal to
+是等于
+
+271
+00:09:33,282 --> 00:09:35,525
+a two three which is
+A23这个元素
+
+272
+00:09:35,525 --> 00:09:38,963
+this element up here, nine.
+也就是这个元素 也是等于9的
+
+273
+00:09:38,963 --> 00:09:40,433
+And so that wraps up
+好的
+
+274
+00:09:40,433 --> 00:09:41,893
+the definition of what it
+这样我们就介绍了
+
+275
+00:09:41,893 --> 00:09:43,468
+means to take the transpose
+怎样求一个矩阵的转置矩阵
+
+276
+00:09:43,510 --> 00:09:44,991
+of a matrix and that
+以及它的正式定义
+
+277
+00:09:44,991 --> 00:09:49,323
+in fact concludes our linear algebra review.
+这也是我们的线性代数复习课的最后一节
+
+278
+00:09:49,323 --> 00:09:50,754
+So by now hopefully you know
+现在你应该已经掌握
+
+279
+00:09:50,754 --> 00:09:52,205
+how to add and subtract
+如何进行矩阵的加法运算
+
+280
+00:09:52,205 --> 00:09:53,701
+matrices as well as
+以及乘法运算
+
+281
+00:09:53,701 --> 00:09:55,325
+multiply them and you
+同时
+
+282
+00:09:55,325 --> 00:09:57,185
+also know how, what are
+你也应该了解
+
+283
+00:09:57,185 --> 00:09:58,942
+the definitions of the inverses
+矩阵的逆运算
+
+284
+00:09:58,942 --> 00:10:01,457
+and transposes of a matrix
+和转置运算的定义
+
+285
+00:10:01,457 --> 00:10:02,934
+and these are the main operations
+这些都是这门课中
+
+286
+00:10:02,934 --> 00:10:05,152
+used in linear algebra
+需要用到的线性代数中
+
+287
+00:10:05,152 --> 00:10:06,172
+for this course.
+最基本的一些运算
+
+288
+00:10:06,172 --> 00:10:09,043
+In case this is the first time you are seeing this material.
+也许这是你第一次观看我们的视频
+
+289
+00:10:09,043 --> 00:10:10,798
+I know this was a lot
+我知道这里一下子讲了
+
+290
+00:10:10,798 --> 00:10:13,032
+of linear algebra material all presented
+很多线性代数的知识
+
+291
+00:10:13,032 --> 00:10:14,512
+very quickly and it's a
+讲的很多也讲的很快
+
+292
+00:10:14,520 --> 00:10:16,581
+lot to absorb but
+可能需要一点时间来消化
+
+293
+00:10:16,581 --> 00:10:18,102
+if you there's no need
+你没有必要记住
+
+294
+00:10:18,102 --> 00:10:20,045
+to memorize all the definitions
+我们刚刚讲过的
+
+295
+00:10:20,045 --> 00:10:21,718
+we just went through and if
+所有那些定义
+
+296
+00:10:21,718 --> 00:10:23,451
+you download the copy of either
+如果你从课程网站上
+
+297
+00:10:23,451 --> 00:10:24,520
+these slides or of the
+下载了这些幻灯片
+
+298
+00:10:24,540 --> 00:10:28,353
+lecture notes from the course website.
+或者讲义
+
+299
+00:10:28,370 --> 00:10:29,645
+and use either the slides or
+并且使用它们
+
+300
+00:10:29,645 --> 00:10:31,478
+the lecture notes as a reference
+作为一个参考内容
+
+301
+00:10:31,490 --> 00:10:32,886
+then you can always refer back
+那么你可以随时返回
+
+302
+00:10:32,900 --> 00:10:34,178
+to the definitions and to figure
+查看这些定义和概念
+
+303
+00:10:34,178 --> 00:10:35,615
+out what are these matrix
+随时理解这些矩阵的乘法 转置
+
+304
+00:10:35,615 --> 00:10:39,111
+multiplications, transposes and so on definitions.
+等等这些运算的概念
+
+305
+00:10:39,140 --> 00:10:40,697
+And the lecture notes on the course website also
+同时 课程官网的讲义中
+
+306
+00:10:40,697 --> 00:10:42,421
+has pointers to additional
+我们也提供了链接
+
+307
+00:10:42,450 --> 00:10:44,675
+resources on linear algebra which
+也是一些其他的线性代数资料
+
+308
+00:10:44,675 --> 00:10:47,445
+you can use to learn more about linear algebra by yourself.
+你可以自己点击学习
+
+309
+00:10:48,861 --> 00:10:53,445
+And next with these new tools.
+接下来 运用这些所学的工具
+
+310
+00:10:53,540 --> 00:10:55,153
+We'll be able in the next
+在接下来的几段视频中
+
+311
+00:10:55,153 --> 00:10:57,035
+few videos to develop more powerful
+我们将介绍
+
+312
+00:10:57,035 --> 00:10:58,758
+forms of linear regression that
+非常重要的线性回归
+
+313
+00:10:58,758 --> 00:10:59,854
+can deal with a lot
+能处理更多的数据
+
+314
+00:10:59,854 --> 00:11:00,809
+more data, a lot more
+更多的特征
+
+315
+00:11:00,809 --> 00:11:02,226
+features, a lot more training
+以及更多的训练样本
+
+316
+00:11:02,226 --> 00:11:04,367
+examples and later on
+再往后
+
+317
+00:11:04,400 --> 00:11:06,114
+after linear regression we'll actually
+在介绍线性回归之后
+
+318
+00:11:06,114 --> 00:11:07,832
+continue using these linear
+我们还将继续使用这些线代工具
+
+319
+00:11:07,832 --> 00:11:10,016
+algebra tools to derive more
+来推导一些
+
+320
+00:11:10,016 --> 00:11:13,242
+powerful learning algorithms as well
+更加强大的学习算法
+
diff --git a/srt/4 - 1 - Multiple Features (8 min).srt b/srt/4 - 1 - Multiple Features (8 min).srt
new file mode 100644
index 00000000..67b44648
--- /dev/null
+++ b/srt/4 - 1 - Multiple Features (8 min).srt
@@ -0,0 +1,1076 @@
+1
+00:00:00,150 --> 00:00:01,160
+In this video we will start
+在这段视频中 我们将开始
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,520 --> 00:00:02,600
+to talk about a new version
+介绍一种新的
+
+3
+00:00:03,250 --> 00:00:04,880
+of linear regression that's more powerful.
+更为有效的线性回归形式
+
+4
+00:00:05,800 --> 00:00:07,230
+One that works with multiple variables
+这种形式适用于多个变量
+
+5
+00:00:08,230 --> 00:00:09,070
+or with multiple features.
+或者多特征量的情况
+
+6
+00:00:10,320 --> 00:00:10,860
+Here's what I mean.
+比如说:
+
+7
+00:00:12,200 --> 00:00:13,670
+In the original version of
+在之前我们学习过的
+
+8
+00:00:13,900 --> 00:00:14,920
+linear regression that we developed,
+线性回归中
+
+9
+00:00:15,780 --> 00:00:17,590
+we have a single feature x,
+我们只有一个单一特征量
+
+10
+00:00:18,030 --> 00:00:19,450
+the size of the house, and
+房屋面积 x
+
+11
+00:00:19,600 --> 00:00:20,650
+we wanted to use that to
+我们希望用这个特征量
+
+12
+00:00:20,760 --> 00:00:22,510
+predict Y, the price of
+来预测
+
+13
+00:00:22,660 --> 00:00:24,210
+the house and this was
+房子的价格
+
+14
+00:00:25,310 --> 00:00:26,590
+our form of our hypothesis.
+这就是我们的假设
+
+15
+00:00:28,540 --> 00:00:29,210
+But now imagine, what if
+但是想象一下
+
+16
+00:00:29,410 --> 00:00:30,580
+we had not only the size
+如果我们不仅有房屋面积
+
+17
+00:00:31,020 --> 00:00:32,440
+of the house as a feature
+作为预测房屋
+
+18
+00:00:33,140 --> 00:00:34,450
+or as a variable of which
+价格的特征量
+
+19
+00:00:34,600 --> 00:00:35,490
+to try to predict the price,
+或者变量
+
+20
+00:00:36,450 --> 00:00:38,270
+but that we also knew the
+我们还知道
+
+21
+00:00:38,410 --> 00:00:39,710
+number of bedrooms, the number
+卧室的数量
+
+22
+00:00:39,990 --> 00:00:42,490
+of floors, and the age of the home in years.
+楼层的数量以及房子的使用年限
+
+23
+00:00:43,180 --> 00:00:44,050
+It seems like this would give
+这样就给了我们
+
+24
+00:00:44,230 --> 00:00:46,630
+us a lot more information with which to predict the price.
+更多可以用来
+
+25
+00:00:47,810 --> 00:00:49,130
+To introduce a little bit
+预测房屋价格的信息
+
+26
+00:00:49,290 --> 00:00:50,760
+of notation, we sort of
+先简单介绍一下记法
+
+27
+00:00:50,940 --> 00:00:51,910
+started to talk about this earlier,
+我们开始的时候就提到过
+
+28
+00:00:52,900 --> 00:00:53,800
+I'm going to use the variables
+我要用
+
+29
+00:00:54,560 --> 00:00:56,300
+X subscript 1 X subscript
+x 下标1
+
+30
+00:00:56,880 --> 00:00:59,320
+2 and so on to
+x 下标2 等等
+
+31
+00:00:59,480 --> 00:01:00,780
+denote my, in this
+来表示
+
+32
+00:01:00,960 --> 00:01:03,000
+case, four features and I'm
+这种情况下的四个特征量
+
+33
+00:01:03,310 --> 00:01:04,500
+going to continue to use
+然后仍然用
+
+34
+00:01:04,850 --> 00:01:06,780
+Y to denote the variable,
+Y来表示我们
+
+35
+00:01:07,370 --> 00:01:09,720
+the output variable price that we're trying to predict.
+所想要预测的输出变量
+
+36
+00:01:11,010 --> 00:01:12,600
+Let's introduce a little bit more notation.
+让我们来看看更多的表示方式
+
+37
+00:01:13,850 --> 00:01:15,210
+Now that we have four features
+现在我们有四个特征量
+
+38
+00:01:16,560 --> 00:01:18,490
+I'm going to use lowercase "n"
+我要用小写n
+
+39
+00:01:19,540 --> 00:01:20,670
+to denote the number of features.
+来表示特征量的数目
+
+40
+00:01:21,180 --> 00:01:22,460
+So in this example we have
+因此在这个例子中 我们的n等于4
+
+41
+00:01:23,030 --> 00:01:24,420
+n equals 4 because we have, you
+因为你们看 我们有
+
+42
+00:01:24,820 --> 00:01:27,610
+know, one, two, three, four features.
+1 2 3 4 共4个特征量
+
+43
+00:01:28,850 --> 00:01:30,880
+And "n" is different from
+这里的n和我们之前
+
+44
+00:01:31,700 --> 00:01:33,280
+our earlier notation where we
+使用的n不同
+
+45
+00:01:33,570 --> 00:01:36,670
+were using "n" to denote the number of examples.
+之前我们是用的“m”来表示样本的数量
+
+46
+00:01:37,330 --> 00:01:38,640
+So if you have
+所以如果你有47行
+
+47
+00:01:39,050 --> 00:01:41,070
+47 rows "M" is the
+那么m就是这个表格里面的行数
+
+48
+00:01:41,300 --> 00:01:43,580
+number of rows on this table or the number of training examples.
+或者说是训练样本数
+
+49
+00:01:45,480 --> 00:01:47,290
+So I'm also
+然后我要用x 上标 (i)
+
+50
+00:01:47,500 --> 00:01:48,910
+going to use X superscript
+来表示第i个
+
+51
+00:01:49,540 --> 00:01:51,050
+"I" to denote the
+训练样本的
+
+52
+00:01:51,260 --> 00:01:53,460
+input features of the "I" training example.
+输入特征值
+
+53
+00:01:58,720 --> 00:02:00,580
+X2 is going to
+x上标 (2)
+
+54
+00:02:00,710 --> 00:02:02,300
+be a vector of
+就是表示第二个
+
+55
+00:02:02,550 --> 00:02:05,690
+the features for my second training example.
+训练样本的特征向量
+
+56
+00:02:06,430 --> 00:02:08,020
+And so X2 here is
+因此这里
+
+57
+00:02:08,160 --> 00:02:09,260
+going to be a vector 1416,
+x(2)就是向量 [1416, 3, 2, 40]
+
+58
+00:02:09,520 --> 00:02:10,560
+3, 2, 40 since those
+因为这四个数字对应了
+
+59
+00:02:11,060 --> 00:02:14,110
+are my four
+我用来预测房屋价格的
+
+60
+00:02:14,410 --> 00:02:16,100
+features that I have
+第二个房子的
+
+61
+00:02:17,500 --> 00:02:19,410
+to try to predict the price of the second house.
+四个特征量
+
+62
+00:02:20,990 --> 00:02:22,470
+So, in this notation, the
+因此在这种记法中
+
+63
+00:02:24,200 --> 00:02:25,250
+superscript 2 here.
+这个上标2
+
+64
+00:02:26,720 --> 00:02:28,620
+That's an index into my training set.
+就是训练集的一个索引
+
+65
+00:02:28,990 --> 00:02:31,630
+This is not X to the power of 2.
+而不是x的2次方
+
+66
+00:02:32,010 --> 00:02:33,150
+Instead, this is, you know,
+这个2就对应着
+
+67
+00:02:33,370 --> 00:02:36,430
+an index that says look at the second row of this table.
+你所看到的表格中的第二行
+
+68
+00:02:36,960 --> 00:02:38,260
+This refers to my second training example.
+即我的第二个训练样本
+
+69
+00:02:39,280 --> 00:02:41,780
+With this notation X2 is
+x上标(2) 这样表示
+
+70
+00:02:42,140 --> 00:02:43,890
+a four dimensional vector.
+就是一个四维向量
+
+71
+00:02:44,400 --> 00:02:45,760
+In fact, more generally, this is
+事实上更普遍地来说
+
+72
+00:02:45,930 --> 00:02:48,630
+an n-dimensional feature vector.
+这是n维的向量
+
+73
+00:02:58,790 --> 00:03:00,030
+subscript J to denote
+ 下标j
+
+74
+00:03:00,550 --> 00:03:01,740
+the value
+来表示
+
+75
+00:03:02,850 --> 00:03:04,420
+of feature number J
+第i个训练样本的
+
+76
+00:03:05,170 --> 00:03:06,360
+in the ith training example.
+第j个特征量
+
+77
+00:03:07,950 --> 00:03:11,490
+So concretely X2 subscript 3,
+因此具体的来说
+
+78
+00:03:11,920 --> 00:03:14,130
+will refer to feature
+x上标(2)下标3代表着
+
+79
+00:03:14,420 --> 00:03:15,800
+number three in the
+第2个训练样本里的第3个特征量
+
+80
+00:03:15,930 --> 00:03:17,670
+x(2) vector, which is equal to 2, right?
+对吧?
+
+81
+00:03:18,300 --> 00:03:20,360
+That was a 3 over there, just fix my handwriting.
+这个是3 我写的不太好看
+
+82
+00:03:20,860 --> 00:03:23,810
+So x2 subscript 3 is going to be equal to 2.
+所以说x上标(2)下标3就等于2
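+
+A minimal Octave sketch of this notation, assuming the training set is stored as a matrix X with one row per example; only the second row (1416, 3, 2, 40) comes from the example above, the first row is made up:
+
+X = [2104 5 1 45;      % x(1): a made-up first training example
+     1416 3 2 40];     % x(2): size, bedrooms, floors, age from the example above
+m = size(X, 1)         % number of training examples (rows), 2 in this tiny sketch
+n = size(X, 2)         % number of features, 4
+x2 = X(2, :)           % the feature vector x^(2)
+x2_3 = X(2, 3)         % x^(2) subscript 3, the number of floors, which is 2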
+
+83
+00:03:26,810 --> 00:03:28,010
+Now that we have multiple features,
+既然我们有了多个特征量
+
+84
+00:03:29,110 --> 00:03:30,390
+let's talk about what the
+让我们继续讨论一下
+
+85
+00:03:30,470 --> 00:03:32,360
+form of our hypothesis should be.
+我们的假设形式应该是怎样的
+
+86
+00:03:33,220 --> 00:03:34,790
+Previously this was the
+这是我们之前使用的假设形式
+
+87
+00:03:34,860 --> 00:03:36,650
+form of our hypothesis, where x
+x就是我们唯一的特征量
+
+88
+00:03:37,250 --> 00:03:39,280
+was our single feature, but
+但现在我们有了多个特征量
+
+89
+00:03:39,440 --> 00:03:40,450
+now that we have multiple features,
+我们就不能再
+
+90
+00:03:41,010 --> 00:03:43,350
+we aren't going to use the simple representation any more.
+使用这种简单的表示方式了
+
+91
+00:03:44,460 --> 00:03:46,040
+Instead, a form
+取而代之的
+
+92
+00:03:46,630 --> 00:03:48,140
+of the hypothesis in linear regression
+我们将把线性回归的假设
+
+93
+00:03:49,380 --> 00:03:50,630
+is going to be this, can be
+改成这样
+
+94
+00:03:50,820 --> 00:03:52,190
+theta 0 plus theta
+θ0加上
+
+95
+00:03:52,440 --> 00:03:55,690
+1 x1 plus theta 2
+θ1 乘以 x1 加上
+
+96
+00:03:55,840 --> 00:03:57,320
+x2 plus theta 3 x3
+θ2乘以x2 加上 θ3 乘以x3
+
+97
+00:03:58,610 --> 00:04:00,140
+plus theta 4 X4.
+加上θ4乘以x4
+
+98
+00:04:00,910 --> 00:04:02,610
+And if we have N features then
+然后如果我们有n个特征量
+
+99
+00:04:02,860 --> 00:04:04,110
+rather than summing up over
+那么我们要将所有的n个特征量相加
+
+100
+00:04:04,340 --> 00:04:05,380
+our four features, we would have
+而不是四个特征量
+
+101
+00:04:05,570 --> 00:04:07,050
+a sum over our N features.
+我们需要对n个特征量进行相加
+
+102
+00:04:08,570 --> 00:04:10,270
+Concretely for a particular
+举个具体的例子
+
+103
+00:04:11,480 --> 00:04:12,880
+setting of our parameters we
+在我们的设置的参数中
+
+104
+00:04:13,010 --> 00:04:15,500
+may have H of
+我们可能有h(x)等于
+
+105
+00:04:17,370 --> 00:04:18,990
+x equals 80 + 0.1x1 + 0.01x2 + 3x3 - 2x4.
+80 + 0.1 x1 + 0.01x2 + 3x3 - 2x4
+
+106
+00:04:19,160 --> 00:04:23,070
+This would be one
+这就是一个
+
+107
+00:04:25,710 --> 00:04:27,060
+example of a hypothesis
+假设的范例
+
+108
+00:04:27,700 --> 00:04:29,170
+and you remember a
+别忘了
+
+109
+00:04:29,760 --> 00:04:30,710
+hypothesis is trying to predict
+假设是为了预测
+
+110
+00:04:31,100 --> 00:04:32,020
+the price of the house in
+大约以千刀为单位的房屋价格
+
+111
+00:04:32,360 --> 00:04:33,910
+thousands of dollars, just saying
+就是说
+
+112
+00:04:34,250 --> 00:04:35,020
+that, you know, the base
+一个房子的价格
+
+113
+00:04:35,360 --> 00:04:37,270
+price of a house
+可以是
+
+114
+00:04:37,470 --> 00:04:39,960
+is maybe 80,000 plus another zero point
+80 k加上
+
+115
+00:04:40,690 --> 00:04:41,960
+one, so that's an extra,
+0.1乘以x1
+
+116
+00:04:42,460 --> 00:04:43,680
+what, hundred dollars per square feet,
+也就是说 每平方尺100美元
+
+117
+00:04:44,430 --> 00:04:45,710
+yeah, plus the price goes up
+然后价格
+
+118
+00:04:45,860 --> 00:04:47,340
+a little bit for each
+会随着楼层数的增加
+
+119
+00:04:53,170 --> 00:04:54,300
+up further for each additional
+随着卧室数的增加
+
+120
+00:04:54,790 --> 00:04:55,870
+bedroom the house has, because
+因为x3是
+
+121
+00:04:56,190 --> 00:04:57,390
+X three was the number
+卧室的数量
+
+122
+00:04:57,570 --> 00:04:58,890
+of bedrooms, and the price
+但是呢
+
+123
+00:04:59,220 --> 00:05:01,090
+goes down a little bit
+房子的价格会
+
+124
+00:05:01,540 --> 00:05:03,930
+with each additional age of the house.
+随着使用年数的增加
+
+125
+00:05:04,230 --> 00:05:07,150
+With each additional year of the age of the house.
+而贬值
+
+126
+00:05:08,930 --> 00:05:11,630
+Here's the form of a hypothesis rewritten on the slide.
+这是重新改写过的假设的形式
+
+127
+00:05:11,990 --> 00:05:13,390
+And what I'm gonna do is
+接下来
+
+128
+00:05:13,590 --> 00:05:14,560
+introduce a little bit of
+我要来介绍一点
+
+129
+00:05:14,650 --> 00:05:16,300
+notation to simplify this equation.
+简化这个等式的表示方式
+
+130
+00:05:17,840 --> 00:05:19,660
+For convenience of notation, let
+为了表示方便
+
+131
+00:05:19,770 --> 00:05:22,800
+me define x subscript 0 to be equals one.
+我要将x下标0的值设为1
+
+132
+00:05:23,870 --> 00:05:25,080
+Concretely, this means that for
+具体而言 这意味着
+
+133
+00:05:25,270 --> 00:05:27,770
+every example i I
+对于第i个样本
+
+134
+00:05:27,850 --> 00:05:29,300
+have a feature vector X superscript
+都有一个向量x上标(i)
+
+135
+00:05:29,850 --> 00:05:31,500
+I and X superscript
+并且x上标(i)
+
+136
+00:05:32,000 --> 00:05:34,370
+I subscript 0 is going to be equal to 1.
+下标0等于1
+
+137
+00:05:34,970 --> 00:05:35,990
+You can think of this as defining
+你可以认为我们
+
+138
+00:05:36,810 --> 00:05:38,590
+an additional zero feature.
+定义了一个额外的第0个特征量
+
+139
+00:05:39,290 --> 00:05:40,320
+So whereas previously I had
+因此 我过去有n个特征量
+
+140
+00:05:40,670 --> 00:05:41,790
+n features because x1, x2
+因为我们有x1 x2
+
+141
+00:05:41,930 --> 00:05:43,920
+through xn, I'm now defining
+直到xn 由于我另外定义了
+
+142
+00:05:44,830 --> 00:05:46,150
+an additional sort of zero
+额外的第0个特征向量
+
+143
+00:05:47,210 --> 00:05:48,910
+feature vector that always takes
+并且它的取值
+
+144
+00:05:49,310 --> 00:05:50,590
+on the value of one.
+总是1
+
+145
+00:05:52,130 --> 00:05:53,860
+So now my feature vector
+所以我现在的特征向量x
+
+146
+00:05:54,200 --> 00:05:56,390
+X becomes this N+1 dimensional
+是一个从0开始标记的
+
+147
+00:05:58,410 --> 00:06:01,020
+vector that is zero index.
+n+1维的向量
+
+148
+00:06:02,430 --> 00:06:04,080
+So this is now a n+1
+所以现在就是一个
+
+149
+00:06:04,190 --> 00:06:05,650
+dimensional feature vector, but
+n+1维的特征量向量
+
+150
+00:06:05,940 --> 00:06:07,200
+I'm gonna index it from
+但我要从0开始标记
+
+151
+00:06:07,420 --> 00:06:09,400
+0 and I'm also going
+同时
+
+152
+00:06:09,700 --> 00:06:10,950
+to think of my
+我也想把我的参数
+
+153
+00:06:11,090 --> 00:06:13,240
+parameters as a vector.
+都看做一个向量
+
+154
+00:06:13,610 --> 00:06:15,620
+So, our parameters here, right
+所以我们的参数就是
+
+155
+00:06:15,790 --> 00:06:16,800
+that would be our theta zero,
+我们的θ0
+
+156
+00:06:17,150 --> 00:06:18,130
+theta one, theta two, and so
+θ1 θ2 等等
+
+157
+00:06:18,380 --> 00:06:18,780
+on all the way up to theta n,
+直到θn
+
+158
+00:06:18,790 --> 00:06:19,950
+we're going to gather
+我们要把
+
+159
+00:06:20,340 --> 00:06:21,580
+them up into a parameter
+所有的参数都写成一个向量
+
+160
+00:06:22,380 --> 00:06:24,030
+vector written theta 0, theta
+θ0
+
+161
+00:06:24,190 --> 00:06:25,990
+1, theta 2, and so
+θ1 θ2
+
+162
+00:06:26,280 --> 00:06:27,390
+on, down to theta n.
+直到θn
+
+163
+00:06:28,330 --> 00:06:30,160
+This is another zero index vector.
+这里也有一个从0开始标记的矢量
+
+164
+00:06:30,560 --> 00:06:31,590
+It's of index signed from zero.
+下标从0开始
+
+165
+00:06:32,820 --> 00:06:35,380
+That is another n plus 1 dimensional vector.
+这也是另外一个n+1维的向量
+
+166
+00:06:37,180 --> 00:06:39,840
+So, my hypothesis cannot be
+所以我的假设
+
+167
+00:06:40,000 --> 00:06:42,720
+written theta 0x0 plus
+现在可以写成θ0乘以x0
+
+168
+00:06:42,910 --> 00:06:45,560
+theta 1x1+ up to
+加上θ1乘以x1直到
+
+169
+00:06:46,400 --> 00:06:47,330
+theta n Xn.
+θn 乘以xn
+
+170
+00:06:48,820 --> 00:06:50,310
+And this equation is
+这个等式
+
+171
+00:06:50,460 --> 00:06:51,600
+the same as this on
+和上面的等式是一样的
+
+172
+00:06:51,910 --> 00:06:53,670
+top because, you know,
+因为你看
+
+173
+00:06:54,080 --> 00:06:55,710
+x zero is equal to one.
+x0等于1
+
+174
+00:06:58,270 --> 00:06:59,300
+Underneath and I now
+下面 我要
+
+175
+00:06:59,390 --> 00:07:00,700
+take this form of the
+把这种形式假设等式
+
+176
+00:07:00,740 --> 00:07:02,130
+hypothesis and write this
+写成
+
+177
+00:07:02,500 --> 00:07:04,990
+as theta transpose x,
+θ转置乘以X
+
+178
+00:07:05,370 --> 00:07:06,910
+depending on how familiar
+取决于你对
+
+179
+00:07:07,320 --> 00:07:08,960
+you are with inner products of
+向量内积有多熟悉
+
+180
+00:07:09,720 --> 00:07:12,050
+vectors if you
+如果你展开
+
+181
+00:07:12,180 --> 00:07:13,880
+write out what theta transpose x
+θ转置乘以X
+
+182
+00:07:14,110 --> 00:07:15,260
+is, well, theta transpose, and
+那么就得到
+
+183
+00:07:15,360 --> 00:07:17,370
+this is theta zero,
+θ0
+
+184
+00:07:17,840 --> 00:07:19,730
+theta one, up to theta
+θ1直到θn
+
+185
+00:07:20,070 --> 00:07:22,880
+N. So this
+这个就是θ转置
+
+186
+00:07:23,140 --> 00:07:24,910
+thing here is theta transpose
+实际上
+
+187
+00:07:25,810 --> 00:07:27,820
+and this is actually a one
+这就是一个
+
+188
+00:07:27,960 --> 00:07:30,930
+by n plus one matrix.
+1×(n+1)维的矩阵
+
+189
+00:07:31,850 --> 00:07:32,600
+It's also called a row vector
+也被称为行向量
+
+190
+00:07:34,090 --> 00:07:35,160
+and you take that and
+用行向量
+
+191
+00:07:35,420 --> 00:07:37,420
+multiply it with the
+与X向量相乘
+
+192
+00:07:37,510 --> 00:07:38,440
+vector X which is X
+X向量是
+
+193
+00:07:38,640 --> 00:07:40,560
+zero, X one, and so
+x0 x1等等
+
+194
+00:07:40,820 --> 00:07:41,790
+on, down to X n.
+直到xn
+
+195
+00:07:43,030 --> 00:07:44,400
+And so, the inner product
+因此内积就是
+
+196
+00:07:44,940 --> 00:07:47,050
+that is theta transpose X
+θ转置乘以X
+
+197
+00:07:47,910 --> 00:07:48,810
+is just equal to this.
+就等于这个等式
+
+198
+00:07:49,520 --> 00:07:50,610
+This gives us a convenient way
+这就为我们提供了一个
+
+199
+00:07:50,770 --> 00:07:51,830
+to write the form of the
+表示假设的
+
+200
+00:07:52,110 --> 00:07:53,310
+hypothesis as just the inner
+更加便利的形式
+
+201
+00:07:53,510 --> 00:07:55,240
+product between our parameter
+即用参数向量θ以及
+
+202
+00:07:55,760 --> 00:07:57,200
+vector theta and our feature
+特征向量X的内积
+
+203
+00:07:57,550 --> 00:07:59,220
+vector X. And it
+这就是改写以后的
+
+204
+00:07:59,350 --> 00:08:00,360
+is this little bit of notation,
+表示方法
+
+205
+00:08:01,000 --> 00:08:02,270
+this little excerpt of the
+这样的表示习惯
+
+206
+00:08:02,320 --> 00:08:03,690
+notation convention that let
+就让我们
+
+207
+00:08:03,740 --> 00:08:05,530
+us write this in this compact form.
+可以以这种紧凑的形式写出假设
+
+208
+00:08:06,360 --> 00:08:09,230
+So that's the form of a hypothesis when we have multiple features.
+这就是多特征量情况下的假设形式
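+
+A short Octave sketch of this inner-product form; the theta values follow the 80 + 0.1x1 + 0.01x2 + 3x3 - 2x4 example above, and the x values (apart from x0 = 1) are made up:
+
+theta = [80; 0.1; 0.01; 3; -2];   % theta0 through theta4, as a column vector
+x = [1; 2104; 5; 1; 45];          % x0 = 1 by convention, then four made-up feature values
+h = theta' * x                    % hypothesis h(x) = theta transpose times x
+h_check = 80 + 0.1*2104 + 0.01*5 + 3*1 - 2*45   % the same sum written out term by term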
+
+209
+00:08:09,980 --> 00:08:10,940
+And, just to give this another
+起另一个名字
+
+210
+00:08:11,230 --> 00:08:12,330
+name, this is also
+就是
+
+211
+00:08:12,570 --> 00:08:13,860
+called multivariate linear regression.
+所谓的多元线性回归
+
+212
+00:08:15,200 --> 00:08:16,640
+And the term multivariable that's just
+多元一词
+
+213
+00:08:17,120 --> 00:08:18,300
+maybe a fancy term for saying
+也就是用来预测的多个特征量
+
+214
+00:08:18,730 --> 00:08:20,370
+we have multiple features, or
+或者变量
+
+215
+00:08:20,830 --> 00:08:22,900
+multivariables with which to try to predict the value Y.
+就是一种更加好听的说法罢了
+
diff --git a/srt/4 - 2 - Gradient Descent for Multiple Variables (5 min).srt b/srt/4 - 2 - Gradient Descent for Multiple Variables (5 min).srt
new file mode 100644
index 00000000..3bf16e27
--- /dev/null
+++ b/srt/4 - 2 - Gradient Descent for Multiple Variables (5 min).srt
@@ -0,0 +1,573 @@
+1
+00:00:00,220 --> 00:00:03,688
+前一视频中,我们探讨了多元或多变量线性回归假设的形式
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:00,220 --> 00:00:03,688
+In the previous video, we talked about
+the form of the hypothesis for linear
+
+3
+00:00:03,688 --> 00:00:07,246
+前一视频中,我们探讨了多元或多变量线性回归假设的形式
+
+4
+00:00:03,688 --> 00:00:07,246
+regression with multiple features
+or with multiple variables.
+
+5
+00:00:07,246 --> 00:00:11,912
+在这个视频中,我们将介绍如何设定该假设的参数
+
+6
+00:00:07,246 --> 00:00:11,912
+In this video, let's talk about how to
+fit the parameters of that hypothesis.
+
+7
+00:00:11,912 --> 00:00:15,175
+特别是,我们会讲解如何使用梯度下降法来
+
+8
+00:00:11,912 --> 00:00:15,175
+In particular let's talk about how
+to use gradient descent for linear
+
+9
+00:00:15,175 --> 00:00:19,875
+处理多元线性回归
+
+10
+00:00:15,175 --> 00:00:19,875
+regression with multiple features.
+
+11
+00:00:19,875 --> 00:00:24,802
+快速地总结下我们的变量记号,得到正式的多元线性回归假设
+
+12
+00:00:19,875 --> 00:00:24,802
+To quickly summarize our notation,
+this is our formal hypothesis in
+
+13
+00:00:24,802 --> 00:00:31,509
+其中,我们已经按惯例,使x0 = 1
+
+14
+00:00:24,802 --> 00:00:31,509
+multivariable linear regression where
+we've adopted the convention that x0=1.
+
+15
+00:00:31,509 --> 00:00:37,505
+此模型的参数包括从 theta0 到 theta n,但我们不把它看作
+
+16
+00:00:31,509 --> 00:00:37,505
+The parameters of this model are theta0
+through theta n, but instead of thinking
+
+17
+00:00:37,505 --> 00:00:42,385
+这 n 个独立有效的参数。而是考虑
+
+18
+00:00:37,505 --> 00:00:42,385
+of this as n separate parameters, which
+is valid, I'm instead going to think of
+
+19
+00:00:42,385 --> 00:00:51,175
+把这些theta参数作为一个 n + 1 维的向量。
+
+20
+00:00:42,385 --> 00:00:51,175
+the parameters as theta where theta
+here is a n+1-dimensional vector.
+
+21
+00:00:51,175 --> 00:00:55,498
+所以,我只会把此模型的参数看作模型自己的一个向量。
+
+22
+00:00:51,175 --> 00:00:55,498
+So I'm just going to think of the
+parameters of this model
+
+23
+00:00:55,498 --> 00:00:58,674
+所以,我只会把此模型的参数看作模型自己的一个向量。
+
+24
+00:00:55,498 --> 00:00:58,674
+as itself being a vector.
+
+25
+00:00:58,674 --> 00:01:03,507
+我们的成本函数是从 theta0 到 theta n 的J,它通过误差项的
+
+26
+00:00:58,674 --> 00:01:03,507
+Our cost function is J of theta0 through
+theta n which is given by this usual
+
+27
+00:01:03,507 --> 00:01:08,983
+平方的总和来给定。但又不把 J 看作带n+1个数的函数
+
+28
+00:01:03,507 --> 00:01:08,983
+sum of square of error term. But again
+instead of thinking of J as a function
+
+29
+00:01:08,983 --> 00:01:14,016
+我会使用更通用的方式把 J 看作是参数为theta向量的函数。
+
+30
+00:01:08,983 --> 00:01:14,016
+of these n+1 numbers, I'm going to
+more commonly write J as just a
+
+31
+00:01:14,016 --> 00:01:22,275
+所以, theta 在这里还是一个向量。
+
+32
+00:01:14,016 --> 00:01:22,275
+function of the parameter vector theta
+so that theta here is a vector.
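+
+A minimal Octave sketch of this cost function, assuming the inputs are stored in an m x (n+1) design matrix X whose first column is all ones, with targets in an m x 1 vector y; the 1/(2m) scaling is the convention used throughout the course:
+
+function J = computeCost(X, y, theta)
+  % squared-error cost J(theta) = (1/(2m)) * sum over i of (h(x^(i)) - y^(i))^2
+  m = length(y);
+  errors = X * theta - y;          % prediction error for every training example at once
+  J = (1 / (2 * m)) * sum(errors .^ 2);
+end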
+
+33
+00:01:22,275 --> 00:01:26,897
+这就是梯度下降的样子。我们要不断更新每个theta j 参数,
+
+34
+00:01:22,275 --> 00:01:26,897
+Here's what gradient descent looks like.
+We're going to repeatedly update each
+
+35
+00:01:26,897 --> 00:01:32,142
+即把参数theta j更新为theta j减去alpha乘以这个导数项
+
+36
+00:01:26,897 --> 00:01:32,142
+parameter theta j according to theta j
+minus alpha times this derivative term.
+
+37
+00:01:32,142 --> 00:01:37,868
+我们再把这写作 theta 的 J,然后 theta j 更新为
+
+38
+00:01:32,142 --> 00:01:37,868
+And once again we just write this as
+J of theta, so theta j is updated as
+
+39
+00:01:37,868 --> 00:01:41,840
+即theta j减去学习率alpha乘以导数项 也就是
+
+40
+00:01:37,868 --> 00:01:41,840
+theta j minus the learning rate
+alpha times the derivative, a partial
+
+41
+00:01:41,840 --> 00:01:47,840
+这个求导是成本函数对参数theta j 的求偏导。
+
+42
+00:01:41,840 --> 00:01:47,840
+derivative of the cost function with
+respect to the parameter theta j.
+
+43
+00:01:47,840 --> 00:01:51,305
+让我们看看这是什么样子时我们实施梯度下降,
+
+44
+00:01:47,840 --> 00:01:51,305
+Let's see what this looks like when
+we implement gradient descent and,
+
+45
+00:01:51,305 --> 00:01:55,985
+特别是,让我们去看看,偏导数看起来像什么。
+
+46
+00:01:51,305 --> 00:01:55,985
+in particular, let's go see what that
+partial derivative term looks like.
+
+47
+00:01:55,985 --> 00:02:01,383
+这就是我们使用梯度下降法,并且 N = 1 元时的例子。
+
+48
+00:01:55,985 --> 00:02:01,383
+Here's what we have for gradient descent
+for the case of when we had N=1 feature.
+
+49
+00:02:01,383 --> 00:02:06,782
+我们有两个独立的更新规则,分别对应参数 theta0 和 theta1
+
+50
+00:02:01,383 --> 00:02:06,782
+We had two separate update rules for
+the parameters theta0 and theta1, and
+
+51
+00:02:06,782 --> 00:02:12,779
+希望你熟悉这些内容。在这里的这个项,当然就是
+
+52
+00:02:06,782 --> 00:02:12,779
+hopefully these look familiar to you.
+And this term here was of course the
+
+53
+00:02:12,779 --> 00:02:17,672
+成本函数 对参数 theta0 求的偏导,
+
+54
+00:02:12,779 --> 00:02:17,672
+partial derivative of the cost function
+with respect to the parameter of theta0,
+
+55
+00:02:17,672 --> 00:02:21,891
+同样地,我们还有一个不同的参数 theta1 的更新规则。
+
+56
+00:02:17,672 --> 00:02:21,891
+and similarly we had a different
+update rule for the parameter theta1.
+
+57
+00:02:21,891 --> 00:02:26,259
+有一个小小的区别,我们以前只有一元特征值
+
+58
+00:02:21,891 --> 00:02:26,259
+There's one little difference which is
+that when we previously had only one
+
+59
+00:02:26,259 --> 00:02:31,992
+我们可以把它叫做x(i),但现在在我们的新的记号表示法中
+
+60
+00:02:26,259 --> 00:02:31,992
+feature, we would call that feature x(i)
+but now in our new notation
+
+61
+00:02:31,992 --> 00:02:38,462
+我们自然而然把他们称之为(看视频),以表示一个特征值。
+
+62
+00:02:31,992 --> 00:02:38,462
+we would of course call this
+x(i)1 to denote our one feature.
+
+63
+00:02:38,462 --> 00:02:41,019
+所以在我们只有一个特征值的情况下,就是这样。
+
+64
+00:02:38,462 --> 00:02:41,019
+So that was for when
+we had only one feature.
+
+65
+00:02:41,019 --> 00:02:44,496
+让我们看看新的算法,我们有多于一个的特征值,
+
+66
+00:02:41,019 --> 00:02:44,496
+Let's look at the new algorithm for
+we have more than one feature,
+
+67
+00:02:44,496 --> 00:02:47,350
+特征值个数 n 可能比 1 大得多。
+
+68
+00:02:44,496 --> 00:02:47,350
+where the number of features n
+may be much larger than one.
+
+69
+00:02:47,350 --> 00:02:53,158
+我们得到这个的梯度下降法更新规则,也许对于你们当中,
+
+70
+00:02:47,350 --> 00:02:53,158
+We get this update rule for gradient
+descent and, maybe for those of you that
+
+71
+00:02:53,158 --> 00:02:57,781
+会微积分的人来说,如果你根据成本函数的定义,然后
+
+72
+00:02:53,158 --> 00:02:57,781
+know calculus, if you take the
+definition of the cost function and take
+
+73
+00:02:57,781 --> 00:03:03,312
+计算 成本 函数J 对参数 theta j 的偏导,
+
+74
+00:02:57,781 --> 00:03:03,312
+the partial derivative of the cost
+function J with respect to the parameter
+
+75
+00:03:03,312 --> 00:03:08,119
+你就会发现,这偏导值就是
+
+76
+00:03:03,312 --> 00:03:08,119
+theta j, you'll find that that partial
+derivative is exactly that term that
+
+77
+00:03:08,119 --> 00:03:10,665
+我在它周围画上蓝框的项。
+
+78
+00:03:08,119 --> 00:03:10,665
+I've drawn the blue box around.
+
+79
+00:03:10,665 --> 00:03:14,837
+如果你这样做了,你就得到梯度下降法的具体实现,
+
+80
+00:03:10,665 --> 00:03:14,837
+And if you implement this you will
+get a working implementation of
+
+81
+00:03:14,837 --> 00:03:18,962
+用于多元线性回归
+
+82
+00:03:14,837 --> 00:03:18,962
+gradient descent for
+multivariate linear regression.
+
+83
+00:03:18,962 --> 00:03:21,572
+在此幻灯片中,我想做的最后一件事,就是告诉你
+
+84
+00:03:18,962 --> 00:03:21,572
+The last thing I want to do on
+this slide is give you a sense of
+
+85
+00:03:21,572 --> 00:03:26,882
+为何这些或新或旧的算法是同一类事件,或为何他们
+
+86
+00:03:21,572 --> 00:03:26,882
+why these new and old algorithms are
+sort of the same thing or why they're
+
+87
+00:03:26,882 --> 00:03:30,904
+是类似的算法,以及为何他们都是梯度下降算法。
+
+88
+00:03:26,882 --> 00:03:30,904
+both similar algorithms or why they're
+both gradient descent algorithms.
+
+89
+00:03:30,904 --> 00:03:34,363
+让我们来看个例子,现在有两个特征值,
+
+90
+00:03:30,904 --> 00:03:34,363
+Let's consider a case
+where we have two features
+
+91
+00:03:34,363 --> 00:03:37,488
+或者超过两个的特征值,因此,我们有三条更新规则
+
+92
+00:03:34,363 --> 00:03:37,488
+or maybe more than two features,
+so we have three update rules for
+
+93
+00:03:37,488 --> 00:03:42,680
+来计算参数 theta0 到 theta2 ,可能其他的 theta值也一样。
+
+94
+00:03:37,488 --> 00:03:42,680
+the parameters theta0, theta1, theta2
+and maybe other values of theta as well.
+
+95
+00:03:42,680 --> 00:03:49,457
+如果你观察theta0的更新规则,你会发现,
+
+96
+00:03:42,680 --> 00:03:49,457
+If you look at the update rule for
+theta0, what you find is that this
+
+97
+00:03:49,457 --> 00:03:55,300
+这更新规则和我们以前用过的更新规则一样
+
+98
+00:03:49,457 --> 00:03:55,300
+update rule here is the same as
+the update rule that we had previously
+
+99
+00:03:55,300 --> 00:03:57,350
+以前那个 n = 1 的例子的更新规则。
+
+100
+00:03:55,300 --> 00:03:57,350
+for the case of n = 1.
+
+101
+00:03:57,350 --> 00:04:00,203
+当然,它们等效的原因是
+
+102
+00:03:57,350 --> 00:04:00,203
+And the reason that they are
+equivalent is, of course,
+
+103
+00:04:00,203 --> 00:04:06,871
+因为在我们符号惯例中,我们有 x (i) 0 = 1 的约定,
+
+104
+00:04:00,203 --> 00:04:06,871
+because in our notational convention we
+had this x(i)0 = 1 convention, which is
+
+105
+00:04:06,871 --> 00:04:12,003
+这就是为什么洋红色框里左右两边会等效。
+
+106
+00:04:06,871 --> 00:04:12,003
+why these two term that I've drawn the
+magenta boxes around are equivalent.
+
+107
+00:04:12,003 --> 00:04:16,010
+同样地,如果你注意到 theta1 的更新规则,你会发现
+
+108
+00:04:12,003 --> 00:04:16,010
+Similarly, if you look the update
+rule for theta1, you find that
+
+109
+00:04:16,010 --> 00:04:21,540
+这一项等效于我们以前用过的项
+
+110
+00:04:16,010 --> 00:04:21,540
+this term here is equivalent to
+the term we previously had,
+
+111
+00:04:21,540 --> 00:04:25,020
+或者说方程,或更新规则,我们曾用于以前的 theta1
+
+112
+00:04:21,540 --> 00:04:25,020
+or the equation or the update
+rule we previously had for theta1,
+
+113
+00:04:25,020 --> 00:04:30,222
+当然我们只是使用了这种新的符号 x (i) 1 来表示
+
+114
+00:04:25,020 --> 00:04:30,222
+where of course we're just using
+this new notation x(i)1 to denote
+
+115
+00:04:30,222 --> 00:04:37,605
+我们第一元特征值,现在,我们有多个特征值,
+
+116
+00:04:30,222 --> 00:04:37,605
+our first feature, and now that we have
+more than one feature we can have
+
+117
+00:04:37,605 --> 00:04:43,560
+于是我们有相似的更新规则,用于诸如 theta2 等参数。
+
+118
+00:04:37,605 --> 00:04:43,560
+similar update rules for the other
+parameters like theta2 and so on.
+
+119
+00:04:43,560 --> 00:04:48,219
+此幻灯片中还有很多内容,所以我无比明确地鼓励你
+
+120
+00:04:43,560 --> 00:04:48,219
+There's a lot going on on this slide
+so I definitely encourage you
+
+121
+00:04:48,219 --> 00:04:52,020
+去暂停视频,一丝不苟地观看这张幻灯片上的数学内容
+
+122
+00:04:48,219 --> 00:04:52,020
+if you need to to pause the video
+and look at all the math on this slide
+
+123
+00:04:52,020 --> 00:04:55,446
+以确保您掌握了这上面的一切。
+
+124
+00:04:52,020 --> 00:04:55,446
+slowly to make sure you understand
+everything that's going on here.
+
+125
+00:04:55,446 --> 00:05:00,440
+但是,如果你实现了写在这里的算法,
+
+126
+00:04:55,446 --> 00:05:00,440
+But if you implement the algorithm
+written up here then you have
+
+127
+00:05:00,440 --> 00:05:51,300
+那么你就已经拥有一个多元线性回归的具体实现。
+
+128
+00:05:00,440 --> 00:05:51,300
+a working implementation of linear
+regression with multiple features.
+
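The update rule spelled out in this video can be sketched in a few lines of NumPy. This is only an illustration under assumed names (X, y, theta, alpha are not defined anywhere in the course files shown here), not code from the course itself.

import numpy as np

def gradient_descent_step(X, y, theta, alpha):
    # X     : (m, n+1) matrix of training examples; the first column is all ones (x0 = 1)
    # y     : (m,)     vector of target values
    # theta : (n+1,)   current parameter vector
    # alpha : learning rate
    m = len(y)
    errors = X @ theta - y             # h_theta(x^(i)) - y^(i) for every training example
    gradient = (X.T @ errors) / m      # partial derivative of J(theta) w.r.t. each theta_j
    return theta - alpha * gradient    # update every theta_j simultaneously

Because the first column of X is fixed at 1, the theta0 component of this update reduces to the n = 1 rule recalled at the start of the video.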
diff --git a/srt/4 - 3 - Gradient Descent in Practice I - Feature Scaling (9 min).srt b/srt/4 - 3 - Gradient Descent in Practice I - Feature Scaling (9 min).srt
new file mode 100644
index 00000000..3fdcd7ae
--- /dev/null
+++ b/srt/4 - 3 - Gradient Descent in Practice I - Feature Scaling (9 min).srt
@@ -0,0 +1,1261 @@
+1
+00:00:00,190 --> 00:00:01,270
+In this video and in
+在这段视频
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,440 --> 00:00:02,720
+the video after this one, I
+以及下一段视频中
+
+3
+00:00:02,850 --> 00:00:04,040
+wanna tell you about some of
+我想告诉你一些关于
+
+4
+00:00:04,180 --> 00:00:06,940
+the practical tricks for making gradient descent work well.
+梯度下降运算中的实用技巧
+
+5
+00:00:07,680 --> 00:00:10,250
+In this video, I want to tell you about an idea called feature scaling.
+在这段视频中 我会告诉你一个称为特征缩放 (feature scaling) 的方法
+
+6
+00:00:11,770 --> 00:00:12,210
+Here's the idea.
+这个方法如下
+
+7
+00:00:13,030 --> 00:00:14,080
+If you have a problem where you
+如果你有一个机器学习问题
+
+8
+00:00:14,180 --> 00:00:15,880
+have multiple features, if you
+这个问题有多个特征
+
+9
+00:00:16,320 --> 00:00:17,410
+make sure that the features
+如果你能确保这些特征
+
+10
+00:00:18,050 --> 00:00:19,440
+are on a similar scale, by
+都处在一个相近的范围
+
+11
+00:00:19,570 --> 00:00:20,480
+which I mean make sure that
+我的意思是确保
+
+12
+00:00:20,650 --> 00:00:22,130
+the different features take on
+不同特征的取值
+
+13
+00:00:22,300 --> 00:00:23,390
+similar ranges of values,
+在相近的范围内
+
+14
+00:00:24,420 --> 00:00:26,490
+then gradient descents can converge more quickly.
+这样梯度下降法就能更快地收敛
+
+15
+00:00:27,510 --> 00:00:28,680
+Concretely let's say you
+具体地说
+
+16
+00:00:28,820 --> 00:00:29,860
+have a problem with two features
+假如你有一个具有两个特征的问题
+
+17
+00:00:30,380 --> 00:00:31,680
+where X1 is the size
+其中 x1 是房屋面积大小
+
+18
+00:00:31,950 --> 00:00:32,860
+of house and takes on values
+它的取值
+
+19
+00:00:33,530 --> 00:00:34,540
+between say zero to two thousand
+在0到2000之间
+
+20
+00:00:35,490 --> 00:00:36,270
+and x2 is the number
+x2 是卧室的数量
+
+21
+00:00:36,520 --> 00:00:37,570
+of bedrooms, and maybe that takes
+可能这个值
+
+22
+00:00:37,820 --> 00:00:39,250
+on values between one and five.
+取值范围在1到5之间
+
+23
+00:00:40,100 --> 00:00:41,690
+If you plot the contours of
+如果你画出代价函数
+
+24
+00:00:41,800 --> 00:00:43,000
+the cost function J of theta,
+J(θ) 的轮廓图
+
+25
+00:00:44,810 --> 00:00:46,540
+then the contours may look
+那么这个轮廓看起来
+
+26
+00:00:46,750 --> 00:00:49,010
+like this, where, let's see,
+应该是像这样的
+
+27
+00:00:49,230 --> 00:00:50,570
+J of theta is a function
+J(θ) 是一个关于
+
+28
+00:00:50,910 --> 00:00:53,590
+of parameters theta zero, theta one and theta two.
+参数 θ0 θ1 和 θ2 的函数
+
+29
+00:00:54,300 --> 00:00:55,400
+I'm going to ignore theta zero,
+但我要忽略 θ0
+
+30
+00:00:56,020 --> 00:00:57,230
+so let's forget about theta 0
+所以暂时不考虑 θ0
+
+31
+00:00:57,480 --> 00:00:58,730
+and pretend as a function of
+并假想一个函数的变量
+
+32
+00:00:58,840 --> 00:01:01,080
+only theta 1 and theta
+只有 θ1 和 θ2
+
+33
+00:01:01,510 --> 00:01:02,810
+2, but if x1 can take on
+但如果 x1 的取值范围
+
+34
+00:01:02,940 --> 00:01:04,110
+a, you know, much larger range
+远远大于 x2 的取值范围的话
+
+35
+00:01:04,370 --> 00:01:05,790
+of values than x2, it turns
+那么最终画出来的
+
+36
+00:01:06,120 --> 00:01:07,270
+out that the contours of the
+代价函数 J(θ) 的轮廓图
+
+37
+00:01:07,340 --> 00:01:08,320
+cost function J of theta
+就会呈现出这样一种
+
+38
+00:01:09,420 --> 00:01:11,400
+can take on this very
+非常偏斜
+
+39
+00:01:11,690 --> 00:01:14,720
+very skewed elliptical shape, except
+并且椭圆的形状
+
+40
+00:01:15,070 --> 00:01:16,620
+that with the 2000 to
+2000 和 5的比例
+
+41
+00:01:16,770 --> 00:01:18,470
+5 ratio, it can be even more skewed.
+会让这个椭圆更加瘦长
+
+42
+00:01:18,800 --> 00:01:20,190
+So, this is very, very tall
+所以 这是一个又瘦又高的
+
+43
+00:01:20,560 --> 00:01:23,070
+and skinny ellipses, or these
+椭圆形轮廓图
+
+44
+00:01:23,320 --> 00:01:24,950
+very tall skinny ovals, can form
+就是这些非常高大细长的椭圆形
+
+45
+00:01:25,310 --> 00:01:27,940
+the contours of the cost function J of theta.
+构成了代价函数 J(θ)
+
+46
+00:01:29,420 --> 00:01:30,860
+And if you run gradient descents
+而如果你用这个代价函数
+
+47
+00:01:30,930 --> 00:01:34,290
+on this cost function, your
+来运行梯度下降的话
+
+48
+00:01:34,830 --> 00:01:36,480
+gradients may end up
+你要得到梯度值 最终可能
+
+49
+00:01:36,970 --> 00:01:38,660
+taking a long time and
+需要花很长一段时间
+
+50
+00:01:39,080 --> 00:01:40,360
+can oscillate back and forth
+并且可能会来回波动
+
+51
+00:01:41,100 --> 00:01:43,130
+and take a long time before it
+然后会经过很长时间
+
+52
+00:01:43,190 --> 00:01:46,120
+can finally find its way to the global minimum.
+最终才收敛到全局最小值
+
+53
+00:01:47,470 --> 00:01:48,720
+In fact, you can imagine if these
+事实上 你可以想像 如果这些
+
+54
+00:01:48,890 --> 00:01:50,400
+contours are exaggerated even
+轮廓再被放大一些的话
+
+55
+00:01:50,580 --> 00:01:51,970
+more when you draw incredibly
+如果你画的再夸张一些
+
+56
+00:01:52,480 --> 00:01:54,300
+skinny, tall skinny contours,
+把它画的更细更长
+
+57
+00:01:56,230 --> 00:01:57,030
+and it can be even more extreme
+那么可能情况会更糟糕
+
+58
+00:01:57,380 --> 00:01:59,060
+than, then, gradient descent
+梯度下降的过程
+
+59
+00:01:59,790 --> 00:02:02,310
+just have a much
+可能更加缓慢
+
+60
+00:02:02,630 --> 00:02:04,280
+harder time finding its way,
+需要花更长的时间
+
+61
+00:02:04,690 --> 00:02:06,030
+meandering around, it can take
+反复来回振荡
+
+62
+00:02:06,120 --> 00:02:08,270
+a long time to find its way to the global minimum.
+最终才找到一条正确通往全局最小值的路
+
+63
+00:02:12,130 --> 00:02:14,370
+In these settings, a useful
+在这样的情况下
+
+64
+00:02:14,780 --> 00:02:16,280
+thing to do is to scale the features.
+一种有效的方法是进行特征缩放(feature scaling)
+
+65
+00:02:17,380 --> 00:02:18,760
+Concretely if you instead
+具体来说
+
+66
+00:02:19,200 --> 00:02:20,370
+define the feature X
+把特征 x 定义为
+
+67
+00:02:20,570 --> 00:02:21,770
+one to be the size of
+房子的面积大小
+
+68
+00:02:21,870 --> 00:02:23,070
+the house divided by two thousand,
+除以2000的话
+
+69
+00:02:24,040 --> 00:02:25,140
+and define X two to be
+并且把 x2 定义为
+
+70
+00:02:25,270 --> 00:02:26,520
+maybe the number of bedrooms divided
+卧室的数量除以5
+
+71
+00:02:26,940 --> 00:02:29,010
+by five, then the
+那么这样的话
+
+72
+00:02:29,170 --> 00:02:30,020
+contours of the
+表示代价函数 J(θ)
+
+73
+00:02:30,090 --> 00:02:31,840
+cost function J can become
+的轮廓图的形状
+
+74
+00:02:32,900 --> 00:02:34,430
+much more, much less
+就会变得偏移没那么严重
+
+75
+00:02:34,840 --> 00:02:36,990
+skewed so the contours may look more like circles.
+可能看起来更圆一些了
+
+76
+00:02:38,210 --> 00:02:39,180
+And if you run gradient
+如果你用这样的代价函数
+
+77
+00:02:39,520 --> 00:02:40,540
+descent on a cost function like
+来执行梯度下降的话
+
+78
+00:02:40,750 --> 00:02:42,120
+this, then gradient descent,
+那么 梯度下降算法
+
+79
+00:02:44,110 --> 00:02:45,630
+you can show mathematically, you can
+你可以从数学上来证明
+
+80
+00:02:45,860 --> 00:02:47,430
+find a much more direct path
+梯度下降算法 就会找到一条
+
+81
+00:02:47,540 --> 00:02:48,830
+to the global minimum rather than taking
+更捷径的路径通向全局最小
+
+82
+00:02:49,390 --> 00:02:51,200
+a much more convoluted path
+而不是像刚才那样
+
+83
+00:02:51,530 --> 00:02:52,530
+where you're sort of trying to
+ 沿着一条让人摸不着头脑的路径
+
+84
+00:02:52,620 --> 00:02:53,520
+follow a much more complicated
+一条复杂得多的轨迹
+
+85
+00:02:54,310 --> 00:02:55,910
+trajectory to get to the global minimum.
+来找到全局最小值
+
+86
+00:02:57,300 --> 00:02:58,710
+So, by scaling the features so
+因此 通过特征缩放
+
+87
+00:02:58,950 --> 00:03:01,000
+that they take on similar ranges of values.
+使它们的取值范围相近
+
+88
+00:03:01,620 --> 00:03:02,810
+In this example, we end up
+在这个例子中
+
+89
+00:03:02,970 --> 00:03:04,150
+with both features, X one
+我们最终得到的两个特征
+
+90
+00:03:04,300 --> 00:03:06,960
+and X two, between zero and one.
+x1 和 x2 都在0和1之间
+
+91
+00:03:09,580 --> 00:03:12,290
+You can wind up with an implementation of gradient descent
+这样你得到的梯度下降算法
+
+92
+00:03:12,690 --> 00:03:13,810
+that can converge much faster.
+就会更快地收敛
+
+93
+00:03:18,120 --> 00:03:19,640
+More generally, when we're performing
+更一般地
+
+94
+00:03:20,160 --> 00:03:21,240
+feature scaling, what we often
+我们执行特征缩放时 也就是我们经常
+
+95
+00:03:21,530 --> 00:03:22,480
+want to do is get every
+我们通常的目的是
+
+96
+00:03:22,750 --> 00:03:25,670
+feature into approximately a -1
+将特征的取值约束到
+
+97
+00:03:25,780 --> 00:03:28,170
+to +1 range and concretely,
+-1 到 +1 的范围内
+
+98
+00:03:28,960 --> 00:03:31,710
+your feature x0 is always equal to 1.
+你的特征 x0 是总是等于1
+
+99
+00:03:31,760 --> 00:03:32,810
+So, that's already in that range,
+因此 这已经是在这个范围内
+
+100
+00:03:34,110 --> 00:03:35,150
+but you may end up dividing
+但对其他的特征
+
+101
+00:03:35,630 --> 00:03:36,950
+other features by different numbers
+你可能需要通过除以不同的数
+
+102
+00:03:37,330 --> 00:03:39,150
+to get them to this range.
+来让它们处于同一范围内
+
+103
+00:03:39,510 --> 00:03:41,520
+The numbers -1 and +1 aren't too important.
+-1 和 +1 这两个数字并不是太重要
+
+104
+00:03:42,270 --> 00:03:42,900
+So, if you have a feature,
+所以 如果你有一个特征
+
+105
+00:03:44,150 --> 00:03:45,340
+x1 that winds up
+x1 它的取值
+
+106
+00:03:45,510 --> 00:03:48,000
+being between zero and three, that's not a problem.
+在0和3之间 这没问题
+
+107
+00:03:48,400 --> 00:03:49,410
+If you end up having a different
+如果你有另外一个特征
+
+108
+00:03:49,600 --> 00:03:51,190
+feature that winds up being
+取值在-2 到 +0.5之间
+
+109
+00:03:52,140 --> 00:03:54,020
+between -2 and + 0.5,
+这也没什么关系
+
+110
+00:03:54,300 --> 00:03:55,710
+again, this is close enough
+这也非常接近
+
+111
+00:03:56,070 --> 00:03:57,070
+to minus one and plus one
+-1 到 +1的范围
+
+112
+00:03:57,320 --> 00:03:59,160
+that, you know, that's fine, and that's fine.
+这些都可以
+
+113
+00:04:00,310 --> 00:04:01,260
+It's only if you have a
+但如果你有另一个特征
+
+114
+00:04:01,340 --> 00:04:02,580
+different feature, say X 3
+比如叫 x3
+
+115
+00:04:02,820 --> 00:04:04,780
+that is between, that
+假如它的范围
+
+116
+00:04:05,840 --> 00:04:09,070
+ranges from -100 to +100
+在 -100 到 +100之间
+
+117
+00:04:09,330 --> 00:04:10,850
+, then, this is a
+那么 这个范围
+
+118
+00:04:11,090 --> 00:04:13,570
+very different values than minus 1 and plus 1.
+跟-1到+1就有很大不同了
+
+119
+00:04:13,860 --> 00:04:15,020
+So, this might be a
+所以 这可能是一个
+
+120
+00:04:15,230 --> 00:04:17,480
+less well-scaled feature and similarly,
+不那么好的特征
+
+121
+00:04:17,970 --> 00:04:19,340
+if your features take on a
+类似地 如果你的特征在一个
+
+122
+00:04:19,420 --> 00:04:20,680
+very, very small range of
+非常非常小的范围内
+
+123
+00:04:20,950 --> 00:04:22,060
+values so if X 4
+比如另外一个特征
+
+124
+00:04:22,340 --> 00:04:25,530
+takes on values between minus
+x4 它的范围在
+
+125
+00:04:25,740 --> 00:04:28,290
+0.0001 and positive 0.0001, then
+0.0001和+0.0001之间 那么
+
+126
+00:04:29,720 --> 00:04:30,780
+again this takes on a
+这同样是一个
+
+127
+00:04:30,910 --> 00:04:31,960
+much smaller range of values
+比-1到+1小得多的范围
+
+128
+00:04:32,460 --> 00:04:33,760
+than the minus one to plus one range.
+比-1到+1小得多的范围
+
+129
+00:04:34,040 --> 00:04:36,630
+And again I would consider this feature poorly scaled.
+因此 我同样会认为这个特征也不太好
+
+130
+00:04:37,850 --> 00:04:39,150
+So you want the range of
+所以 可能你认可的范围
+
+131
+00:04:39,430 --> 00:04:40,350
+values, you know, can be
+也许可以大于
+
+132
+00:04:41,070 --> 00:04:42,010
+bigger than plus one or smaller
+或者小于 -1 到 +1
+
+133
+00:04:42,370 --> 00:04:43,840
+than minus one, but just
+但是也别太大
+
+134
+00:04:44,040 --> 00:04:45,170
+not much bigger, like plus
+只要大得不多就可以接受
+
+135
+00:04:45,610 --> 00:04:47,470
+100 here, or too
+比如 +100
+
+136
+00:04:47,650 --> 00:04:49,990
+much smaller like 0.001 over there.
+或者也别太小 比如这里的0.001
+
+137
+00:04:50,770 --> 00:04:52,530
+Different people have different rules of thumb.
+不同的人有不同的经验
+
+138
+00:04:52,870 --> 00:04:53,910
+But the one that I use is
+但是我一般是这么考虑的
+
+139
+00:04:54,070 --> 00:04:55,440
+that if a feature takes
+如果一个特征是在
+
+140
+00:04:55,670 --> 00:04:56,750
+on the range of values from
+-3 到 +3 的范围内
+
+141
+00:04:56,980 --> 00:04:58,590
+say minus three to plus
+那么你应该认为
+
+142
+00:04:58,840 --> 00:05:00,120
+3, then I think that should
+这个范围是可以接受的
+
+143
+00:05:00,170 --> 00:05:01,690
+be just fine, but maybe
+但如果这个范围
+
+144
+00:05:02,000 --> 00:05:03,050
+it takes on much larger values
+大于了 -3 到 +3 的范围
+
+145
+00:05:03,440 --> 00:05:04,360
+than plus 3 or minus 3
+我可能就要开始注意了
+
+146
+00:05:04,530 --> 00:05:06,400
+then I might start to worry, and if
+如果它的取值
+
+147
+00:05:06,700 --> 00:05:09,660
+it takes on values from say minus one-third to one-third.
+在-1/3 到+1/3的话
+
+148
+00:05:10,920 --> 00:05:12,020
+You know, I think that's fine
+我觉得 还不错 可以接受
+
+149
+00:05:12,270 --> 00:05:14,880
+too or 0 to one-third or minus one-third to 0.
+或者是0到1/3 或-1/3到0
+
+150
+00:05:14,910 --> 00:05:17,890
+I guess these typical ranges of values around 0 are okay.
+这些典型的范围 我都认为是可以接受的
+
+151
+00:05:18,560 --> 00:05:19,310
+But if it takes on a
+但如果特征的范围
+
+152
+00:05:19,450 --> 00:05:20,640
+much tinier range of values
+取得很小的话
+
+153
+00:05:20,900 --> 00:05:23,220
+like x4 here, then again I might start to worry.
+比如像这里的 x4 你就要开始考虑进行特征缩放了
+
+154
+00:05:23,790 --> 00:05:25,060
+So, the take-home message
+因此 总的来说
+
+155
+00:05:25,500 --> 00:05:26,780
+is don't worry if your
+不用过于担心
+
+156
+00:05:27,000 --> 00:05:28,550
+features are not exactly on
+你的特征是否在完全
+
+157
+00:05:28,700 --> 00:05:30,920
+the same scale or exactly in the same range of values.
+相同的范围或区间内
+
+158
+00:05:31,170 --> 00:05:31,930
+But so long as they're all
+但是只要他们都
+
+159
+00:05:32,090 --> 00:05:35,060
+close enough to this, gradient descent should work okay.
+只要它们足够接近的话 梯度下降法就会正常地工作
+
+160
+00:05:35,930 --> 00:05:37,530
+In addition to dividing by
+除了在特征缩放中
+
+161
+00:05:37,930 --> 00:05:39,960
+the maximum value, when
+将特征除以最大值以外
+
+162
+00:05:40,220 --> 00:05:42,080
+performing feature scaling sometimes
+有时候我们也会进行一个
+
+163
+00:05:42,730 --> 00:05:45,070
+people will also do what's called mean normalization.
+称为均值归一化的工作(mean normalization)
+
+164
+00:05:45,330 --> 00:05:47,150
+And what I mean by
+我的意思是这样的
+
+165
+00:05:47,320 --> 00:05:48,130
+that is that you want
+如果你有一个特征 xi
+
+166
+00:05:48,350 --> 00:05:49,810
+to take a feature Xi and replace
+你就用 xi - μi 来替换
+
+167
+00:05:50,230 --> 00:05:51,850
+it with Xi minus mu i
+通过这样做 让你的特征值
+
+168
+00:05:52,870 --> 00:05:55,260
+to make your features have approximately 0 mean.
+具有为0的平均值
+
+169
+00:05:56,530 --> 00:05:57,730
+And obviously we want
+很明显 我们不需要
+
+170
+00:05:57,890 --> 00:05:59,260
+to apply this to the feature
+把这一步应用到
+
+171
+00:05:59,650 --> 00:06:00,750
+x zero, because the feature
+x0中
+
+172
+00:06:00,940 --> 00:06:02,260
+x zero is always equal to
+因为 x0 总是等于1的
+
+173
+00:06:02,360 --> 00:06:03,600
+one, so it cannot have an
+所以它不可能有
+
+174
+00:06:03,810 --> 00:06:05,100
+average value of zero.
+为0的平均值
+
+175
+00:06:06,370 --> 00:06:07,760
+But it concretely for other
+但是
+
+176
+00:06:07,950 --> 00:06:09,320
+features if the range
+对其他的特征来说
+
+177
+00:06:09,600 --> 00:06:10,320
+of sizes of the house
+比如房子的大小
+
+178
+00:06:10,960 --> 00:06:14,170
+takes on values between 0
+取值介于0到2000
+
+179
+00:06:14,310 --> 00:06:15,080
+to 2000 and if you know,
+并且假如
+
+180
+00:06:15,230 --> 00:06:16,230
+the average size of a
+房子面积
+
+181
+00:06:16,470 --> 00:06:18,340
+house is equal to
+的平均值
+
+182
+00:06:18,500 --> 00:06:20,080
+1000 then you might
+是等于1000的
+
+183
+00:06:21,470 --> 00:06:21,950
+use this formula.
+那么你可以用这个公式
+
+184
+00:06:23,940 --> 00:06:24,970
+Size, set the feature
+将 x1 的值变为
+
+185
+00:06:25,250 --> 00:06:26,270
+X1 to the size minus
+x1 减去平均值 μ1
+
+186
+00:06:26,590 --> 00:06:28,010
+the average value divided by 2000
+再除以2000
+
+187
+00:06:28,630 --> 00:06:31,820
+and similarly, on average
+类似地
+
+188
+00:06:32,530 --> 00:06:34,010
+if your houses have
+如果你的房子有
+
+189
+00:06:34,520 --> 00:06:37,630
+one to five bedrooms and if
+一到五间卧室
+
+190
+00:06:39,240 --> 00:06:40,460
+on average a house has
+并且平均一套房子有
+
+191
+00:06:40,890 --> 00:06:41,920
+two bedrooms then you might
+两间卧室 那么你可以
+
+192
+00:06:42,110 --> 00:06:44,750
+use this formula to mean
+使用这个公式
+
+193
+00:06:45,080 --> 00:06:47,460
+normalize your second feature x2.
+来归一化你的第二个特征 x2
+
+194
+00:06:49,340 --> 00:06:50,720
+In both of these cases, you
+在这两种情况下
+
+195
+00:06:50,840 --> 00:06:52,730
+therefore wind up with features x1 and x2.
+你可以算出新的特征 x1 和 x2
+
+196
+00:06:52,930 --> 00:06:54,490
+They can take on values roughly
+这样它们的范围
+
+197
+00:06:54,880 --> 00:06:56,580
+between minus .5 and positive .5.
+可以在-0.5和+0.5之间
+
+198
+00:06:57,130 --> 00:06:57,880
+That's not exactly true - X2
+确切地说 这并不完全正确
+
+199
+00:06:58,210 --> 00:07:00,920
+can actually be slightly larger than .5 but, close enough.
+x2的值实际上可能会略大于0.5 但已经足够接近了
+
+200
+00:07:01,800 --> 00:07:03,140
+And the more general rule is
+更一般的规律是
+
+201
+00:07:03,530 --> 00:07:04,860
+that you might take a
+你可以用这样的公式
+
+202
+00:07:04,900 --> 00:07:06,390
+feature X1 and replace
+你可以用 (x1 - μ1)/S1
+
+203
+00:07:08,060 --> 00:07:10,110
+it with X1 minus mu1
+来替换原来的特征 x1
+
+204
+00:07:10,940 --> 00:07:13,410
+over S1 where to
+其中定义
+
+205
+00:07:13,550 --> 00:07:15,890
+define these terms mu1 is
+μ1的意思是
+
+206
+00:07:16,200 --> 00:07:18,290
+the average value of x1
+在训练集中
+
+207
+00:07:19,960 --> 00:07:21,310
+in the training sets
+特征 x1 的平均值
+
+208
+00:07:22,320 --> 00:07:24,190
+and S1 is the
+而 S1 是
+
+209
+00:07:24,350 --> 00:07:27,420
+range of values of that
+该特征值的范围
+
+210
+00:07:27,820 --> 00:07:28,940
+feature and by range, I
+我说的范围是指
+
+211
+00:07:29,040 --> 00:07:30,110
+mean let's say the maximum
+最大值减去最小值
+
+212
+00:07:30,630 --> 00:07:31,900
+value minus the minimum
+最大值减去最小值
+
+213
+00:07:32,290 --> 00:07:33,350
+value or for those
+或者学过
+
+214
+00:07:33,590 --> 00:07:35,360
+of you that know what the standard deviation
+标准差的同学可以记住
+
+215
+00:07:35,850 --> 00:07:37,390
+of the variable is, setting S1
+也可以把 S1 设为
+
+216
+00:07:37,760 --> 00:07:40,790
+to be the standard deviation of the variable would be fine, too.
+变量的标准差
+
+217
+00:07:41,020 --> 00:07:43,240
+But taking, you know, this max minus min would be fine.
+但其实用最大值减最小值就可以了
+
+218
+00:07:44,330 --> 00:07:45,170
+And similarly for the second
+类似地 对于第二个
+
+219
+00:07:45,610 --> 00:07:47,380
+feature, x2, you replace
+特征 x2
+
+220
+00:07:47,840 --> 00:07:49,740
+x2 with this sort of
+你也可以用同样的这个
+
+221
+00:07:51,040 --> 00:07:52,220
+subtract the mean of the feature
+特征减去平均值
+
+222
+00:07:52,800 --> 00:07:54,110
+and divide it by the range
+再除以范围 来替换原特征
+
+223
+00:07:54,380 --> 00:07:55,980
+of values meaning the max minus min.
+范围的意思依然是最大值减最小值
+
+224
+00:07:56,880 --> 00:07:57,910
+And this sort of formula will
+这类公式将
+
+225
+00:07:58,370 --> 00:07:59,630
+get your features, you know, maybe
+把你的特征
+
+226
+00:07:59,850 --> 00:08:01,020
+not exactly, but maybe roughly
+变成这样的范围
+
+227
+00:08:01,920 --> 00:08:03,320
+into these sorts of
+也许不是完全这样
+
+228
+00:08:03,490 --> 00:08:04,820
+ranges, and by the
+但大概是这样的范围
+
+229
+00:08:04,890 --> 00:08:05,700
+way, for those of you that
+顺便提一下
+
+230
+00:08:05,940 --> 00:08:07,570
+are being super careful technically if
+有些同学可能比较仔细
+
+231
+00:08:07,710 --> 00:08:09,300
+we're taking the range as max
+如果我们用最大值减最小值
+
+232
+00:08:09,610 --> 00:08:12,410
+minus min this five here will actually become a four.
+来表示范围的话 这里的5有可能应该是4
+
+233
+00:08:13,140 --> 00:08:14,390
+So if max is 5
+如果最大值为5
+
+234
+00:08:14,600 --> 00:08:15,830
+minus 1 then the range of
+那么减去最小值1
+
+235
+00:08:16,320 --> 00:08:17,160
+their own values is actually
+这个范围值就是4
+
+236
+00:08:17,860 --> 00:08:18,530
+equal to 4, but all of these
+但不管咋说 这些取值
+
+237
+00:08:18,690 --> 00:08:20,380
+are approximate and any value
+都是非常近似的
+
+238
+00:08:20,830 --> 00:08:22,010
+that gets the features into
+只要将特征转换为
+
+239
+00:08:22,450 --> 00:08:24,750
+anything close to these sorts of ranges will do fine.
+相近似的范围 就都是可以的
+
+240
+00:08:25,200 --> 00:08:27,220
+And the feature scaling
+特征缩放其实
+
+241
+00:08:27,660 --> 00:08:28,520
+doesn't have to be too exact,
+并不需要太精确
+
+242
+00:08:29,050 --> 00:08:30,390
+in order to get gradient
+只是为了让梯度下降
+
+243
+00:08:30,790 --> 00:08:32,290
+descent to run quite a lot faster.
+能够运行得更快一点而已
+
+244
+00:08:34,610 --> 00:08:35,840
+So, now you know
+好的 现在你知道了
+
+245
+00:08:36,020 --> 00:08:37,420
+about feature scaling and if
+什么是特征缩放
+
+246
+00:08:37,530 --> 00:08:39,040
+you apply this simple trick, it
+通过使用这个简单的方法
+
+247
+00:08:39,250 --> 00:08:40,650
+can make gradient descent run much
+你可以将梯度下降的速度变得更快
+
+248
+00:08:40,870 --> 00:08:43,680
+faster and converge in a lot fewer iterations.
+让梯度下降收敛所需的循环次数更少
+
+249
+00:08:44,990 --> 00:08:45,540
+That was feature scaling.
+这就是特征缩放
+
+250
+00:08:46,080 --> 00:08:47,190
+In the next video, I'll tell
+在接下来的视频中
+
+251
+00:08:47,350 --> 00:08:49,410
+you about another trick to make
+我将介绍另一种技巧来使梯度下降
+
+252
+00:08:49,710 --> 00:08:50,970
+gradient descent work well in practice.
+在实践中工作地更好
+
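As a minimal sketch of the two rescalings described in this video (dividing by the range, and mean normalization), assuming NumPy and illustrative variable names rather than anything defined in the course files:

import numpy as np

def mean_normalize(X):
    # X: (m, n) matrix with one column per feature; the constant x0 = 1 column is added
    # afterwards, since a feature that is always 1 cannot be given zero mean.
    mu = X.mean(axis=0)                # mu_j: average value of feature j in the training set
    s = X.max(axis=0) - X.min(axis=0)  # s_j: range (max - min); the standard deviation is also fine
    return (X - mu) / s, mu, s

# Illustrative example with the two features from the video: size in square feet and bedrooms.
X = np.array([[2104.0, 3.0],
              [1416.0, 2.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
X_scaled, mu, s = mean_normalize(X)    # each column now lies roughly between -0.5 and +0.5

The same mu and s should be reused to rescale any new example before a prediction is made with the learned parameters.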
diff --git a/srt/4 - 4 - Gradient Descent in Practice II - Learning Rate (9 min).srt b/srt/4 - 4 - Gradient Descent in Practice II - Learning Rate (9 min).srt
new file mode 100644
index 00000000..d0ac4b3a
--- /dev/null
+++ b/srt/4 - 4 - Gradient Descent in Practice II - Learning Rate (9 min).srt
@@ -0,0 +1,1361 @@
+1
+00:00:00,040 --> 00:00:01,057
+In this video, I wanna give
+在本段视频中 我想告诉大家
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,072 --> 00:00:04,041
+you more practical tips for getting gradient descent to work.
+一些关于梯度下降算法的实用技巧
+
+3
+00:00:05,003 --> 00:00:06,024
+The ideas in this video will
+我将集中讨论
+
+4
+00:00:06,046 --> 00:00:08,025
+center around the learning rate alpha.
+学习率 α
+
+5
+00:00:09,092 --> 00:00:11,024
+Concretely, here's the gradient
+具体来说 这是梯度下降算法的
+
+6
+00:00:11,064 --> 00:00:13,040
+descent update rule and what
+更新规则
+
+7
+00:00:13,065 --> 00:00:14,036
+I want to do in this video
+这里我想要
+
+8
+00:00:14,086 --> 00:00:16,062
+is tell you about what
+告诉大家
+
+9
+00:00:16,078 --> 00:00:18,046
+I think of as debugging and some
+如何调试
+
+10
+00:00:18,060 --> 00:00:19,064
+tips for making sure that
+也就是我认为应该如何确定
+
+11
+00:00:19,085 --> 00:00:21,017
+Gradient Descent is working correctly
+梯度下降是正常工作的
+
+12
+00:00:22,039 --> 00:00:23,037
+and second, I want to tell you
+此外我还想告诉大家
+
+13
+00:00:23,058 --> 00:00:25,051
+how to choose the learning rate alpha.
+如何选择学习率 α
+
+14
+00:00:25,089 --> 00:00:26,071
+Or at least this is how
+也就是我平常
+
+15
+00:00:27,007 --> 00:00:28,053
+I go about choosing it.
+如何选择这个参数
+
+16
+00:00:29,021 --> 00:00:30,041
+Here's something that I often do
+我通常是怎样确定
+
+17
+00:00:30,064 --> 00:00:32,071
+to make sure gradient descent is working correctly.
+梯度下降正常工作的
+
+18
+00:00:34,011 --> 00:00:35,056
+The job of gradient descent is
+梯度下降算法所做的事情
+
+19
+00:00:35,082 --> 00:00:37,003
+to find a value of
+就是为你找到
+
+20
+00:00:37,010 --> 00:00:38,050
+theta for you that, you
+一个 θ 值
+
+21
+00:00:38,063 --> 00:00:40,082
+know, hopefully minimizes the cost function j of theta.
+并希望它能够最小化代价函数 J(θ)
+
+22
+00:00:42,067 --> 00:00:43,086
+What I often do is therefore
+我通常会在
+
+23
+00:00:44,029 --> 00:00:45,089
+plot the cost function j
+梯度下降算法运行时
+
+24
+00:00:46,010 --> 00:00:48,089
+of theta as gradient descent runs.
+绘出代价函数 J(θ) 的值
+
+25
+00:00:49,075 --> 00:00:51,009
+So, the x-axis here is
+这里的 x 轴是表示
+
+26
+00:00:51,031 --> 00:00:52,032
+the number of iteration of gradient
+梯度下降算法的
+
+27
+00:00:52,085 --> 00:00:53,096
+descent and as gradient descent
+迭代步数
+
+28
+00:00:54,025 --> 00:00:55,079
+runs, you'll hopefully get a
+你可能会得到
+
+29
+00:00:55,096 --> 00:00:58,025
+plot that maybe looks like this.
+这样一条曲线
+
+30
+00:00:59,067 --> 00:01:00,088
+Notice that the x-axis is
+注意 这里的 x 轴
+
+31
+00:01:01,017 --> 00:01:02,092
+a number of iterations previously
+是迭代步数
+
+32
+00:01:03,057 --> 00:01:04,076
+we were looking at plots of
+在我们以前看到的
+
+33
+00:01:05,006 --> 00:01:06,065
+J of theta where the
+J(θ) 曲线中
+
+34
+00:01:07,004 --> 00:01:08,009
+X-axis, where the horizontal axis,
+x 轴 也就是横轴
+
+35
+00:01:08,095 --> 00:01:12,026
+was the parameter vector theta but this is not where this is.
+曾经用来表示参数 θ 但这里不是
+
+36
+00:01:13,007 --> 00:01:14,073
+Concretely, what this point
+具体来说
+
+37
+00:01:15,009 --> 00:01:17,073
+is is I'm going
+这一点的含义是这样的
+
+38
+00:01:17,090 --> 00:01:19,050
+to run gradient descent for a hundred iterations.
+当我运行完100步的梯度下降迭代之后
+
+39
+00:01:20,057 --> 00:01:22,006
+And whatever value I get
+无论我得到
+
+40
+00:01:22,062 --> 00:01:23,090
+for theta after a hundred
+什么 θ 值
+
+41
+00:01:24,010 --> 00:01:25,042
+iterations, I will get,
+总之 100步迭代之后
+
+42
+00:01:25,060 --> 00:01:26,076
+you know, some value of theta
+我将得到
+
+43
+00:01:27,015 --> 00:01:28,095
+after a hundred iterations and I'm
+一个 θ 值
+
+44
+00:01:29,009 --> 00:01:30,035
+going to evaluate the cost
+根据100步迭代之后
+
+45
+00:01:30,067 --> 00:01:32,057
+function J of theta for
+得到的这个 θ 值
+
+46
+00:01:32,084 --> 00:01:33,078
+the value of theta I get
+我将算出
+
+47
+00:01:34,012 --> 00:01:36,001
+after a hundred iterations and this
+代价函数 J(θ) 的值
+
+48
+00:01:36,021 --> 00:01:37,060
+vertical height is the
+而这个点的垂直高度就代表
+
+49
+00:01:37,068 --> 00:01:39,073
+value of J of theta for
+梯度下降算法
+
+50
+00:01:39,090 --> 00:01:40,075
+the value of theta I got
+100步迭代之后
+
+51
+00:01:41,010 --> 00:01:42,015
+after a hundred iterations of
+得到的 θ
+
+52
+00:01:42,021 --> 00:01:43,084
+gradient descent and this
+算出的 J(θ) 值
+
+53
+00:01:44,004 --> 00:01:45,070
+point here, that corresponds
+而这个点
+
+54
+00:01:46,051 --> 00:01:48,012
+to the value of J of
+则是梯度下降算法
+
+55
+00:01:48,023 --> 00:01:49,071
+theta for the theta
+迭代200次之后
+
+56
+00:01:50,006 --> 00:01:51,081
+that I get after I've
+得到的 θ
+
+57
+00:01:52,004 --> 00:01:53,068
+run gradient descent for two hundred iterations.
+算出的 J(θ) 值
+
+58
+00:01:55,023 --> 00:01:56,018
+So what this plot is showing,
+所以这条曲线
+
+59
+00:01:56,071 --> 00:01:58,009
+is it's showing the value of
+显示的是
+
+60
+00:01:58,020 --> 00:02:01,020
+your cost function after each iteration of gradient descent.
+梯度下降算法迭代过程中代价函数 J(θ) 的值
+
+61
+00:02:02,001 --> 00:02:03,012
+And, if gradient descent is
+如果梯度下降算法
+
+62
+00:02:03,034 --> 00:02:04,098
+working properly, then J
+正常工作
+
+63
+00:02:05,018 --> 00:02:06,093
+of theta should decrease.
+那么每一步迭代之后
+
+64
+00:02:10,006 --> 00:02:10,065
+after every iteration.
+J(θ) 都应该下降
+
+65
+00:02:17,081 --> 00:02:19,028
+And one useful thing
+这条曲线
+
+66
+00:02:19,053 --> 00:02:20,037
+that this sort of plot can
+的一个用处在于
+
+67
+00:02:20,050 --> 00:02:21,075
+tell you also is that
+它可以告诉你
+
+68
+00:02:22,050 --> 00:02:23,087
+if you look at the specific figure
+如果你看一下
+
+69
+00:02:24,015 --> 00:02:25,040
+that I've drawn, it looks like
+我画的这条曲线
+
+70
+00:02:26,003 --> 00:02:27,034
+by the time you've gotten out
+当你达到
+
+71
+00:02:27,058 --> 00:02:28,078
+to three hundred iterations,
+300步迭代之后
+
+72
+00:02:29,072 --> 00:02:31,000
+between three and four hundred
+也就是300步到400步迭代之间
+
+73
+00:02:31,031 --> 00:02:32,080
+iterations, in this segment, it
+也就是曲线的这一段
+
+74
+00:02:32,090 --> 00:02:35,069
+looks like J of theta hasn't gone down much more.
+看起来 J(θ) 并没有下降多少
+
+75
+00:02:35,081 --> 00:02:36,072
+So by the time you get
+所以当你
+
+76
+00:02:36,096 --> 00:02:38,059
+to four hundred iterations, it looks
+到达400步迭代时
+
+77
+00:02:38,081 --> 00:02:40,080
+like this curve has flattened out here.
+这条曲线看起来已经很平坦了
+
+78
+00:02:41,055 --> 00:02:43,018
+And so, way out
+也就是说
+
+79
+00:02:43,034 --> 00:02:44,040
+here at four hundred iterations, it
+在这里400步迭代的时候
+
+80
+00:02:44,050 --> 00:02:45,056
+looks like gradient descent has
+梯度下降算法
+
+81
+00:02:45,084 --> 00:02:47,075
+more or less converged because your
+基本上已经收敛了
+
+82
+00:02:47,087 --> 00:02:49,056
+cost function isn't going down much more.
+因为代价函数并没有继续下降
+
+83
+00:02:50,049 --> 00:02:51,043
+So looking at this figure can
+所以说 看这条曲线
+
+84
+00:02:51,059 --> 00:02:52,099
+also help you judge
+可以帮助你判断
+
+85
+00:02:53,041 --> 00:02:55,012
+whether or not gradient descent has converged.
+梯度下降算法是否已经收敛
+
+86
+00:02:57,055 --> 00:02:58,050
+By the way, the number of
+顺便说一下
+
+87
+00:02:58,084 --> 00:03:00,041
+iterations that gradient descent takes
+对于每一个特定的问题
+
+88
+00:03:00,078 --> 00:03:01,075
+to converge for a particular
+梯度下降算法所需的迭代次数
+
+89
+00:03:01,090 --> 00:03:03,081
+application can vary a lot.
+可以相差很大
+
+90
+00:03:04,019 --> 00:03:05,062
+So maybe for one application gradient
+也许对于某一个问题
+
+91
+00:03:06,012 --> 00:03:07,056
+descent may converge after just
+梯度下降算法
+
+92
+00:03:07,083 --> 00:03:09,065
+thirty iterations, for a
+只需要30步迭代就可以收敛
+
+93
+00:03:10,021 --> 00:03:12,027
+different application gradient descent
+然而换一个问题
+
+94
+00:03:12,059 --> 00:03:14,015
+may take 3,000 iterations.
+也许梯度下降算法就需要3000步迭代
+
+95
+00:03:15,005 --> 00:03:17,055
+For another learning algorithm
+对于另一个机器学习问题
+
+96
+00:03:17,097 --> 00:03:19,009
+it may take three million iterations.
+则可能需要三百万步迭代
+
+97
+00:03:19,081 --> 00:03:20,062
+It turns out to be
+实际上
+
+98
+00:03:20,072 --> 00:03:22,021
+very difficult to tell in
+我们很难提前判断
+
+99
+00:03:22,030 --> 00:03:24,000
+advance how many iterations gradient
+梯度下降算法
+
+100
+00:03:24,036 --> 00:03:25,075
+descent needs to converge, and
+需要多少步迭代才能收敛
+
+101
+00:03:26,015 --> 00:03:27,094
+is usually by plotting this sort of plot.
+通常我们需要画出这类曲线
+
+102
+00:03:28,093 --> 00:03:32,025
+Plotting the cost function as we increase the number of iterations.
+画出代价函数随迭代步数增加的变化曲线
+
+103
+00:03:32,096 --> 00:03:33,087
+It's usually by looking at these
+通常 我会通过看这种曲线
+
+104
+00:03:34,034 --> 00:03:35,040
+plots that I tried to tell
+来试着判断
+
+105
+00:03:35,059 --> 00:03:37,006
+if gradient descent has converged.
+梯度下降算法是否已经收敛
+
+106
+00:03:38,059 --> 00:03:39,081
+It is also possible to come
+另外 也可以
+
+107
+00:03:40,012 --> 00:03:42,040
+up with automatic convergence test; namely
+进行一些自动的收敛测试
+
+108
+00:03:42,074 --> 00:03:44,006
+to have an algorithm to try
+也就是说用一种算法
+
+109
+00:03:44,028 --> 00:03:46,027
+to tell you if gradient descent
+来告诉你梯度下降算法
+
+110
+00:03:46,059 --> 00:03:48,040
+has converged and here's maybe
+是否已经收敛
+
+111
+00:03:48,062 --> 00:03:50,015
+a pretty typical example of an
+自动收敛测试
+
+112
+00:03:50,024 --> 00:03:52,031
+automatic convergence test and
+一个非常典型的例子是
+
+113
+00:03:52,053 --> 00:03:53,094
+so, you declare convergence
+如果代价函数 J(θ)
+
+114
+00:03:54,096 --> 00:03:56,024
+if your cost function J of theta
+的下降小于
+
+115
+00:03:57,002 --> 00:03:58,015
+decreases by less than
+一个很小的值 ε
+
+116
+00:03:58,037 --> 00:04:01,025
+some small value epsilon, some
+那么就认为已经收敛
+
+117
+00:04:01,040 --> 00:04:02,031
+small value ten to the
+比如可以选择
+
+118
+00:04:02,040 --> 00:04:03,081
+minus three in one iteration,
+1e-3
+
+119
+00:04:05,025 --> 00:04:06,065
+but I find that usually
+但我发现
+
+120
+00:04:07,006 --> 00:04:09,053
+choosing what this threshold is is pretty difficult.
+通常要选择一个合适的阈值 ε 是相当困难的
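One assumed way to code the automatic convergence test just mentioned (with the video's caveat that picking a good threshold epsilon is hard in practice):

def has_converged(J_history, epsilon=1e-3):
    # J_history: assumed list of J(theta) values recorded after each gradient descent iteration.
    # Declare convergence when the cost decreased, but by less than epsilon, in the last iteration.
    if len(J_history) < 2:
        return False
    decrease = J_history[-2] - J_history[-1]
    return 0 <= decrease < epsilon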
+
+121
+00:04:10,071 --> 00:04:11,087
+So, in order to check
+因此 为了检查
+
+122
+00:04:12,003 --> 00:04:13,075
+your gradient descent has converged, I
+梯度下降算法是否收敛
+
+123
+00:04:14,009 --> 00:04:15,012
+actually tend to look at
+我实际上还是
+
+124
+00:04:15,034 --> 00:04:16,072
+plots like like this
+通过看
+
+125
+00:04:17,005 --> 00:04:18,012
+figure on the left rather than
+左边的这条曲线图
+
+126
+00:04:18,031 --> 00:04:20,063
+rely on an automatic convergence test.
+而不是依靠自动收敛测试
+
+127
+00:04:21,076 --> 00:04:22,063
+Looking at this sort of
+此外 这种曲线图
+
+128
+00:04:22,077 --> 00:04:24,013
+figure can also tell you or
+也可以
+
+129
+00:04:24,031 --> 00:04:25,055
+give you an advanced warning if maybe
+在算法没有正常工作时
+
+130
+00:04:25,081 --> 00:04:27,049
+gradient descent is not working correctly.
+提前警告你
+
+131
+00:04:28,068 --> 00:04:29,076
+Concretely, if you plot
+具体地说
+
+132
+00:04:30,019 --> 00:04:31,041
+J of theta as a function
+如果代价函数 J(θ)
+
+133
+00:04:31,064 --> 00:04:34,044
+of number of iterations, then, if
+随迭代步数
+
+134
+00:04:34,085 --> 00:04:35,050
+you see a figure like this,
+的变化曲线是这个样子
+
+135
+00:04:35,081 --> 00:04:36,072
+where J of theta is actually
+J(θ) 实际上在不断上升
+
+136
+00:04:37,012 --> 00:04:38,088
+increasing, then that gives
+那么这就很明确的表示
+
+137
+00:04:39,011 --> 00:04:41,075
+you a clear sign that gradient descent is not working.
+梯度下降算法没有正常工作
+
+138
+00:04:42,088 --> 00:04:44,006
+And a figure like this
+而这样的曲线图
+
+139
+00:04:44,051 --> 00:04:46,093
+usually means that you should be using a smaller learning rate alpha.
+通常意味着你应该使用较小的学习率 α
+
+140
+00:04:48,026 --> 00:04:49,042
+If J of theta is actually
+如果 J(θ) 在上升
+
+141
+00:04:49,062 --> 00:04:51,020
+increasing, the most common
+那么最常见的原因是
+
+142
+00:04:51,057 --> 00:04:52,082
+cause for that is if
+你在最小化
+
+143
+00:04:53,018 --> 00:04:54,035
+you're trying to minimize
+这样的
+
+144
+00:04:54,086 --> 00:04:57,089
+the function that maybe looks like this.
+一个函数
+
+145
+00:04:59,033 --> 00:05:00,037
+That's if your learning rate is
+这时如果你的学习率太大
+
+146
+00:05:00,045 --> 00:05:01,045
+too big then if you
+当你从这里开始
+
+147
+00:05:01,060 --> 00:05:02,093
+start off there, gradient descent
+梯度下降算法
+
+148
+00:05:03,019 --> 00:05:05,026
+may overshoot the minimum, send
+可能将冲过最小值达到这里
+
+149
+00:05:05,044 --> 00:05:06,075
+you there, and then if alpha is
+而如果你的学习率太大
+
+150
+00:05:07,007 --> 00:05:08,014
+too big, you may overshoot again,
+你可能再次冲过最小值
+
+151
+00:05:08,050 --> 00:05:10,037
+it will send you there and
+达到这里
+
+152
+00:05:10,050 --> 00:05:11,091
+so on so that what
+然后一直这样下去
+
+153
+00:05:12,025 --> 00:05:13,061
+you really wanted was really
+而你真正想要的是
+
+154
+00:05:13,081 --> 00:05:16,036
+start here and for to slowly go downhill.
+从这里开始慢慢的下降
+
+155
+00:05:17,093 --> 00:05:19,025
+But if the learning is too
+但是 如果学习率过大
+
+156
+00:05:19,044 --> 00:05:20,095
+big then gradient descent can
+那么梯度下降算法
+
+157
+00:05:21,025 --> 00:05:22,057
+instead keep on over
+将会不断的
+
+158
+00:05:22,075 --> 00:05:24,030
+shooting the minimum so
+冲过最小值
+
+159
+00:05:24,044 --> 00:05:25,067
+that you actually end up
+然后你将得到
+
+160
+00:05:26,016 --> 00:05:27,017
+getting worse and worse instead
+越来越糟糕的结果
+
+161
+00:05:27,020 --> 00:05:28,072
+of converging, getting higher and higher values of
+得到越来越大的
+
+162
+00:05:28,077 --> 00:05:29,079
+the cost function j of theta
+代价函数 J(θ) 值
+
+163
+00:05:30,070 --> 00:05:31,051
+so you end up with a
+所以如果你得到了
+
+164
+00:05:31,067 --> 00:05:33,013
+plot like this, and if you
+这样一个曲线图
+
+165
+00:05:33,022 --> 00:05:34,011
+see a plot like this the
+如果你看到这样一个曲线图
+
+166
+00:05:34,018 --> 00:05:35,086
+fix usually is to just
+通常的解决方法是
+
+167
+00:05:36,008 --> 00:05:37,068
+use a smaller value of alpha.
+使用较小的 α 值
+
+168
+00:05:38,016 --> 00:05:39,063
+Oh, and also of course make
+当然也要确保
+
+169
+00:05:39,079 --> 00:05:41,061
+sure that your code does not have a bug in it.
+你的代码中没有错误
+
+170
+00:05:41,079 --> 00:05:43,012
+But usually, the thing to watch
+但通常最可能
+
+171
+00:05:43,020 --> 00:05:44,054
+out for is that too large an alpha is the
+出现的错误是
+
+172
+00:05:44,060 --> 00:05:46,031
+most common problem.
+α 值过大
+
+173
+00:05:49,005 --> 00:05:50,041
+Similarly, sometimes, you may
+同样的 有时你可能
+
+174
+00:05:50,056 --> 00:05:51,089
+also see j of theta
+看到这种形状的
+
+175
+00:05:52,012 --> 00:05:53,007
+do something like this and it
+J(θ) 曲线
+
+176
+00:05:53,018 --> 00:05:54,005
+go down for a while then
+它先下降 然后上升
+
+177
+00:05:54,016 --> 00:05:56,013
+go up then go down for a while then go up.
+接着又下降 然后又上升
+
+178
+00:05:56,032 --> 00:05:57,012
+Go down for a while, it
+然后再次下降
+
+179
+00:05:57,022 --> 00:05:58,091
+goes up and so on and
+再次上升 如此往复
+
+180
+00:05:58,093 --> 00:05:59,094
+and to fix for something like
+而解决这种情况的方法
+
+181
+00:06:00,013 --> 00:06:02,075
+this is also to use a smaller value of alpha.
+通常同样是选择较小 α 值
+
+182
+00:06:04,008 --> 00:06:04,095
+I'm not going to prove it
+我不打算证明这一点
+
+183
+00:06:05,007 --> 00:06:06,081
+here, but under reasonable assumptions about
+但对于我们讨论的线性回归
+
+184
+00:06:07,010 --> 00:06:09,075
+the cost function, which do hold true for linear regression,
+可以很容易从数学上证明
+
+185
+00:06:10,082 --> 00:06:12,047
+you can show, as mathematicians have
+只要学习率足够小
+
+186
+00:06:12,057 --> 00:06:13,058
+shown that if your learning
+那么每次迭代之后
+
+187
+00:06:13,091 --> 00:06:15,006
+rate alpha is small enough
+代价函数 J(θ)
+
+188
+00:06:15,083 --> 00:06:18,043
+then j of theta should decrease on every single iteration.
+都会下降
+
+189
+00:06:19,002 --> 00:06:20,085
+So, if this doesn't happen, probably
+因此如果代价函数没有下降
+
+190
+00:06:21,033 --> 00:06:22,019
+means alpha is too big and then
+那可能意味着学习率过大
+
+191
+00:06:22,026 --> 00:06:23,089
+you should use a smaller value, but of
+这时你就应该尝试一个较小的学习率
+
+192
+00:06:23,097 --> 00:06:24,079
+course, you also don't
+当然 你也不希望
+
+193
+00:06:24,088 --> 00:06:25,068
+want your learning rate to be
+学习率太小
+
+194
+00:06:25,073 --> 00:06:26,095
+too small because if you
+因为如果这样
+
+195
+00:06:27,006 --> 00:06:27,091
+do that, if you were
+如果你这么做
+
+196
+00:06:28,002 --> 00:06:30,042
+to do that, then gradient descent can be slow to converge.
+那么梯度下降算法可能收敛得很慢
+
+197
+00:06:31,049 --> 00:06:32,051
+And if alpha were too
+如果学习率 α 太小
+
+198
+00:06:32,080 --> 00:06:34,005
+small, you might end up
+你可能
+
+199
+00:06:34,074 --> 00:06:36,075
+starting out here, say, and,
+从这里开始
+
+200
+00:06:36,095 --> 00:06:37,091
+you know, end up taking just
+然后很缓慢很缓慢
+
+201
+00:06:38,022 --> 00:06:39,069
+minuscule, minuscule baby steps.
+向最低点移动
+
+202
+00:06:40,074 --> 00:06:40,074
+Right?
+这样一来
+
+203
+00:06:40,088 --> 00:06:42,022
+And just taking a lot
+你需要迭代很多次
+
+204
+00:06:42,098 --> 00:06:46,031
+of iterations before you finally get to the minimum.
+才能到达最低点
+
+205
+00:06:47,008 --> 00:06:47,098
+And so, if alpha is too
+因此 如果学习率 α 太小
+
+206
+00:06:48,011 --> 00:06:49,050
+small, gradient descent can
+梯度下降算法
+
+207
+00:06:49,056 --> 00:06:51,022
+make very slow progress and be slow to converge.
+的收敛将会很缓慢
+
+208
+00:06:53,081 --> 00:06:55,010
+To summarize, if the learning
+总结一下
+
+209
+00:06:55,037 --> 00:06:57,006
+rate is too small, you can
+如果学习率 α 太小
+
+210
+00:06:57,026 --> 00:06:59,037
+have a slow convergence problem, and
+你会遇到收敛速度慢的问题
+
+211
+00:06:59,062 --> 00:07:00,093
+if the learning rate is too
+而如果学习率 α 太大
+
+212
+00:07:01,010 --> 00:07:02,033
+large, j of theta may
+代价函数 J(θ) 可能不会在
+
+213
+00:07:02,047 --> 00:07:03,043
+not decrease on every iteration
+每次迭代都下降
+
+214
+00:07:04,039 --> 00:07:05,056
+and may not even converge.
+甚至可能不收敛
+
+215
+00:07:07,010 --> 00:07:08,020
+In some cases, if the learning
+在某些情况下
+
+216
+00:07:08,052 --> 00:07:10,007
+rate is too large, slow convergence
+如果学习率 α 过大
+
+217
+00:07:10,099 --> 00:07:14,070
+is also possible, but the
+也可能出现收敛缓慢的问题
+
+218
+00:07:14,080 --> 00:07:16,001
+more common problem you see
+但更常见的情况是
+
+219
+00:07:16,027 --> 00:07:17,037
+is that just that j of
+你会发现代价函数 J(θ)
+
+220
+00:07:17,043 --> 00:07:19,026
+theta may not decrease on every iteration.
+并不会在每次迭代之后都下降
+
+221
+00:07:20,054 --> 00:07:21,089
+And in order to debug all
+而为了调试
+
+222
+00:07:22,013 --> 00:07:24,017
+of these things, often plotting that
+所有这些情况
+
+223
+00:07:24,043 --> 00:07:25,073
+j of theta as a function
+绘制J(θ)随迭代步数变化的曲线
+
+224
+00:07:26,006 --> 00:07:28,069
+of the number of iterations can help you figure out what's going on.
+通常可以帮助你弄清楚到底发生了什么
+
+225
+00:07:29,026 --> 00:07:30,075
+Concretely, what I actually
+具体来说
+
+226
+00:07:31,022 --> 00:07:32,013
+do when I run gradient
+当我运行梯度下降算法时
+
+227
+00:07:32,051 --> 00:07:34,072
+descent is I would try a range of values.
+我通常会尝试一系列α值
+
+228
+00:07:35,000 --> 00:07:36,017
+So just try running gradient descent
+所以在运行梯度下降算法时
+
+229
+00:07:36,057 --> 00:07:37,068
+with a range of values for
+请尝试不同的 α 值
+
+230
+00:07:37,098 --> 00:07:39,061
+alpha, like 0.001, 0.01,
+比如0.001, 0.01
+
+231
+00:07:39,086 --> 00:07:41,024
+so these are a
+这里每隔10倍
+
+232
+00:07:41,044 --> 00:07:43,006
+factor of 10 differences, and
+取一个值
+
+233
+00:07:43,027 --> 00:07:44,027
+for these differences of this
+然后对于这些不同的 α 值
+
+234
+00:07:44,042 --> 00:07:45,060
+of alpha, just plot j of
+绘制 J(θ)
+
+235
+00:07:45,075 --> 00:07:46,080
+theta as a function of number
+随迭代步数变化的曲线
+
+236
+00:07:47,002 --> 00:07:48,074
+of iterations and then pick
+然后选择
+
+237
+00:07:49,017 --> 00:07:50,093
+the value of alpha that, you
+看上去使得 J(θ)
+
+238
+00:07:51,004 --> 00:07:54,022
+know, seems to be causing j of theta to decrease rapidly.
+快速下降的一个 α 值
+
+239
+00:07:55,061 --> 00:07:58,008
+In fact, what I do actually isn't these steps of ten.
+事实上 我通常并不是隔10倍取一个值
+
+240
+00:07:58,058 --> 00:07:59,051
+So, you know, this is
+你可以看到
+
+241
+00:07:59,088 --> 00:08:01,077
+a scale factor of ten between each value.
+这里是每隔10倍取一个值
+
+242
+00:08:02,050 --> 00:08:03,045
+What I'll actually do is try
+我通常取的
+
+243
+00:08:03,087 --> 00:08:08,048
+this range of values and
+是这些 α 值
+
+244
+00:08:08,061 --> 00:08:09,076
+so on where this is,
+一直这样下去
+
+245
+00:08:09,097 --> 00:08:12,018
+you know, 0.001,
+你看 先取0.001
+
+246
+00:08:12,018 --> 00:08:13,025
+then increase the learning rate
+然后将学习率增加3倍
+
+247
+00:08:13,050 --> 00:08:15,031
+threefold to get 0.003 and then
+得到0.003
+
+248
+00:08:15,050 --> 00:08:16,031
+this step up is another
+然后这一步
+
+249
+00:08:17,032 --> 00:08:20,025
+roughly threefold increase
+从0.003到0.01
+
+250
+00:08:21,070 --> 00:08:22,043
+from 0.003 to 0.01, and so these
+又大约增加了3倍
+
+251
+00:08:22,075 --> 00:08:24,081
+are roughly, you know,
+所以 在为梯度下降算法
+
+252
+00:08:26,001 --> 00:08:27,075
+trying out gradient descents with each
+选择合适的学习率时
+
+253
+00:08:28,001 --> 00:08:29,011
+value I try being about
+我大致是
+
+254
+00:08:29,037 --> 00:08:30,089
+3X bigger than the previous value.
+按3的倍数来取值的
+
+255
+00:08:32,012 --> 00:08:33,025
+So what I'll do is a range
+所以我会尝试一系列α值
+
+256
+00:08:33,040 --> 00:08:34,058
+of values until I've made sure
+直到我找到
+
+257
+00:08:34,087 --> 00:08:35,096
+that I've found one value that
+一个值
+
+258
+00:08:36,011 --> 00:08:36,088
+is too small and made sure
+它不能再小了
+
+259
+00:08:37,008 --> 00:08:38,013
+I found one value that is
+同时找到另一个值
+
+260
+00:08:38,025 --> 00:08:39,038
+too large, and then I sort
+它不能再大了
+
+261
+00:08:39,063 --> 00:08:40,096
+of try to pick the largest
+然后我尽量挑选
+
+262
+00:08:41,040 --> 00:08:42,069
+possible value or just something
+其中最大的那个 α 值
+
+263
+00:08:43,011 --> 00:08:45,008
+slightly smaller than the
+或者一个比最大值
+
+264
+00:08:45,021 --> 00:08:47,039
+largest reasonable value that I found.
+略小一些的合理的值
+
+265
+00:08:47,075 --> 00:08:48,078
+And when I do that
+而当我做了以上工作时
+
+266
+00:08:49,026 --> 00:08:50,035
+usually it just gives me
+我通常就可以得到
+
+267
+00:08:50,052 --> 00:08:52,001
+a good learning rate for my problem.
+一个不错的学习率
+
+268
+00:08:53,023 --> 00:08:53,091
+And if you do this
+如果也你这样做
+
+269
+00:08:54,008 --> 00:08:55,003
+too, hopefully you will be
+那么你也能够
+
+270
+00:08:55,012 --> 00:08:56,019
+able to choose a good
+为你的梯度下降算法
+
+271
+00:08:56,046 --> 00:08:57,034
+learning rate for your implementation
+找到一个合适的
+
+272
+00:08:58,050 --> 00:08:58,086
+of gradient descent.
+学习率值
+
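Putting this video's advice together, here is a sketch, under assumed names rather than course code, of trying learning rates spaced roughly threefold apart while recording J(theta) after every iteration so it can be plotted against the iteration number:

import numpy as np

def gradient_descent_with_history(X, y, alpha, num_iters=400):
    # Returns the learned theta and the value of J(theta) after each iteration,
    # which is what you would plot against the number of iterations.
    m, n = X.shape
    theta = np.zeros(n)
    J_history = []
    for _ in range(num_iters):
        errors = X @ theta - y
        theta = theta - alpha * (X.T @ errors) / m            # simultaneous update of every theta_j
        new_errors = X @ theta - y
        J_history.append(new_errors @ new_errors / (2 * m))   # J(theta) at the updated theta
    return theta, J_history

# Learning rates spaced roughly threefold apart, as suggested in the video. After plotting each
# J_history, keep the largest alpha for which the cost still decreases rapidly on every iteration.
alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]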
diff --git a/srt/4 - 5 - Features and Polynomial Regression (8 min).srt b/srt/4 - 5 - Features and Polynomial Regression (8 min).srt
new file mode 100644
index 00000000..25fee7ac
--- /dev/null
+++ b/srt/4 - 5 - Features and Polynomial Regression (8 min).srt
@@ -0,0 +1,1161 @@
+1
+00:00:00,200 --> 00:00:03,878
+You now know about linear regression with multiple variables.
+你现在了解了多变量的线性回归
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:03,910 --> 00:00:05,185
+In this video, I wanna tell
+在本段视频中 我想告诉你
+
+3
+00:00:05,185 --> 00:00:06,369
+you a bit about the choice
+一些用来
+
+4
+00:00:06,380 --> 00:00:07,830
+of features that you have and
+选择特征的方法以及
+
+5
+00:00:07,830 --> 00:00:09,742
+how you can get different learning
+如何得到不同的学习算法
+
+6
+00:00:09,750 --> 00:00:11,477
+algorithm, sometimes very powerful
+当选择了合适的特征后
+
+7
+00:00:11,477 --> 00:00:13,803
+ones by choosing appropriate features.
+这些算法往往是非常有效的
+
+8
+00:00:13,810 --> 00:00:15,229
+And in particular I also want
+另外 我也想
+
+9
+00:00:15,229 --> 00:00:17,826
+to tell you about polynomial regression, which allows
+给你们讲一讲多项式回归
+
+10
+00:00:17,826 --> 00:00:19,535
+you to use the machinery of
+它使得你们能够使用
+
+11
+00:00:19,535 --> 00:00:21,247
+linear regression to fit very
+线性回归的方法来拟合
+
+12
+00:00:21,247 --> 00:00:25,060
+complicated, even very non-linear functions.
+非常复杂的函数 甚至是非线性函数
+
+13
+00:00:25,690 --> 00:00:28,827
+Let's take the example of predicting the price of the house.
+以预测房价为例
+
+14
+00:00:29,300 --> 00:00:31,147
+Suppose you have two features,
+假设你有两个特征
+
+15
+00:00:31,147 --> 00:00:33,805
+the frontage of house and the depth of the house.
+分别是房子临街的宽度和垂直宽度
+
+16
+00:00:33,805 --> 00:00:35,428
+So, here's the picture of the house we're trying to sell.
+这就是我们想要卖出的房子的图片
+
+17
+00:00:35,428 --> 00:00:37,264
+So, the frontage is
+临街宽度
+
+18
+00:00:37,264 --> 00:00:40,103
+defined as this distance
+被定义为这个距离
+
+19
+00:00:40,103 --> 00:00:43,009
+is basically the width
+其实就是它的宽度
+
+20
+00:00:43,009 --> 00:00:44,949
+or the length of
+或者说是
+
+21
+00:00:44,960 --> 00:00:46,652
+how wide your lot
+你拥有的土地的宽度
+
+22
+00:00:46,652 --> 00:00:47,994
+is if this that you
+如果这块地都是你的的话
+
+23
+00:00:48,020 --> 00:00:49,468
+own, and the depth
+而这所房子的
+
+24
+00:00:49,500 --> 00:00:53,120
+of the house is how
+纵向深度就是
+
+25
+00:00:53,130 --> 00:00:54,758
+deep your property is, so
+你的房子的深度
+
+26
+00:00:54,770 --> 00:00:57,992
+there's a frontage, there's a depth.
+这是正面的宽度 这是深度
+
+27
+00:00:57,992 --> 00:00:59,858
+called frontage and depth.
+我们称之为临街宽度和纵深
+
+28
+00:00:59,858 --> 00:01:01,355
+You might build a linear regression
+你可能会 像这样 建立一个
+
+29
+00:01:01,360 --> 00:01:04,163
+model like this where frontage
+线性回归模型 其中临街宽度
+
+30
+00:01:04,180 --> 00:01:06,062
+is your first feature x1 and
+是你的第一个特征x1
+
+31
+00:01:06,062 --> 00:01:07,535
+and depth is your second
+纵深是你的第二个
+
+32
+00:01:07,535 --> 00:01:10,169
+feature x2, but when you're
+特征x2 但当我们在
+
+33
+00:01:10,169 --> 00:01:11,772
+applying linear regression, you don't
+运用线性回归时
+
+34
+00:01:11,772 --> 00:01:13,342
+necessarily have to use
+你不一定非要直接用
+
+35
+00:01:13,342 --> 00:01:16,607
+just the features x1 and x2 that you're given.
+给出的 x1 和 x2 作为特征
+
+36
+00:01:16,610 --> 00:01:20,531
+What you can do is actually create new features by yourself.
+其实你可以自己创造新的特征
+
+37
+00:01:20,531 --> 00:01:21,709
+So, if I want to predict
+因此 如果我要预测
+
+38
+00:01:21,710 --> 00:01:22,895
+the price of a house, what I
+房子的价格
+
+39
+00:01:22,895 --> 00:01:24,840
+might do instead is decide
+我真正要需做的 也许是
+
+40
+00:01:24,850 --> 00:01:27,468
+that what really determines
+确定真正能够决定
+
+41
+00:01:27,490 --> 00:01:29,133
+the size of the house is
+我房子大小 或者说我土地大小
+
+42
+00:01:29,133 --> 00:01:32,164
+the area or the land area that I own.
+的因素是什么
+
+43
+00:01:32,190 --> 00:01:33,365
+So, I might create a new feature.
+因此 我可能会创造一个新的特征
+
+44
+00:01:33,380 --> 00:01:34,609
+I'm just gonna call this feature
+我称之为
+
+45
+00:01:34,609 --> 00:01:40,409
+x which is frontage, times depth.
+x 它是临街宽度与纵深的乘积
+
+46
+00:01:40,440 --> 00:01:42,404
+This is a multiplication symbol.
+这是一个乘法符号
+
+47
+00:01:42,404 --> 00:01:44,334
+It's a frontage x depth because
+它是临街宽度与纵深的乘积
+
+48
+00:01:44,334 --> 00:01:46,040
+this is the land area
+这得到的就是我拥有的土地的面积
+
+49
+00:01:46,090 --> 00:01:48,035
+that I own and I might
+然后 我可以把
+
+50
+00:01:48,035 --> 00:01:50,651
+then select my hypothesis
+假设选择为
+
+51
+00:01:50,710 --> 00:01:53,327
+as that using just
+使其只使用
+
+52
+00:01:53,350 --> 00:01:54,785
+one feature which is my
+一个特征 也就是我的
+
+53
+00:01:54,785 --> 00:01:57,430
+land area, right?
+土地的面积 对吧?
+
+54
+00:01:57,580 --> 00:01:58,939
+Because the area of a
+由于矩形面积的
+
+55
+00:01:58,940 --> 00:02:00,345
+rectangle is you know,
+计算方法是
+
+56
+00:02:00,345 --> 00:02:01,432
+the product of the length
+矩形长和宽相乘
+
+57
+00:02:01,460 --> 00:02:03,822
+of the size So, depending
+因此 这取决于
+
+58
+00:02:03,822 --> 00:02:05,253
+on what insight you might have
+你从什么样的角度
+
+59
+00:02:05,280 --> 00:02:07,481
+into a particular problem, rather than
+去审视一个特定的问题 而不是
+
+60
+00:02:07,490 --> 00:02:09,604
+just taking the features x1 and x2
+只是直接去使用临街宽度和纵深
+
+61
+00:02:09,620 --> 00:02:11,103
+that we happen to have started
+这两个我们只是碰巧在开始时
+
+62
+00:02:11,130 --> 00:02:13,489
+off with, sometimes by defining
+使用的特征 有时 通过定义
+
+63
+00:02:13,489 --> 00:02:16,771
+new features you might actually get a better model.
+新的特征 你确实会得到一个更好的模型
+
+64
+00:02:16,790 --> 00:02:18,163
+Closely related to the
+与选择特征的想法
+
+65
+00:02:18,163 --> 00:02:19,745
+idea of choosing your features
+密切相关的一个概念
+
+66
+00:02:19,745 --> 00:02:22,973
+is this idea called polynomial regression.
+被称为多项式回归(polynomial regression)
+
+67
+00:02:23,010 --> 00:02:26,868
+Let's say you have a housing price data set that looks like this.
+比方说 你有这样一个住房价格的数据集
+
+68
+00:02:26,880 --> 00:02:29,646
+Then there are a few different models you might fit to this.
+为了拟合它 可能会有多个不同的模型供选择
+
+69
+00:02:29,660 --> 00:02:32,587
+One thing you could do is fit a quadratic model like this.
+其中一个你可以选择的是像这样的二次模型
+
+70
+00:02:32,600 --> 00:02:35,598
+It doesn't look like a straight line fits this data very well.
+因为直线似乎并不能很好地拟合这些数据
+
+71
+00:02:35,598 --> 00:02:36,788
+So maybe you want to fit
+因此 也许你会想到
+
+72
+00:02:36,788 --> 00:02:38,408
+a quadratic model like this
+用这样的二次模型去拟合数据
+
+73
+00:02:38,420 --> 00:02:40,248
+where you think the size, where
+你可能会考量
+
+74
+00:02:40,248 --> 00:02:42,017
+you think the price is a quadratic
+认为价格是一个二次函数
+
+75
+00:02:42,020 --> 00:02:43,956
+function and maybe that'll
+也许这样做
+
+76
+00:02:43,970 --> 00:02:45,018
+give you, you know, a fit
+会给你一个
+
+77
+00:02:45,020 --> 00:02:47,070
+to the data that looks like that.
+像这样的拟合结果
+
+78
+00:02:47,280 --> 00:02:48,560
+But then you may decide that your
+但是 然后你可能会觉得
+
+79
+00:02:48,570 --> 00:02:50,013
+quadratic model doesn't make sense
+二次函数的模型并不好用
+
+80
+00:02:50,013 --> 00:02:52,582
+because of a quadratic function, eventually
+因为 一个二次函数最终
+
+81
+00:02:52,582 --> 00:02:53,858
+this function comes back down
+会降回来
+
+82
+00:02:53,858 --> 00:02:55,591
+and well, we don't think housing
+而我们并不认为
+
+83
+00:02:55,600 --> 00:02:58,899
+prices should go down when the size goes up too high.
+房子的价格在高到一定程度后 会下降回来
+
+84
+00:02:58,970 --> 00:03:00,649
+So then maybe we might
+因此 也许我们会
+
+85
+00:03:00,650 --> 00:03:02,700
+choose a different polynomial model
+选择一个不同的多项式模型
+
+86
+00:03:02,700 --> 00:03:04,274
+and choose to use instead a
+并转而选择使用一个
+
+87
+00:03:04,290 --> 00:03:07,480
+cubic function, and where
+三次函数 在这里
+
+88
+00:03:07,480 --> 00:03:09,225
+we have now a third-order term
+现在我们有了一个三次的式子
+
+89
+00:03:09,225 --> 00:03:10,764
+and we fit that, maybe
+我们用它进行拟合
+
+90
+00:03:10,800 --> 00:03:12,367
+we get this sort of
+我们可能得到这样的模型
+
+91
+00:03:12,390 --> 00:03:13,907
+model, and maybe the
+也许这条绿色的线
+
+92
+00:03:13,910 --> 00:03:15,278
+green line is a somewhat better fit
+对这个数据集拟合得更好
+
+93
+00:03:15,278 --> 00:03:18,052
+to the data cause it doesn't eventually come back down.
+因为它不会在最后下降回来
+
+94
+00:03:18,052 --> 00:03:21,992
+So how do we actually fit a model like this to our data?
+那么 我们到底应该如何将模型与我们的数据进行拟合呢?
+
+95
+00:03:22,020 --> 00:03:23,868
+Using the machinery of multivariant
+使用多元
+
+96
+00:03:23,868 --> 00:03:27,059
+linear regression, we can
+线性回归的方法 我们可以
+
+97
+00:03:27,059 --> 00:03:30,692
+do this with a pretty simple modification to our algorithm.
+通过将我们的算法做一个非常简单的修改来实现它
+
+98
+00:03:30,692 --> 00:03:32,632
+The form of the hypothesis we,
+按照我们以前假设的形式
+
+99
+00:03:32,632 --> 00:03:34,217
+we know how the fit
+我们知道如何对
+
+100
+00:03:34,217 --> 00:03:35,782
+looks like this, where we say
+这样的模型进行拟合 其中
+
+101
+00:03:35,782 --> 00:03:37,612
+H of x is theta zero
+hθ(x) 等于 θ0
+
+102
+00:03:37,612 --> 00:03:41,608
+plus theta one x one plus x two theta X3.
++θ1×x1 + θ2×x2 + θ3×x3
+
+103
+00:03:41,608 --> 00:03:42,775
+And if we want to
+那么 如果我们想
+
+104
+00:03:42,775 --> 00:03:45,220
+fit this cubic model that
+拟合这个三次模型
+
+105
+00:03:45,250 --> 00:03:47,239
+I have boxed in green,
+就是我用绿色方框框起来的这个
+
+106
+00:03:47,239 --> 00:03:48,940
+what we're saying is that
+现在我们讨论的是
+
+107
+00:03:48,940 --> 00:03:49,825
+to predict the price of a
+为了预测一栋房子的价格
+
+108
+00:03:49,825 --> 00:03:51,364
+house, it's theta 0 plus theta
+我们用 θ0 加 θ1
+
+109
+00:03:51,364 --> 00:03:53,056
+1 times the size of the house
+乘以房子的面积
+
+110
+00:03:53,056 --> 00:03:55,905
+plus theta 2 times the square size of the house.
+加上 θ2 乘以房子面积的平方
+
+111
+00:03:55,910 --> 00:03:58,974
+So this term is equal to that term.
+因此 这个式子与那个式子是相等的
+
+112
+00:03:58,974 --> 00:04:00,885
+And then plus theta 3
+然后再加 θ3
+
+113
+00:04:00,890 --> 00:04:02,343
+times the cube of the
+乘以
+
+114
+00:04:02,350 --> 00:04:05,302
+size of the house raises that third term.
+房子面积的立方
+
+115
+00:04:05,470 --> 00:04:06,967
+In order to map these
+为了将这两个定义
+
+116
+00:04:06,990 --> 00:04:08,668
+two definitions to each other,
+互相对应起来
+
+117
+00:04:08,668 --> 00:04:10,339
+well, the natural way
+为了做到这一点
+
+118
+00:04:10,339 --> 00:04:12,128
+to do that is to set
+我们自然想到了
+
+119
+00:04:12,150 --> 00:04:13,568
+the first feature x one to
+将 x1 特征设为
+
+120
+00:04:13,568 --> 00:04:15,320
+be the size of the house, and
+房子的面积
+
+121
+00:04:15,320 --> 00:04:16,721
+set the second feature x two
+将第二个特征 x2 设为
+
+122
+00:04:16,721 --> 00:04:17,766
+to be the square of the size
+房屋面积的平方
+
+123
+00:04:17,766 --> 00:04:20,400
+of the house, and set the third feature x three to
+将第三个特征 x3 设为
+
+124
+00:04:20,400 --> 00:04:22,780
+be the cube of the size of the house.
+房子面积的立方
+
+125
+00:04:22,800 --> 00:04:24,292
+And, just by choosing my
+那么 仅仅通过将
+
+126
+00:04:24,292 --> 00:04:26,311
+three features this way and
+这三个特征这样设置
+
+127
+00:04:26,311 --> 00:04:27,720
+applying the machinery of linear
+然后再应用线性回归的方法
+
+128
+00:04:27,720 --> 00:04:30,540
+regression, I can fit this
+我就可以拟合
+
+129
+00:04:30,540 --> 00:04:31,901
+model and end up with
+这个模型 并最终
+
+130
+00:04:31,901 --> 00:04:34,374
+a cubic fit to my data.
+将一个三次函数拟合到我的数据上
+
+131
+00:04:34,374 --> 00:04:35,523
+I just want to point out one
+我还想再说一件事
+
+132
+00:04:35,523 --> 00:04:36,799
+more thing, which is that
+那就是
+
+133
+00:04:36,800 --> 00:04:38,610
+if you choose your features
+如果你像这样选择特征
+
+134
+00:04:38,610 --> 00:04:40,925
+like this, then feature scaling
+那么特征的归一化
+
+135
+00:04:40,925 --> 00:04:43,688
+becomes increasingly important.
+就变得更重要了
+
+136
+00:04:44,130 --> 00:04:45,254
+So if the size of the
+因此 如果
+
+137
+00:04:45,254 --> 00:04:46,794
+house ranges from one to
+房子的大小范围在
+
+138
+00:04:46,800 --> 00:04:47,992
+a thousand, so, you know,
+1到1000之间 那么
+
+139
+00:04:47,992 --> 00:04:49,300
+from one to a thousand square
+比如说
+
+140
+00:04:49,310 --> 00:04:50,918
+feet, say, then the size
+从1到1000平方尺 那么
+
+141
+00:04:50,930 --> 00:04:52,175
+squared of the house will
+房子面积的平方
+
+142
+00:04:52,175 --> 00:04:54,519
+range from one to one
+的范围就是
+
+143
+00:04:54,520 --> 00:04:55,953
+million, the square of
+一到一百万 也就是
+
+144
+00:04:55,953 --> 00:04:58,468
+a thousand, and your third
+1000的平方 而你的第三个特征
+
+145
+00:04:58,490 --> 00:05:01,335
+feature x cubed, excuse me
+x的立方 抱歉
+
+146
+00:05:01,360 --> 00:05:03,106
+you, your third feature x
+你的第三个特征 x3
+
+147
+00:05:03,120 --> 00:05:04,732
+three, which is the size
+它是房子面积的
+
+148
+00:05:04,732 --> 00:05:05,941
+cubed of the house, will range
+立方 范围会扩大到
+
+149
+00:05:05,950 --> 00:05:07,478
+from one to ten to
+1到10的9次方
+
+150
+00:05:07,478 --> 00:05:09,311
+the nine, and so these
+因此
+
+151
+00:05:09,330 --> 00:05:10,955
+three features take on very
+这三个特征的范围
+
+152
+00:05:10,955 --> 00:05:13,459
+different ranges of values, and
+有很大的不同
+
+153
+00:05:13,490 --> 00:05:15,105
+it's important to apply feature
+因此 如果你使用梯度下降法
+
+154
+00:05:15,110 --> 00:05:16,509
+scaling if you're using gradient
+应用特征值的归一化是非常重要的
+
+155
+00:05:16,509 --> 00:05:18,554
+descent to get them into
+这样才能将他们的
+
+156
+00:05:18,554 --> 00:05:21,139
+comparable ranges of values.
+值的范围变得具有可比性
+
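+A minimal Octave sketch of the feature mapping and scaling just described; the variable names (size_sqft, mu, sigma) and the mean/standard-deviation scaling are illustrative assumptions, not taken from the lecture:
+
+% size_sqft: column vector of house sizes, ranging from 1 to 1000 square feet
+size_sqft = [100; 450; 780; 1000];
+
+% polynomial features: x1 = size, x2 = size^2, x3 = size^3
+X_poly = [size_sqft, size_sqft.^2, size_sqft.^3];
+
+% the columns now span very different ranges (up to 1e3, 1e6, 1e9),
+% so scale each column before running gradient descent
+mu = mean(X_poly);                   % per-column means
+sigma = std(X_poly);                 % per-column standard deviations
+X_scaled = (X_poly - mu) ./ sigma;   % uses Octave's automatic broadcasting
+
+% add the intercept column x0 = 1 and fit with ordinary linear regression
+X = [ones(size(X_scaled, 1), 1), X_scaled];
+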
+157
+00:05:21,140 --> 00:05:23,243
+Finally, here's one last example
+最后 这里是最后一个例子
+
+158
+00:05:23,250 --> 00:05:25,138
+of how you really have
+关于如何使你
+
+159
+00:05:25,150 --> 00:05:29,056
+broad choices in the features you use.
+真正选择出要使用的特征
+
+160
+00:05:29,090 --> 00:05:30,446
+Earlier we talked about how a
+此前我们谈到
+
+161
+00:05:30,446 --> 00:05:31,559
+quadratic model like this might
+一个像这样的二次模型
+
+162
+00:05:31,559 --> 00:05:33,122
+not be ideal because, you know,
+并不是理想的 因为 你知道
+
+163
+00:05:33,122 --> 00:05:34,408
+maybe a quadratic model fits the
+也许一个二次模型能很好地拟合
+
+164
+00:05:34,408 --> 00:05:35,952
+data okay, but the quadratic
+这个数据 但二次
+
+165
+00:05:35,952 --> 00:05:37,514
+function goes back down
+函数最后会下降
+
+166
+00:05:37,514 --> 00:05:39,065
+and we really don't want, right,
+这是我们不希望的
+
+167
+00:05:39,070 --> 00:05:40,352
+housing prices that go down,
+就是住房价格往下走
+
+168
+00:05:40,352 --> 00:05:43,567
+to predict that, as the size of housing increases.
+即预测房价会随着房屋面积的增大而下降
+
+169
+00:05:43,567 --> 00:05:45,388
+But rather than going to
+但是 除了转而
+
+170
+00:05:45,388 --> 00:05:46,938
+a cubic model there, you
+建立一个三次模型以外
+
+171
+00:05:46,938 --> 00:05:48,389
+have, maybe, other choices of
+你也许有其他的选择
+
+172
+00:05:48,389 --> 00:05:50,798
+features and there are many possible choices.
+特征的方法 这里有很多可能的选项
+
+173
+00:05:50,800 --> 00:05:52,313
+But just to give you another
+但是给你另外一个
+
+174
+00:05:52,313 --> 00:05:53,691
+example of a reasonable
+合理的选择的例子
+
+175
+00:05:53,691 --> 00:05:55,620
+choice, another reasonable choice
+另一种合理的选择
+
+176
+00:05:55,620 --> 00:05:57,263
+might be to say that the
+可能是这样的
+
+177
+00:05:57,263 --> 00:05:58,832
+price of a house is theta
+一套房子的价格是
+
+178
+00:05:58,850 --> 00:05:59,992
+zero plus theta one times
+θ0 加 θ1 乘以
+
+179
+00:05:59,992 --> 00:06:01,264
+the size, and then plus theta
+房子的面积 然后
+
+180
+00:06:01,320 --> 00:06:03,625
+two times the square root of the size, right?
+加 θ2 乘以房子面积的平方根 可以吧?
+
+181
+00:06:03,630 --> 00:06:05,364
+So the square root function is
+平方根函数是
+
+182
+00:06:05,364 --> 00:06:08,110
+this sort of function, and maybe
+这样的一种函数
+
+183
+00:06:08,110 --> 00:06:09,318
+there will be some value of theta
+也许θ1 θ2 θ3
+
+184
+00:06:09,318 --> 00:06:11,355
+one, theta two, theta three, that
+中会有一些值
+
+185
+00:06:11,355 --> 00:06:14,049
+will let you take this model
+会捕捉到这个模型
+
+186
+00:06:14,080 --> 00:06:15,445
+and, for the curve that looks
+从而使得这个曲线看起来
+
+187
+00:06:15,445 --> 00:06:16,952
+like that, and, you know,
+是这样的
+
+188
+00:06:16,952 --> 00:06:19,500
+goes up, but sort of flattens
+趋势是上升的 但慢慢变得
+
+189
+00:06:19,520 --> 00:06:21,529
+out a bit and doesn't ever
+平缓一些 而且永远不会
+
+190
+00:06:21,540 --> 00:06:23,877
+come back down.
+下降回来
+
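+As a rough sketch in the same hedged spirit as the earlier one, the square-root model described here just swaps in a different feature column; the values are made up:
+
+% h(x) = theta0 + theta1*size + theta2*sqrt(size)
+x = [100; 450; 780; 1000];                 % house sizes, made-up values
+X_sqrt = [ones(length(x), 1), x, sqrt(x)]; % features: 1, size, sqrt(size)
+% this hypothesis keeps rising but flattens out, rather than coming back down
+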
+191
+00:06:24,154 --> 00:06:26,584
+And, so, by having insight into, in
+因此 通过深入地研究
+
+192
+00:06:26,584 --> 00:06:27,630
+this case, the shape of a
+在这里我们研究了平方根
+
+193
+00:06:27,630 --> 00:06:30,952
+square root function, and, into
+函数的形状 并且
+
+194
+00:06:30,990 --> 00:06:32,555
+the shape of the data, by choosing
+更深入地了解了选择不同特征时数据的形状
+
+195
+00:06:32,555 --> 00:06:36,469
+different features, you can sometimes get better models.
+有时可以得到更好的模型
+
+196
+00:06:36,469 --> 00:06:39,026
+In this video, we talked about polynomial regression.
+在这段视频中 我们探讨了多项式回归
+
+197
+00:06:39,026 --> 00:06:40,672
+That is, how to fit a
+也就是 如何将一个
+
+198
+00:06:40,672 --> 00:06:42,298
+polynomial, like a quadratic function,
+多项式 如一个二次函数
+
+199
+00:06:42,298 --> 00:06:43,868
+or a cubic function, to your data.
+或一个三次函数拟合到你的数据上
+
+200
+00:06:43,868 --> 00:06:45,112
+We also threw out this idea,
+除了这个方面
+
+201
+00:06:45,112 --> 00:06:46,640
+that you have a choice in what
+我们还讨论了
+
+202
+00:06:46,640 --> 00:06:47,732
+features to use, such as
+在使用特征时的选择性
+
+203
+00:06:47,748 --> 00:06:48,804
+that instead of using
+例如 我们不使用
+
+204
+00:06:48,804 --> 00:06:50,078
+the frontage and the depth
+房屋的临街宽度和纵深
+
+205
+00:06:50,078 --> 00:06:51,092
+of the house, maybe, you can
+也许 你可以
+
+206
+00:06:51,092 --> 00:06:53,133
+multiply them together to get
+把它们乘在一起 从而得到
+
+207
+00:06:53,133 --> 00:06:55,317
+a feature that captures the land area of a house.
+房子的土地面积这个特征
+
+208
+00:06:55,317 --> 00:06:57,551
+In case this seems a little
+实际上 这似乎有点
+
+209
+00:06:57,551 --> 00:06:58,895
+bit bewildering, that with all
+难以抉择 这里有这么多
+
+210
+00:06:58,896 --> 00:07:03,265
+these different feature choices, how do I decide what features to use?
+不同的特征选择 我该如何决定使用什么特征呢
+
+211
+00:07:03,265 --> 00:07:04,594
+Later in this class, we'll talk
+在之后的课程中 我们将
+
+212
+00:07:04,594 --> 00:07:06,622
+about some algorithms for automatically
+探讨一些算法 它们能够
+
+213
+00:07:06,622 --> 00:07:08,083
+choosing what features are used,
+自动选择要使用什么特征
+
+214
+00:07:08,083 --> 00:07:09,466
+so you can have an
+因此 你可以使用一个算法
+
+215
+00:07:09,466 --> 00:07:10,611
+algorithm look at the data
+观察给出的数据
+
+216
+00:07:10,611 --> 00:07:12,040
+and automatically choose for you
+并自动为你选择
+
+217
+00:07:12,040 --> 00:07:13,357
+whether you want to fit a
+到底应该选择
+
+218
+00:07:13,357 --> 00:07:15,528
+quadratic function, or a cubic function, or something else.
+一个二次函数 或者一个三次函数 还是别的函数
+
+219
+00:07:15,528 --> 00:07:17,164
+But, until we get to
+但是 在我们
+
+220
+00:07:17,164 --> 00:07:18,764
+those algorithms now I just
+学到那种算法之前
+
+221
+00:07:18,764 --> 00:07:20,295
+want you to be aware that
+现在我希望你知道
+
+222
+00:07:20,295 --> 00:07:21,582
+you have a choice in
+你需要选择
+
+223
+00:07:21,582 --> 00:07:23,094
+what features to use, and
+使用什么特征
+
+224
+00:07:23,094 --> 00:07:25,256
+by designing different features
+并且通过设计不同的特征
+
+225
+00:07:25,256 --> 00:07:26,888
+you can fit more complex functions
+你能够用更复杂的函数
+
+226
+00:07:26,888 --> 00:07:28,156
+to your data than just fitting a
+去拟合你的数据 而不是只用
+
+227
+00:07:28,156 --> 00:07:30,471
+straight line to the data and
+一条直线去拟合
+
+228
+00:07:30,471 --> 00:07:32,092
+in particular you can put polynomial
+特别是 你也可以使用多项式
+
+229
+00:07:32,092 --> 00:07:35,065
+functions as well and sometimes
+函数 有时候
+
+230
+00:07:35,065 --> 00:07:36,072
+by appropriate insight into the
+通过采取适当的角度来观察
+
+231
+00:07:36,072 --> 00:07:37,564
+features you can get a much
+特征就可以
+
+232
+00:07:37,564 --> 00:07:40,020
+better model for your data.
+得到一个更符合你的数据的模型
+
diff --git a/srt/4 - 6 - Normal Equation (16 min).srt b/srt/4 - 6 - Normal Equation (16 min).srt
new file mode 100644
index 00000000..79e7932a
--- /dev/null
+++ b/srt/4 - 6 - Normal Equation (16 min).srt
@@ -0,0 +1,2236 @@
+1
+00:00:00,302 --> 00:00:01,883
+In this video, we'll talk about
+在这段视频中 我们要讲
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,883 --> 00:00:03,948
+the normal equation, which for
+标准方程法 (Normal Equation)
+
+3
+00:00:03,948 --> 00:00:05,660
+some linear regression problems, will
+对于某些线性回归问题
+
+4
+00:00:05,660 --> 00:00:06,981
+give us a much better way
+用标准方程法求解参数θ的最优值更好
+
+5
+00:00:06,981 --> 00:00:09,115
+to solve for the optimal value
+用标准方程法求解参数θ的最优值更好
+
+6
+00:00:09,115 --> 00:00:10,879
+of the parameters theta.
+用标准方程法求解参数θ的最优值更好
+
+7
+00:00:10,879 --> 00:00:13,096
+Concretely, so far the
+具体而言 到目前为止
+
+8
+00:00:13,096 --> 00:00:14,399
+algorithm that we've been using
+我们一直在使用的线性回归的算法
+
+9
+00:00:14,399 --> 00:00:16,042
+for linear regression is gradient
+是梯度下降法
+
+10
+00:00:16,042 --> 00:00:17,823
+descent where in order
+就是说 为了最小化代价函数 J (θ)
+
+11
+00:00:17,823 --> 00:00:19,410
+to minimize the cost function
+就是说 为了最小化代价函数 J (θ)
+
+12
+00:00:19,410 --> 00:00:21,354
+J of Theta, we would take
+我们使用的迭代算法
+
+13
+00:00:21,354 --> 00:00:23,792
+this iterative algorithm that takes
+需要经过很多步
+
+14
+00:00:23,792 --> 00:00:26,410
+many steps, multiple iterations of
+也就是说通过多次迭代来计算梯度下降
+
+15
+00:00:26,410 --> 00:00:28,259
+gradient descent to converge
+也就是说通过多次迭代来计算梯度下降
+
+16
+00:00:28,259 --> 00:00:30,396
+to the global minimum.
+来收敛到全局最小值
+
+17
+00:00:30,396 --> 00:00:32,563
+In contrast, the normal equation
+相反地
+
+18
+00:00:32,563 --> 00:00:34,413
+would give us a method to
+标准方程法提供了一种求θ的解析解法
+
+19
+00:00:34,413 --> 00:00:36,986
+solve for theta analytically, so
+标准方程法提供了一种求θ的解析解法
+
+20
+00:00:36,986 --> 00:00:38,761
+that rather than needing to run
+所以与其使用迭代算法
+
+21
+00:00:38,761 --> 00:00:40,594
+this iterative algorithm, we can
+我们可以直接一次性求解θ的最优值
+
+22
+00:00:40,594 --> 00:00:41,365
+instead just solve for the
+我们可以直接一次性求解θ的最优值
+
+23
+00:00:41,365 --> 00:00:42,791
+optimal value for theta
+我们可以直接一次性求解θ的最优值
+
+24
+00:00:42,791 --> 00:00:44,403
+all at one go, so that in
+所以说基本上
+
+25
+00:00:44,403 --> 00:00:46,096
+basically one step you get
+一步就可以得到优化值
+
+26
+00:00:46,096 --> 00:00:48,136
+to the optimal value right there.
+一步就可以得到优化值
+
+27
+00:00:49,136 --> 00:00:51,947
+It turns out the normal equation
+标准方程法有一些优点 也有一些缺点
+
+28
+00:00:52,209 --> 00:00:54,442
+method has some advantages and
+标准方程法有一些优点 也有一些缺点
+
+29
+00:00:54,442 --> 00:00:56,024
+some disadvantages, but before
+但是在我们讲解这个
+
+30
+00:00:56,024 --> 00:00:57,817
+we get to that and talk about
+和何时使用标准方程之前
+
+31
+00:00:57,903 --> 00:00:59,426
+when you should use it, let's
+让我们先对这个算法有一个直观的理解
+
+32
+00:00:59,426 --> 00:01:02,539
+get some intuition about what this method does.
+让我们先对这个算法有一个直观的理解
+
+33
+00:01:02,539 --> 00:01:04,633
+For this week's planetary example, let's
+我们举一个例子来解释这个问题
+
+34
+00:01:04,633 --> 00:01:06,120
+imagine, let's take a
+我们假设 有一个非常简单的代价函数J(θ)
+
+35
+00:01:06,120 --> 00:01:07,505
+very simplified cost function
+我们假设 有一个非常简单的代价函数J(θ)
+
+36
+00:01:07,505 --> 00:01:09,291
+J of Theta, that's just the
+它就是一个实数θ的函数
+
+37
+00:01:09,291 --> 00:01:11,958
+function of a real number Theta.
+它就是一个实数θ的函数
+
+38
+00:01:11,958 --> 00:01:13,642
+So, for now, imagine that Theta
+所以现在 假设θ只是一个标量
+
+39
+00:01:13,842 --> 00:01:16,615
+is just a scalar value, or that Theta is just a real value.
+或者说θ只有一行
+
+40
+00:01:16,769 --> 00:01:18,918
+It's just a number, rather than a vector.
+它是一个数字 不是向量
+
+41
+00:01:19,171 --> 00:01:24,595
+Imagine that we have a cost function J that's a quadratic function of this real value
+假设我们的代价函数J 是这个实参数θ的二次函数
+
+42
+00:01:25,028 --> 00:01:27,420
+parameter Theta, so J of Theta looks like that.
+所以J (θ) 看起来是这样的
+
+43
+00:01:27,851 --> 00:01:30,336
+Well, how do you minimize a quadratic function?
+那么如何最小化一个二次函数呢?
+
+44
+00:01:30,720 --> 00:01:32,745
+For those of you that know a little bit of calculus,
+对于那些了解一点微积分的同学来说
+
+45
+00:01:32,858 --> 00:01:34,965
+you may know that the way to
+你可能知道
+
+46
+00:01:34,965 --> 00:01:36,628
+minimize a function is to
+最小化的一个函数的方法是
+
+47
+00:01:36,628 --> 00:01:38,991
+take derivatives and to
+对它求导 并且将导数置零
+
+48
+00:01:38,991 --> 00:01:41,707
+set derivatives equal to zero.
+对它求导 并且将导数置零
+
+49
+00:01:41,707 --> 00:01:44,721
+So, you take the derivative of J with respect to the parameter of Theta.
+所以对J求关于θ的导数
+
+50
+00:01:44,797 --> 00:01:46,847
+You get some formula which I am not going to derive,
+我不打算推导那些公式
+
+51
+00:01:46,847 --> 00:01:49,161
+you set that derivative
+你把那个导数置零
+
+52
+00:01:49,161 --> 00:01:50,782
+equal to zero, and this
+这样你就可以求得
+
+53
+00:01:50,782 --> 00:01:53,503
+allows you to solve for
+使得J(θ)最小的θ值
+
+54
+00:01:53,503 --> 00:01:57,866
+the value of Theta that minimizes J of Theta.
+使得J(θ)最小的θ值
+
+55
+00:01:57,866 --> 00:01:59,096
+That was a simpler case
+这是数据为实数的
+
+56
+00:01:59,096 --> 00:02:01,716
+of when data was just real number.
+一个比较简单的例子
+
+57
+00:02:01,716 --> 00:02:04,272
+In the problem that we are interested in, Theta is
+在这个问题中 我们感兴趣的是
+
+58
+00:02:04,929 --> 00:02:06,559
+no longer just a real number,
+θ不是一个实数的情况
+
+59
+00:02:06,559 --> 00:02:07,847
+but, instead, is this
+它是一个n+1维的参数向量
+
+60
+00:02:07,847 --> 00:02:11,986
+n+1-dimensional parameter vector, and,
+它是一个n+1维的参数向量
+
+61
+00:02:11,986 --> 00:02:13,809
+a cost function J is
+并且 代价函数J是这个向量的函数
+
+62
+00:02:13,809 --> 00:02:15,742
+a function of this vector
+并且 代价函数J是这个向量的函数
+
+63
+00:02:15,742 --> 00:02:17,501
+value or Theta 0 through
+也就是θ0到θn的函数
+
+64
+00:02:17,501 --> 00:02:18,924
+Theta n. And, a cost
+一个代价函数看起来是这样
+
+65
+00:02:18,924 --> 00:02:21,957
+function looks like this, some square cost function on the right.
+像右边的这个平方代价函数
+
+66
+00:02:22,373 --> 00:02:25,712
+How do we minimize this cost function J?
+我们如何最小化这个代价函数J?
+
+67
+00:02:25,712 --> 00:02:27,163
+Calculus actually tells us
+实际上 微积分告诉我们一种方法
+
+68
+00:02:27,163 --> 00:02:29,377
+that, if you, that
+实际上 微积分告诉我们一种方法
+
+69
+00:02:29,377 --> 00:02:30,709
+one way to do so, is
+对每个参数θ求J的偏导数
+
+70
+00:02:30,709 --> 00:02:38,604
+to take the partial derivative of J, with respect to every parameter of Theta J in turn, and then, to set
+对每个参数θ求J的偏导数
+
+71
+00:02:38,604 --> 00:02:40,271
+all of these to 0.
+然后把它们全部置零
+
+72
+00:02:40,271 --> 00:02:41,394
+If you do that, and you
+如果你这样做
+
+73
+00:02:41,394 --> 00:02:42,718
+solve for the values of
+并且求出θ0 θ1 一直到θn的值
+
+74
+00:02:42,718 --> 00:02:44,000
+Theta 0, Theta 1,
+并且求出θ0 θ1 一直到θn的值
+
+75
+00:02:44,000 --> 00:02:45,973
+up to Theta N, then,
+并且求出θ0 θ1 一直到θn的值
+
+76
+00:02:45,973 --> 00:02:47,217
+this would give you that values
+这样就能得到能够最小化代价函数J的θ值
+
+77
+00:02:47,217 --> 00:02:48,765
+of Theta to minimize the cost
+这样就能得到能够最小化代价函数J的θ值
+
+78
+00:02:48,765 --> 00:02:50,878
+function J. Where, if
+这样就能得到能够最小化代价函数J的θ值
+
+79
+00:02:50,878 --> 00:02:52,176
+you actually work through the
+如果你真的做完微积分和求解参数θ0到θn
+
+80
+00:02:52,176 --> 00:02:53,597
+calculus and work through
+如果你真的做完微积分和求解参数θ0到θn
+
+81
+00:02:53,597 --> 00:02:55,194
+the solution to the parameters
+如果你真的做完微积分和求解参数θ0到θn
+
+82
+00:02:55,194 --> 00:02:57,316
+Theta 0 through Theta N, the
+如果你真的做完微积分和求解参数θ0到θn
+
+83
+00:02:57,316 --> 00:03:00,520
+derivation ends up being somewhat involved.
+这个偏微分最终可能很复杂
+
+84
+00:03:00,520 --> 00:03:01,625
+And, what I am going
+接下来我在视频中要做的
+
+85
+00:03:01,625 --> 00:03:03,113
+to do in this video,
+接下来我在视频中要做的
+
+86
+00:03:03,113 --> 00:03:04,852
+is actually to not go
+实际上不是遍历所有的偏微分
+
+87
+00:03:04,852 --> 00:03:06,297
+through the derivation, which is kind
+实际上不是遍历所有的偏微分
+
+88
+00:03:06,297 --> 00:03:07,657
+of long and kind of involved, but
+因为这样太久太费事
+
+89
+00:03:07,657 --> 00:03:08,962
+what I want to do is just
+我只是想告诉你们
+
+90
+00:03:08,962 --> 00:03:10,545
+tell you what you need to know
+你们想要实现这个过程所需要知道内容
+
+91
+00:03:10,545 --> 00:03:12,619
+in order to implement this process
+你们想要实现这个过程所需要知道内容
+
+92
+00:03:12,619 --> 00:03:14,138
+so you can solve for the
+这样你就可以解出
+
+93
+00:03:14,138 --> 00:03:15,511
+values of the thetas that
+偏导数为0时θ的值
+
+94
+00:03:15,511 --> 00:03:16,892
+corresponds to where the
+偏导数为0时θ的值
+
+95
+00:03:16,892 --> 00:03:19,273
+partial derivatives is equal to zero.
+偏导数为0时θ的值
+
+96
+00:03:19,273 --> 00:03:21,733
+Or alternatively, or equivalently,
+换个方式说 或者等价地
+
+97
+00:03:21,733 --> 00:03:23,357
+the values of Theta is that
+这个θ能够使得代价函数J(θ)最小化
+
+98
+00:03:23,357 --> 00:03:25,901
+minimize the cost function J of Theta.
+这个θ能够使得代价函数J(θ)最小化
+
+99
+00:03:25,901 --> 00:03:27,283
+I realize that some of
+我发现可能只有熟悉微积分的同学
+
+100
+00:03:27,283 --> 00:03:28,846
+the comments I made that made
+我发现可能只有熟悉微积分的同学
+
+101
+00:03:28,846 --> 00:03:29,914
+more sense only to those
+比较容易理解我的话
+
+102
+00:03:29,914 --> 00:03:31,896
+of you that are more familiar with calculus.
+比较容易理解我的话
+
+103
+00:03:31,896 --> 00:03:33,065
+So, but if you don't
+所以 如果你不了解
+
+104
+00:03:33,065 --> 00:03:34,487
+know, if you're less familiar
+或者不那么了解微积分
+
+105
+00:03:34,487 --> 00:03:36,354
+with calculus, don't worry about it.
+也不必担心
+
+106
+00:03:36,354 --> 00:03:37,404
+I'm just going to tell you what
+我会告诉你
+
+107
+00:03:37,404 --> 00:03:38,374
+you need to know in order to
+要实现这个算法并且使其正常运行
+
+108
+00:03:38,374 --> 00:03:41,358
+implement this algorithm and get it to work.
+你所需的必要知识
+
+109
+00:03:41,358 --> 00:03:42,585
+For the example that I
+举个例子
+
+110
+00:03:42,585 --> 00:03:43,737
+want to use as a running
+我想运行这样一个例子
+
+111
+00:03:43,737 --> 00:03:46,339
+example let's say that
+假如说我有m=4个训练样本
+
+112
+00:03:46,339 --> 00:03:49,056
+I have m = 4 training examples.
+假如说我有m=4个训练样本
+
+113
+00:03:50,409 --> 00:03:52,881
+In order to implement this normal
+为了实现标准方程法
+
+114
+00:03:52,881 --> 00:03:56,515
+equation, what I'm going to do is the following.
+我要这样做
+
+115
+00:03:56,515 --> 00:03:57,640
+I'm going to take my
+看我的训练集
+
+116
+00:03:57,640 --> 00:04:00,375
+data set, so here are my four training examples.
+在这里就是这四个训练样本
+
+117
+00:04:00,375 --> 00:04:01,844
+In this case let's assume that,
+在这种情况下 我们假设
+
+118
+00:04:01,844 --> 00:04:06,073
+you know, these four examples is all the data I have.
+这四个训练样本就是我的所有数据
+
+119
+00:04:06,073 --> 00:04:07,890
+What I am going to do is take
+我所要做的是
+
+120
+00:04:07,890 --> 00:04:09,007
+my data set and add
+在我的训练集中加上一列对应额外特征变量的x0
+
+121
+00:04:09,007 --> 00:04:11,289
+an extra column that corresponds
+在我的训练集中加上一列对应额外特征变量的x0
+
+122
+00:04:11,289 --> 00:04:14,579
+to my extra feature, x0,
+在我的训练集中加上一列对应额外特征变量的x0
+
+123
+00:04:14,579 --> 00:04:15,967
+that is always takes
+就是那个取值永远是1的
+
+124
+00:04:15,967 --> 00:04:17,527
+on this value of 1.
+就是那个取值永远是1的
+
+125
+00:04:17,527 --> 00:04:18,681
+What I'm going to do is
+接下来我要做的是
+
+126
+00:04:18,681 --> 00:04:19,943
+I'm then going to construct
+构建一个矩阵X
+
+127
+00:04:19,943 --> 00:04:22,638
+a matrix called X that's
+这个矩阵基本包含了训练样本的所有特征变量
+
+128
+00:04:22,638 --> 00:04:24,632
+a matrix that basically contains all
+这个矩阵基本包含了训练样本的所有特征变量
+
+129
+00:04:24,632 --> 00:04:26,100
+of the features from my
+这个矩阵基本包含了训练样本的所有特征变量
+
+130
+00:04:26,100 --> 00:04:28,140
+training data, so concretely
+所以具体地说
+
+131
+00:04:28,140 --> 00:04:31,528
+here is my here are
+这里有我所有的特征变量
+
+132
+00:04:31,528 --> 00:04:33,743
+all my features and we're
+这里有我所有的特征变量
+
+133
+00:04:33,743 --> 00:04:34,797
+going to take all those numbers and
+我们要把这些数字
+
+134
+00:04:34,797 --> 00:04:37,777
+put them into this matrix "X", okay?
+全部放到矩阵中X中 好吧?
+
+135
+00:04:37,777 --> 00:04:39,179
+So just, you know, copy
+所以只是
+
+136
+00:04:39,179 --> 00:04:41,233
+the data over one column
+每次复制一列的数据
+
+137
+00:04:41,233 --> 00:04:45,962
+at a time and then I am going to do something similar for y's.
+我要对y做类似的事情
+
+138
+00:04:45,962 --> 00:04:47,087
+I am going to take the
+我要对我们将要预测的值
+
+139
+00:04:47,087 --> 00:04:47,952
+values that I'm trying to
+我要对我们将要预测的值
+
+140
+00:04:47,952 --> 00:04:49,360
+predict and construct now
+构建一个向量
+
+141
+00:04:49,360 --> 00:04:52,894
+a vector, like so
+像这样的
+
+142
+00:04:52,894 --> 00:04:55,440
+and call that a vector y.
+并且称之为向量y
+
+143
+00:04:55,440 --> 00:04:58,038
+So X is going to be a
+所以X会是一个M*(n+1)维矩阵
+
+144
+00:04:59,653 --> 00:05:05,688
+m by (n+1) - dimensional matrix, and
+所以X会是一个M*(n+1)维矩阵
+
+145
+00:05:05,688 --> 00:05:07,490
+Y is going to be
+y会是一个m维向量
+
+146
+00:05:07,490 --> 00:05:14,421
+a m-dimensional vector
+y会是一个m维向量
+
+147
+00:05:14,421 --> 00:05:16,624
+where m is the number of training examples
+其中m是训练样本数量
+
+148
+00:05:16,984 --> 00:05:18,688
+and n is, n is
+n是特征变量数
+
+149
+00:05:18,688 --> 00:05:20,713
+a number of features, n+1, because of
+n+1是因为我加的这个额外的特征变量x0
+
+150
+00:05:20,713 --> 00:05:24,825
+this extra feature X0 that I had.
+n+1是因为我加的这个额外的特征变量x0
+
+151
+00:05:24,825 --> 00:05:26,350
+Finally if you take
+最后 如果你用矩阵X和向量y来计算这个
+
+152
+00:05:26,350 --> 00:05:27,489
+your matrix X and you take
+最后 如果你用矩阵X和向量y来计算这个
+
+153
+00:05:27,489 --> 00:05:28,595
+your vector Y, and if you
+最后 如果你用矩阵X和向量y来计算这个
+
+154
+00:05:28,595 --> 00:05:31,065
+just compute this, and set
+θ等于 X转置乘以X的逆 乘以X转置 乘以y
+
+155
+00:05:31,065 --> 00:05:32,419
+theta to be equal to
+θ等于 X转置乘以X的逆 乘以X转置 乘以y
+
+156
+00:05:32,419 --> 00:05:34,440
+X transpose X inverse times
+θ等于 X转置乘以X的逆 乘以X转置 乘以y
+
+157
+00:05:34,440 --> 00:05:36,516
+X transpose Y, this would
+θ等于 X转置乘以X的逆 乘以X转置 乘以y
+
+158
+00:05:36,516 --> 00:05:38,583
+give you the value of theta
+这样就得到能够使得代价函数最小化的θ
+
+159
+00:05:38,583 --> 00:05:42,559
+that minimizes your cost function.
+这样就得到能够使得代价函数最小化的θ
+
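+Written out, the closed-form solution the lecture is describing (with m training examples and n features) is
+
+    \theta = (X^{T} X)^{-1} X^{T} y,
+    \qquad X \in \mathbb{R}^{m \times (n+1)}, \quad y \in \mathbb{R}^{m}, \quad \theta \in \mathbb{R}^{n+1}
+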
+160
+00:05:42,559 --> 00:05:43,436
+There was a lot
+幻灯片上的内容比较多
+
+161
+00:05:43,436 --> 00:05:44,416
+that happened on the slides and
+幻灯片上的内容比较多
+
+162
+00:05:44,416 --> 00:05:47,514
+I work through it using one specific example of one dataset.
+我讲解了这样一个数据组的一个例子
+
+163
+00:05:47,514 --> 00:05:49,241
+Let me just write this
+让我把这个写成更加通用的形式
+
+164
+00:05:49,333 --> 00:05:50,770
+out in a slightly more general form
+让我把这个写成更加通用的形式
+
+165
+00:05:50,955 --> 00:05:53,418
+and then let me just, and later on in
+在之后的视频中
+
+166
+00:05:53,621 --> 00:05:56,531
+this video let me explain this equation a little bit more.
+我会仔细介绍这个方程
+
+167
+00:05:57,581 --> 00:06:00,687
+in case it is not yet entirely clear how to do this.
+以防你不完全清楚要如何做
+
+168
+00:06:00,687 --> 00:06:02,129
+In a general case, let us
+在一般情况下
+
+169
+00:06:02,129 --> 00:06:04,124
+say we have M training examples
+假如我们有m个训练样本
+
+170
+00:06:04,124 --> 00:06:05,697
+so X1, Y1 up to
+x1 y1 直到 xn yn
+
+171
+00:06:05,697 --> 00:06:09,319
+Xn, Yn and n features.
+n个特征变量
+
+172
+00:06:09,319 --> 00:06:10,811
+So, each of the training example
+所以每一个训练样本
+
+173
+00:06:10,811 --> 00:06:12,926
+x(i) may look like a vector
+xi 可能看起来像一个向量
+
+174
+00:06:12,926 --> 00:06:16,297
+like this, that is a n+1 dimensional feature vector.
+像这样一个n+1维特征向量
+
+175
+00:06:16,943 --> 00:06:18,350
+The way I'm going to construct the
+我要构建矩阵X的方法
+
+176
+00:06:18,350 --> 00:06:20,674
+matrix "X", this is
+我要构建矩阵X的方法
+
+177
+00:06:20,674 --> 00:06:24,827
+also called the design matrix
+也被称为设计矩阵
+
+178
+00:06:24,827 --> 00:06:26,712
+is as follows.
+如下所示
+
+179
+00:06:26,712 --> 00:06:28,640
+Each training example gives
+每个训练样本给出一个这样的特征向量
+
+180
+00:06:28,640 --> 00:06:30,549
+me a feature vector like this.
+每个训练样本给出一个这样的特征向量
+
+181
+00:06:30,549 --> 00:06:34,491
+say, sort of n+1 dimensional vector.
+也就是说 这样的n+1维向量
+
+182
+00:06:34,491 --> 00:06:36,190
+The way I am going to construct my
+我构建我的设计矩阵X的方法
+
+183
+00:06:36,359 --> 00:06:39,734
+design matrix X is to construct the matrix like this.
+就是构建这样的矩阵
+
+184
+00:06:39,734 --> 00:06:40,834
+and what I'm going to
+那么 我要做的就是
+
+185
+00:06:40,834 --> 00:06:42,109
+do is take the first
+取第一个训练样本
+
+186
+00:06:42,109 --> 00:06:43,711
+training example, so that's
+取第一个训练样本
+
+187
+00:06:43,711 --> 00:06:46,350
+a vector, take its transpose
+也就是一个向量 取它的转置
+
+188
+00:06:46,350 --> 00:06:48,692
+so it ends up being this,
+它最后是这样
+
+189
+00:06:48,692 --> 00:06:50,250
+you know, long flat thing and
+扁长的样子
+
+190
+00:06:50,250 --> 00:06:55,153
+make x1 transpose the first row of my design matrix.
+让x1转置作为我设计矩阵的第一行
+
+191
+00:06:55,153 --> 00:06:56,225
+Then I am going to take my
+然后我要把我的
+
+192
+00:06:56,225 --> 00:06:58,682
+second training example, x2, take
+第二个训练样本x2
+
+193
+00:06:58,682 --> 00:07:00,437
+the transpose of that and
+进行转置 让它作为X的第二行
+
+194
+00:07:00,437 --> 00:07:01,838
+put that as the second row
+进行转置 让它作为X的第二行
+
+195
+00:07:01,838 --> 00:07:04,068
+of x and so on,
+以此类推
+
+196
+00:07:04,068 --> 00:07:07,206
+down until my last training example.
+直到最后一个训练样本
+
+197
+00:07:07,206 --> 00:07:09,279
+Take the transpose of that,
+取它的转置作为矩阵X的最后一行
+
+198
+00:07:09,279 --> 00:07:10,850
+and that's my last row of
+取它的转置作为矩阵X的最后一行
+
+199
+00:07:10,850 --> 00:07:12,665
+my matrix X. And, so,
+取它的转置作为矩阵X的最后一行
+
+200
+00:07:12,665 --> 00:07:14,418
+that makes my matrix X, an
+这样矩阵X就是一个m*(n+1)维矩阵
+
+201
+00:07:14,418 --> 00:07:17,129
+M by N +1
+这样矩阵X就是一个m*(n+1)维矩阵
+
+202
+00:07:17,129 --> 00:07:19,836
+dimensional matrix.
+这样矩阵X就是一个m*(n+1)维矩阵
+
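+A minimal Octave sketch of the design-matrix construction just described; the example values are placeholders, not data from the lecture:
+
+% raw features: one training example per row (here a single "size" feature, m = 4)
+features = [2104; 1416; 1534; 852];
+prices   = [460; 232; 315; 178];      % labels
+
+m = size(features, 1);
+X = [ones(m, 1), features];           % m x (n+1) design matrix; first column is x0 = 1
+y = prices;                           % m-dimensional vector of labels
+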
+203
+00:07:19,836 --> 00:07:21,953
+As a concrete example, let's
+举个具体的例子
+
+204
+00:07:21,953 --> 00:07:23,505
+say I have only one
+假如我只有一个特征变量
+
+205
+00:07:23,505 --> 00:07:24,670
+feature, really, only one
+就是说除了x0之外只有一个特征变量
+
+206
+00:07:24,670 --> 00:07:26,631
+feature other than X zero,
+就是说除了x0之外只有一个特征变量
+
+207
+00:07:26,631 --> 00:07:28,165
+which is always equal to 1.
+而x0始终为1
+
+208
+00:07:28,165 --> 00:07:30,376
+So if my feature vectors
+所以如果我的特征向量
+
+209
+00:07:30,376 --> 00:07:32,186
+X-i are equal to this
+xi等于1 也就是x0 和某个实际的特征变量
+
+210
+00:07:32,186 --> 00:07:33,878
+1, which is X-0, then
+xi等于1 也就是x0 和某个实际的特征变量
+
+211
+00:07:33,878 --> 00:07:35,912
+some real feature, like maybe the
+xi等于1 也就是x0 和某个实际的特征变量
+
+212
+00:07:35,912 --> 00:07:37,662
+size of the house, then my
+比如说房屋大小
+
+213
+00:07:37,662 --> 00:07:40,947
+design matrix, X, would be equal to this.
+那么我的设计矩阵X会是这样
+
+214
+00:07:40,947 --> 00:07:42,589
+For the first row, I'm going
+第一行 就是这个的转置
+
+215
+00:07:42,589 --> 00:07:46,071
+to basically take this and take its transpose.
+第一行 就是这个的转置
+
+216
+00:07:46,071 --> 00:07:51,644
+So, I'm going to end up with 1, and then X-1-1.
+所以最后得到1 然后x(1)1
+
+217
+00:07:51,644 --> 00:07:53,309
+For the second row, we're going to end
+对于第二行 我们得到1 然后x(1)2
+
+218
+00:07:53,309 --> 00:07:56,077
+up with 1 and then
+对于第二行 我们得到1 然后x(1)2
+
+219
+00:07:56,077 --> 00:07:58,046
+X-1-2 and so
+对于第二行 我们得到1 然后x(1)2
+
+220
+00:07:58,046 --> 00:07:59,046
+on down to 1, and
+这样直到1 然后x(1)m
+
+221
+00:07:59,046 --> 00:08:01,420
+then X-1-M.
+这样直到1 然后x(1)m
+
+222
+00:08:01,420 --> 00:08:03,084
+And thus, this will be
+这样 这就会是一个m*2维矩阵
+
+223
+00:08:03,084 --> 00:08:07,776
+a m by 2-dimensional matrix.
+这样 这就会是一个m*2维矩阵
+
+224
+00:08:07,776 --> 00:08:08,821
+So, that's how to construct
+所以 这就是如何构建矩阵X 和向量y
+
+225
+00:08:08,821 --> 00:08:11,251
+the matrix X. And, the
+所以 这就是如何构建矩阵X 和向量y
+
+226
+00:08:11,251 --> 00:08:13,886
+vector Y--sometimes I might
+有时我可能会在上面画一个箭头
+
+227
+00:08:13,886 --> 00:08:15,487
+write an arrow on top to
+有时我可能会在上面画一个箭头
+
+228
+00:08:15,487 --> 00:08:16,541
+denote that it is a vector,
+来表示这是一个向量
+
+229
+00:08:16,541 --> 00:08:19,871
+but very often I'll just write this as Y, either way.
+但很多时候 我就只写y 是一样的
+
+230
+00:08:19,871 --> 00:08:21,182
+The vector Y is obtained by
+向量y这样求得的
+
+231
+00:08:21,182 --> 00:08:23,275
+taking all all the labels,
+把所有标签
+
+232
+00:08:23,275 --> 00:08:25,098
+all the correct prices of
+所有训练集中正确的房子价格
+
+233
+00:08:25,098 --> 00:08:27,076
+houses in my training set, and
+所有训练集中正确的房子价格
+
+234
+00:08:27,076 --> 00:08:28,963
+just stacking them up into
+放在一起 得到一个m维向量y
+
+235
+00:08:28,963 --> 00:08:32,011
+an M-dimensional vector, and
+放在一起 得到一个m维向量y
+
+236
+00:08:32,011 --> 00:08:34,511
+that's Y. Finally, having
+最后 构建完矩阵X和向量y
+
+237
+00:08:34,511 --> 00:08:36,724
+constructed the matrix X
+最后 构建完矩阵X和向量y
+
+238
+00:08:36,724 --> 00:08:38,184
+and the vector Y, we then
+我们就可以通过计算X转置乘以X的逆乘以X转置乘以y来得到θ
+
+239
+00:08:38,184 --> 00:08:40,887
+just compute theta as X transpose X inverse
+我们就可以通过计算X转置乘以X的逆乘以X转置乘以y来得到θ
+
+240
+00:08:40,887 --> 00:08:47,243
+times X transpose Y. I just
+我们就可以通过计算X转置乘以X的逆乘以X转置乘以y来得到θ
+
+241
+00:08:47,243 --> 00:08:49,356
+want to make
+我现在就想确保你明白这个等式
+
+242
+00:08:49,356 --> 00:08:51,348
+I just want to make sure that this equation makes sense to you
+我现在就想确保你明白这个等式
+
+243
+00:08:51,348 --> 00:08:52,242
+and that you know how to implement it.
+并且知道如何实现它
+
+244
+00:08:52,242 --> 00:08:55,221
+So, you know, concretely, what is this X transpose X inverse?
+所以具体来说 什么是X的转置乘以X的逆?
+
+245
+00:08:55,221 --> 00:08:57,903
+Well, X transpose X inverse is the
+X的转置乘以X的逆是X转置乘以X的逆矩阵
+
+246
+00:08:57,903 --> 00:09:02,101
+inverse of the matrix X'X.
+X的转置乘以X的逆是X转置乘以X的逆矩阵
+
+247
+00:09:02,101 --> 00:09:04,498
+Concretely, if you were
+具体来说
+
+248
+00:09:04,498 --> 00:09:08,055
+to say set A to
+如果你令A等于X转置乘以X
+
+249
+00:09:08,055 --> 00:09:11,120
+be equal to X transpose times
+如果你令A等于X转置乘以X
+
+250
+00:09:11,120 --> 00:09:12,542
+X, so X transpose is a
+X的转置是一个矩阵
+
+251
+00:09:12,542 --> 00:09:14,063
+matrix, and X transpose times X
+X的转置乘以X是另一个矩阵
+
+252
+00:09:14,063 --> 00:09:15,305
+gives you another matrix, and we
+X的转置乘以X是另一个矩阵
+
+253
+00:09:15,305 --> 00:09:17,560
+call that matrix A. Then, you
+我们把这个矩阵称为A
+
+254
+00:09:17,560 --> 00:09:19,968
+know, X transpose X inverse is just
+那么 X转置乘以X的逆就是矩阵A的逆
+
+255
+00:09:19,968 --> 00:09:22,352
+you take this matrix A and you invert it, right!
+那么 X转置乘以X的逆就是矩阵A的逆
+
+256
+00:09:23,245 --> 00:09:24,417
+This gives, let's say 1/A.
+也就是1/A
+
+257
+00:09:26,025 --> 00:09:28,919
+And so that's how you compute this thing.
+这就是计算过程
+
+258
+00:09:28,919 --> 00:09:31,451
+You compute X'X and then you compute its inverse.
+先计算X转置乘以X 然后计算它的逆
+
+259
+00:09:31,451 --> 00:09:34,296
+We haven't yet talked about Octave.
+我们还没有谈到Octave
+
+260
+00:09:34,296 --> 00:09:35,941
+We'll do so in the later
+我们将在之后的视频中谈到这个
+
+261
+00:09:35,941 --> 00:09:37,211
+set of videos, but in the
+但是在Octave编程语言
+
+262
+00:09:37,211 --> 00:09:39,073
+Octave programming language or a
+但是在Octave编程语言
+
+263
+00:09:39,073 --> 00:09:40,652
+similar view, and also the
+或者类似的MATLAB编程语言里是类似的
+
+264
+00:09:40,652 --> 00:09:42,957
+matlab programming language is very similar.
+或者类似的MATLAB编程语言里是类似的
+
+265
+00:09:42,957 --> 00:09:46,937
+The command to compute this quantity,
+计算这个量的命令
+
+266
+00:09:47,384 --> 00:09:50,326
+X transpose X inverse times
+X转置乘以X的逆乘以X转置乘以y
+
+267
+00:09:50,326 --> 00:09:52,537
+X transpose Y, is as follows.
+的代码命令如下所示
+
+268
+00:09:52,537 --> 00:09:54,903
+In Octave X prime is
+在Octave中 X’表示X转置
+
+269
+00:09:54,903 --> 00:09:58,354
+the notation that you use to denote X transpose.
+在Octave中 X’表示X转置
+
+270
+00:09:58,354 --> 00:10:00,737
+And so, this expression that's
+这个用红色框起来的表达式
+
+271
+00:10:00,737 --> 00:10:03,588
+boxed in red, that's computing
+计算的是X转置乘以X
+
+272
+00:10:03,588 --> 00:10:06,633
+X transpose times X.
+计算的是X转置乘以X
+
+273
+00:10:06,633 --> 00:10:08,551
+pinv is a function for
+pinv是用来计算逆矩阵的函数
+
+274
+00:10:08,551 --> 00:10:09,701
+computing the inverse of
+pinv是用来计算逆矩阵的函数
+
+275
+00:10:09,701 --> 00:10:11,818
+a matrix, so this computes
+所以这个计算X转置乘以X的逆
+
+276
+00:10:11,818 --> 00:10:14,656
+X transpose X inverse,
+所以这个计算X转置乘以X的逆
+
+277
+00:10:14,656 --> 00:10:16,453
+and then you multiply that by
+然后乘以X转置 再乘以y
+
+278
+00:10:16,453 --> 00:10:18,267
+X transpose, and you multiply
+然后乘以X转置 再乘以y
+
+279
+00:10:18,267 --> 00:10:19,712
+that by Y. So you
+然后乘以X转置 再乘以y
+
+280
+00:10:19,712 --> 00:10:22,325
+end computing that formula
+这样就算完了这个式子
+
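+Putting the command described above in one place, with X and y built as in the earlier sketch:
+
+% normal equation in Octave: X' is X transpose, pinv() computes the (pseudo-)inverse
+theta = pinv(X' * X) * X' * y;
+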
+281
+00:10:22,325 --> 00:10:24,369
+which I didn't prove,
+我没有证明这个式子
+
+282
+00:10:24,369 --> 00:10:25,994
+but it is possible to
+尽管我并不打算这么做
+
+283
+00:10:25,994 --> 00:10:27,382
+show mathematically even though I'm
+但是数学上是可以证明的
+
+284
+00:10:27,382 --> 00:10:28,537
+not going to do so
+这个式子会给出最优的θ值
+
+285
+00:10:28,537 --> 00:10:31,071
+here, that this formula gives you
+这个式子会给出最优的θ值
+
+286
+00:10:31,071 --> 00:10:32,316
+the optimal value of theta
+这个式子会给出最优的θ值
+
+287
+00:10:32,316 --> 00:10:34,865
+in the sense that if you set theta equal
+就是说如果你令θ等于这个
+
+288
+00:10:34,865 --> 00:10:36,512
+to this, that's the value
+就是说如果你令θ等于这个
+
+289
+00:10:36,512 --> 00:10:38,000
+of theta that minimizes the
+这个θ值会最小化这个线性回归的代价函数J(θ)
+
+290
+00:10:38,000 --> 00:10:40,169
+cost function J of theta
+这个θ值会最小化这个线性回归的代价函数J(θ)
+
+291
+00:10:40,169 --> 00:10:41,993
+for the new regression.
+这个θ值会最小化这个线性回归的代价函数J(θ)
+
+292
+00:10:41,993 --> 00:10:44,530
+One last detail. In an earlier video,
+最后一点
+
+293
+00:10:44,530 --> 00:10:46,131
+I talked about the feature
+在之前视频中我提到特征变量归一化
+
+294
+00:10:46,131 --> 00:10:47,061
+scaling and the idea of
+在之前视频中我提到特征变量归一化
+
+295
+00:10:47,061 --> 00:10:48,878
+getting features to be
+和让特征变量在相似的范围内的想法
+
+296
+00:10:48,878 --> 00:10:50,726
+on similar ranges of
+和让特征变量在相似的范围内的想法
+
+297
+00:10:50,726 --> 00:10:54,900
+Scales of similar ranges of values of each other.
+将所有的值归一化在类似范围内
+
+298
+00:10:54,900 --> 00:10:56,872
+If you are using this normal
+如果你使用标准方程法
+
+299
+00:10:56,872 --> 00:10:59,843
+equation method then feature
+那么就不需要归一化特征变量
+
+300
+00:10:59,843 --> 00:11:02,315
+scaling isn't actually necessary
+那么就不需要归一化特征变量
+
+301
+00:11:02,315 --> 00:11:04,361
+and is actually okay if,
+实际上这是没问题的
+
+302
+00:11:04,361 --> 00:11:06,094
+say, some feature X one
+如果某个特征变量x1在0到1的区间
+
+303
+00:11:06,094 --> 00:11:07,552
+is between zero and one,
+如果某个特征变量x1在0到1的区间
+
+304
+00:11:07,552 --> 00:11:08,846
+and some feature X two is
+某个特征变量x2在0到1000的区间
+
+305
+00:11:08,846 --> 00:11:10,550
+between ranges from zero to
+某个特征变量x2在0到1000的区间
+
+306
+00:11:10,550 --> 00:11:12,019
+one thousand and some feature
+某个特征变量x2在0到1000的区间
+
+307
+00:11:12,019 --> 00:11:14,159
+x three ranges from zero
+某个特征变量x3在0到10^-5的区间
+
+308
+00:11:14,159 --> 00:11:15,822
+to ten to the
+某个特征变量x3在0到10^-5的区间
+
+309
+00:11:15,822 --> 00:11:17,263
+minus five and if
+某个特征变量x3在0到10^-5的区间
+
+310
+00:11:17,263 --> 00:11:18,321
+you are using the normal equation method
+然后如果使用标准方程法
+
+311
+00:11:18,321 --> 00:11:20,296
+this is okay and there is
+这样就没有问题
+
+312
+00:11:20,296 --> 00:11:21,550
+no need to do features
+不需要做特征变量归一化
+
+313
+00:11:21,550 --> 00:11:22,740
+scaling, although of course
+尽管如果你使用梯度下降法
+
+314
+00:11:22,740 --> 00:11:25,667
+if you are using gradient descent,
+尽管如果你使用梯度下降法
+
+315
+00:11:25,667 --> 00:11:27,814
+then, features scaling is still important.
+特征变量归一化仍然很重要
+
+316
+00:11:28,030 --> 00:11:31,020
+Finally, when should you use gradient descent
+最后 你何时应该使用梯度下降法
+
+317
+00:11:31,020 --> 00:11:33,273
+and when should you use the normal equation method.
+而何时应该使用标准方程法
+
+318
+00:11:33,273 --> 00:11:35,800
+Here are some of the their advantages and disadvantages.
+这里列举了一些它们的优点和缺点
+
+319
+00:11:35,800 --> 00:11:38,305
+Let's say you have m training
+假如你有m个训练样本和n个特征变量
+
+320
+00:11:38,305 --> 00:11:40,918
+examples and n features.
+假如你有m个训练样本和n个特征变量
+
+321
+00:11:40,918 --> 00:11:42,854
+One disadvantage of gradient descent
+梯度下降法的缺点之一就是
+
+322
+00:11:42,854 --> 00:11:46,015
+is that, you need to choose the learning rate Alpha.
+你需要选择学习速率α
+
+323
+00:11:46,015 --> 00:11:47,374
+And, often, this means running
+这通常表示需要运行多次 尝试不同的学习速率α
+
+324
+00:11:47,374 --> 00:11:49,128
+it few times with different learning
+这通常表示需要运行多次 尝试不同的学习速率α
+
+325
+00:11:49,128 --> 00:11:51,154
+rate alphas and then seeing what works best.
+然后找到运行效果最好的那个
+
+326
+00:11:51,154 --> 00:11:54,274
+And so that is sort of extra work and extra hassle.
+所以这是一种额外的工作和麻烦
+
+327
+00:11:54,274 --> 00:11:55,976
+Another disadvantage with gradient descent
+梯度下降法的另一个缺点是
+
+328
+00:11:55,976 --> 00:11:57,841
+is it needs many more iterations.
+它需要更多次的迭代
+
+329
+00:11:57,841 --> 00:11:59,346
+So, depending on the details,
+因为一些细节 计算可能会更慢
+
+330
+00:11:59,346 --> 00:12:00,839
+that could make it slower, although
+因为一些细节 计算可能会更慢
+
+331
+00:12:00,839 --> 00:12:04,391
+there's more to the story as we'll see in a second.
+我们一会儿会看到更多的东西
+
+332
+00:12:04,391 --> 00:12:07,544
+As for the normal equation, you don't need to choose any learning rate alpha.
+至于标准方程 你不需要选择学习速率α
+
+333
+00:12:07,821 --> 00:12:11,208
+So that, you know, makes it really convenient, makes it simple to implement.
+所以就非常方便 也容易实现
+
+334
+00:12:11,208 --> 00:12:13,888
+You just run it and it usually just works.
+你只要运行一下 通常这就够了
+
+335
+00:12:13,888 --> 00:12:15,061
+And you don't need to
+并且你也不需要迭代
+
+336
+00:12:15,061 --> 00:12:16,129
+iterate, so, you don't need
+所以不需要画出J(θ)的曲线
+
+337
+00:12:16,129 --> 00:12:17,456
+to plot J of Theta or
+所以不需要画出J(θ)的曲线
+
+338
+00:12:17,456 --> 00:12:20,497
+check the convergence or take all those extra steps.
+来检查收敛性或者采取所有的额外步骤
+
+339
+00:12:20,497 --> 00:12:21,931
+So far, the balance seems to
+到目前为止
+
+340
+00:12:21,931 --> 00:12:23,846
+favor the normal equation.
+天平似乎倾向于标准方程法
+
+341
+00:12:24,826 --> 00:12:27,085
+Here are some disadvantages of
+这里列举一些标准方程法的缺点
+
+342
+00:12:27,612 --> 00:12:29,435
+the normal equation, and some advantages of gradient descent.
+和梯度下降法的优点
+
+343
+00:12:29,681 --> 00:12:31,447
+Gradient descent works pretty well,
+梯度下降法在有很多特征变量的情况下也能运行地相当好
+
+344
+00:12:31,928 --> 00:12:34,698
+even when you have a very large number of features.
+梯度下降法在有很多特征变量的情况下也能运行地相当好
+
+345
+00:12:34,698 --> 00:12:36,168
+So, even if you
+所以即使你有上百万的特征变量
+
+346
+00:12:36,168 --> 00:12:37,812
+have millions of features you
+所以即使你有上百万的特征变量
+
+347
+00:12:37,812 --> 00:12:40,865
+can run gradient descent and it will be reasonably efficient.
+你可以运行梯度下降法 并且通常很有效
+
+348
+00:12:40,865 --> 00:12:43,381
+It will do something reasonable.
+它会正常的运行
+
+349
+00:12:43,381 --> 00:12:46,566
+In contrast, for the normal equation, in
+相对地 标准方程法
+
+350
+00:12:46,566 --> 00:12:48,014
+order to solve for the parameters
+为了求解参数θ 需要求解这一项
+
+351
+00:12:48,014 --> 00:12:50,394
+theta, we need to solve for this term.
+为了求解参数θ 需要求解这一项
+
+352
+00:12:50,394 --> 00:12:53,058
+We need to compute this term, X transpose, X inverse.
+我们需要计算这项X转置乘以X的逆
+
+353
+00:12:53,058 --> 00:12:56,328
+This matrix X transpose X.
+这个X转置乘以X矩阵是一个n*n的矩阵
+
+354
+00:12:56,328 --> 00:13:00,206
+That's an n by n matrix, if you have n features.
+如果你有n个特征变量的话
+
+355
+00:13:00,770 --> 00:13:02,947
+Because, if you look
+因为如果你看一下X转置乘以X的维度
+
+356
+00:13:02,947 --> 00:13:03,917
+at the dimensions of
+因为如果你看一下X转置乘以X的维度
+
+357
+00:13:03,917 --> 00:13:05,529
+X transpose the dimension of
+因为如果你看一下X转置乘以X的维度
+
+358
+00:13:05,529 --> 00:13:07,024
+X, you multiply, figure out what
+你可以发现他们的积的维度
+
+359
+00:13:07,024 --> 00:13:08,749
+the dimension of the product
+你可以发现他们的积的维度
+
+360
+00:13:08,749 --> 00:13:10,983
+is, the matrix X transpose
+X转置乘以X是一个n*n的矩阵
+
+361
+00:13:10,983 --> 00:13:13,727
+X is an n by n matrix where
+X转置乘以X是一个n*n的矩阵
+
+362
+00:13:13,727 --> 00:13:15,853
+n is the number of features, and
+其中 n是特征变量的数量
+
+363
+00:13:15,853 --> 00:13:18,641
+for most computed implementations,
+实现逆矩阵计算所需要的计算量
+
+364
+00:13:18,641 --> 00:13:20,990
+the cost of inverting
+实现逆矩阵计算所需要的计算量
+
+365
+00:13:20,990 --> 00:13:23,087
+the matrix grows roughly as
+大致是矩阵维度的三次方
+
+366
+00:13:23,087 --> 00:13:25,707
+the cube of the dimension of the matrix.
+大致是矩阵维度的三次方
+
+367
+00:13:25,707 --> 00:13:28,180
+So, computing this inverse costs,
+因此计算这个逆矩阵需要计算大致n的三次方
+
+368
+00:13:28,180 --> 00:13:29,964
+roughly order n cubed time.
+因此计算这个逆矩阵需要计算大致n的三次方
+
+369
+00:13:29,964 --> 00:13:31,213
+Sometimes, it's slightly faster than
+有时稍微比计算n的三次方快一些
+
+370
+00:13:31,213 --> 00:13:35,050
+N cube but, it's, you know, close enough for our purposes.
+但是对我们来说很接近
+
+371
+00:13:35,489 --> 00:13:36,605
+So if n the number of features is very large,
+所以如果特征变量的数量n很大的话
+
+372
+00:13:37,643 --> 00:13:39,025
+then computing this
+那么计算这个量会很慢
+
+373
+00:13:39,025 --> 00:13:40,570
+quantity can be slow and
+那么计算这个量会很慢
+
+374
+00:13:40,570 --> 00:13:44,289
+the normal equation method can actually be much slower.
+实际上标准方程法会慢很多
+
+375
+00:13:44,289 --> 00:13:45,491
+So if n is
+因此如果n很大
+
+376
+00:13:45,491 --> 00:13:47,622
+large then I might
+因此如果n很大
+
+377
+00:13:47,622 --> 00:13:49,490
+usually use gradient descent because
+我可能还是会使用梯度下降法
+
+378
+00:13:49,490 --> 00:13:51,872
+we don't want to pay this order n cubed time.
+因为我们不想花费n的三次方的时间
+
+379
+00:13:51,872 --> 00:13:53,525
+But, if n is relatively small,
+但如果n比较小
+
+380
+00:13:53,525 --> 00:13:57,395
+then the normal equation might give you a better way to solve the parameters.
+那么标准方程法可能更好地求解参数θ
+
+381
+00:13:57,395 --> 00:13:59,080
+What does small and large mean?
+那么怎么叫大或者小呢?
+
+382
+00:13:59,080 --> 00:14:00,741
+Well, if n is on
+那么 如果n是上百的
+
+383
+00:14:00,741 --> 00:14:02,130
+the order of a hundred, then
+那么 如果n是上百的
+
+384
+00:14:02,130 --> 00:14:03,822
+inverting a hundred-by-hundred matrix is
+计算百位数乘百位数的矩阵
+
+385
+00:14:03,822 --> 00:14:06,539
+no problem by modern computing standards.
+对于现代计算机来说没有问题
+
+386
+00:14:06,539 --> 00:14:10,966
+If n is a thousand, I would still use the normal equation method.
+如果n是上千的 我还会使用标准方程法
+
+387
+00:14:10,966 --> 00:14:12,583
+Inverting a thousand-by-thousand matrix is
+千位数乘千位数的矩阵做逆变换
+
+388
+00:14:12,583 --> 00:14:15,408
+actually really fast on a modern computer.
+对于现代计算机来说实际上是非常快的
+
+389
+00:14:15,408 --> 00:14:18,406
+If n is ten thousand, then I might start to wonder.
+但如果n上万 那么我可能会开始犹豫
+
+390
+00:14:18,406 --> 00:14:20,618
+Inverting a ten-thousand- by-ten-thousand matrix
+上万乘上万维的矩阵作逆变换
+
+391
+00:14:20,618 --> 00:14:22,208
+starts to get kind of slow,
+会开始有点慢
+
+392
+00:14:22,208 --> 00:14:23,471
+and I might then start to
+此时我可能开始倾向于
+
+393
+00:14:23,471 --> 00:14:25,007
+maybe lean in the
+此时我可能开始倾向于
+
+394
+00:14:25,007 --> 00:14:27,007
+direction of gradient descent, but maybe not quite.
+梯度下降法 但也不绝对
+
+395
+00:14:27,114 --> 00:14:28,672
+n equals ten thousand, you can
+n等于一万 你可以
+
+396
+00:14:28,672 --> 00:14:31,148
+sort of invert a ten-thousand-by-ten-thousand matrix.
+逆变换一个一万乘一万的矩阵
+
+397
+00:14:31,148 --> 00:14:34,345
+But if it gets much bigger than that, then, I would probably use gradient descent.
+但如果n远大于此 我可能就会使用梯度下降法了
+
+398
+00:14:34,345 --> 00:14:35,834
+So, if n equals ten
+所以如果n等于10^6
+
+399
+00:14:35,834 --> 00:14:36,920
+to the sixth with a million
+有一百万个特征变量
+
+400
+00:14:36,920 --> 00:14:38,963
+features, then inverting a
+那么做百万乘百万的矩阵的逆变换
+
+401
+00:14:38,963 --> 00:14:41,565
+million-by-million matrix is going
+那么做百万乘百万的矩阵的逆变换
+
+402
+00:14:41,565 --> 00:14:42,631
+to be very expensive, and
+就会变得非常费时间
+
+403
+00:14:42,631 --> 00:14:46,163
+I would definitely favor gradient descent if you have that many features.
+在这种情况下我一定会使用梯度下降法
+
+404
+00:14:46,163 --> 00:14:47,859
+So exactly how large
+所以很难给出一个确定的值
+
+405
+00:14:47,859 --> 00:14:49,282
+set of features has to be
+来决定何时该换成梯度下降法
+
+406
+00:14:49,282 --> 00:14:52,655
+before you switch to gradient descent, it's hard to give a strict number.
+来决定何时该换成梯度下降法
+
+407
+00:14:52,655 --> 00:14:53,855
+But, for me, it is usually
+但是 对我来说通常是
+
+408
+00:14:53,855 --> 00:14:55,501
+around ten thousand that I might
+在一万左右 我会开始考虑换成梯度下降法
+
+409
+00:14:55,501 --> 00:14:58,258
+start to consider switching over
+在一万左右 我会开始考虑换成梯度下降法
+
+410
+00:14:58,335 --> 00:15:00,663
+to gradient descents or maybe,
+在一万左右 我会开始考虑换成梯度下降法
+
+411
+00:15:00,663 --> 00:15:04,324
+some other algorithms that we'll talk about later in this class.
+或者我们将在以后讨论到的其他算法
+
+412
+00:15:04,324 --> 00:15:05,765
+To summarize, so long
+总结一下
+
+413
+00:15:05,765 --> 00:15:06,999
+as the number of features is
+只要特征变量的数目并不大
+
+414
+00:15:06,999 --> 00:15:08,475
+not too large, the normal equation
+标准方程是一个很好的
+
+415
+00:15:08,475 --> 00:15:12,229
+gives us a great alternative method to solve for the parameter theta.
+计算参数θ的替代方法
+
+416
+00:15:12,583 --> 00:15:13,983
+Concretely, so long as
+具体地说 只要特征变量数量小于一万
+
+417
+00:15:13,983 --> 00:15:15,749
+the number of features is less
+具体地说 只要特征变量数量小于一万
+
+418
+00:15:15,749 --> 00:15:17,472
+than 1000, you know, I would
+我通常使用标准方程法
+
+419
+00:15:17,472 --> 00:15:18,881
+use, I would usually use
+我通常使用标准方程法
+
+420
+00:15:18,881 --> 00:15:21,955
+the normal equation method rather than gradient descent.
+而不使用梯度下降法
+
+421
+00:15:21,955 --> 00:15:23,549
+To preview some ideas that
+预告一下在之后的课程中我们要讲的
+
+422
+00:15:23,549 --> 00:15:24,493
+we'll talk about later in this
+预告一下在之后的课程中我们要讲的
+
+423
+00:15:24,493 --> 00:15:26,235
+course, as we get
+随着我们要讲的学习算法越来越复杂
+
+424
+00:15:26,235 --> 00:15:27,912
+to the more complex learning algorithm, for
+随着我们要讲的学习算法越来越复杂
+
+425
+00:15:27,912 --> 00:15:29,617
+example, when we talk about
+例如 当我们讲到分类算法
+
+426
+00:15:29,617 --> 00:15:32,188
+classification algorithm, like a logistic regression algorithm,
+像逻辑回归算法
+
+427
+00:15:32,834 --> 00:15:34,319
+We'll see that those algorithms
+我们会看到
+
+428
+00:15:34,319 --> 00:15:35,467
+actually...
+ 实际上对于那些算法
+
+429
+00:15:35,467 --> 00:15:37,592
+the normal equation method actually does not work
+并不能使用标准方程法
+
+430
+00:15:37,592 --> 00:15:39,388
+for those more sophisticated
+对于那些更复杂的学习算法
+
+431
+00:15:39,388 --> 00:15:41,190
+learning algorithms, and, we
+我们将不得不仍然使用梯度下降法
+
+432
+00:15:41,190 --> 00:15:43,916
+will have to resort to gradient descent for those algorithms.
+我们将不得不仍然使用梯度下降法
+
+433
+00:15:43,916 --> 00:15:46,682
+So, gradient descent is a very useful algorithm to know.
+因此 梯度下降法是一个非常有用的算法
+
+434
+00:15:46,682 --> 00:15:48,859
+It's useful both for linear regression with
+可以用在有大量特征变量的线性回归问题
+
+435
+00:15:48,982 --> 00:15:50,017
+a large number of features and
+可以用在有大量特征变量的线性回归问题
+
+436
+00:15:50,017 --> 00:15:52,373
+for some of the other algorithms
+或者我们以后在课程中
+
+437
+00:15:52,373 --> 00:15:53,893
+that we'll see in
+会讲到的一些其他的算法
+
+438
+00:15:53,893 --> 00:15:55,438
+this course, because, for them, the normal
+因为 标准方程法不适合或者不能用在它们上
+
+439
+00:15:55,438 --> 00:15:58,747
+equation method just doesn't apply and doesn't work.
+因为 标准方程法不适合或者不能用在它们上
+
+440
+00:15:58,747 --> 00:16:00,537
+But for this specific model of
+但对于这个特定的线性回归模型
+
+441
+00:16:00,537 --> 00:16:02,904
+linear regression, the normal equation
+但对于这个特定的线性回归模型
+
+442
+00:16:02,904 --> 00:16:05,827
+can give you an alternative
+标准方程法是一个
+
+443
+00:16:07,219 --> 00:16:08,612
+that can be much faster, than gradient descent.
+比梯度下降法更快的替代算法
+
+444
+00:16:09,604 --> 00:16:11,920
+So, depending on the details of your algorithm,
+所以 根据具体的问题
+
+445
+00:16:12,007 --> 00:16:14,164
+depending of the detail of the problems and
+所以 根据具体的问题
+
+446
+00:16:14,164 --> 00:16:15,550
+how many features that you have,
+以及你的特征变量的数量
+
+447
+00:16:15,550 --> 00:16:19,550
+both of these algorithms are well worth knowing about.
+这两算法都是值得学习的
+
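+To tie the comparison together, here is a minimal Octave sketch that solves the same small problem both ways; the learning rate, iteration count, and the assumption that the features in X are already scaled are illustrative choices, not values from the lecture:
+
+% X: m x (n+1) design matrix (first column all ones), y: m x 1 labels
+m = size(X, 1);
+
+% normal equation: one shot, no learning rate, no iterations
+theta_ne = pinv(X' * X) * X' * y;
+
+% batch gradient descent: needs a learning rate alpha and many iterations
+alpha = 0.01;
+theta_gd = zeros(size(X, 2), 1);
+for iter = 1:1500
+  theta_gd = theta_gd - (alpha / m) * (X' * (X * theta_gd - y));
+end
+
+% with well-scaled features the two results agree closely; for very large n
+% (say 10^6 features) the roughly n^3 cost of the inverse makes the normal
+% equation impractical, and gradient descent is the better choice
+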
diff --git a/srt/4 - 7 - Normal Equation Noninvertibility (Optional) (6 min).srt b/srt/4 - 7 - Normal Equation Noninvertibility (Optional) (6 min).srt
new file mode 100644
index 00000000..84d0ad16
--- /dev/null
+++ b/srt/4 - 7 - Normal Equation Noninvertibility (Optional) (6 min).srt
@@ -0,0 +1,897 @@
+1
+00:00:00,000 --> 00:00:03,162
+在这段视频中我想谈谈正规方程 ( normal equation )
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:00,000 --> 00:00:03,162
+In this video, I want to talk about the normal equation
+
+3
+00:00:03,162 --> 00:00:05,212
+以及它们的不可逆性
+
+4
+00:00:03,162 --> 00:00:05,212
+and non-invertibility.
+
+5
+00:00:05,212 --> 00:00:07,877
+尽管是一种较为深入的概念
+
+6
+00:00:05,212 --> 00:00:07,877
+This is a somewhat more advanced concept,
+
+7
+00:00:07,877 --> 00:00:10,289
+但总有人问我有关这方面的问题
+
+8
+00:00:07,877 --> 00:00:10,289
+but it is something that I've often been asked about.
+
+9
+00:00:10,289 --> 00:00:12,711
+因此 我想在这里来讨论它
+
+10
+00:00:10,289 --> 00:00:12,711
+And so I wanted to talk about it here.
+
+11
+00:00:12,711 --> 00:00:14,752
+由于概念较为深入
+
+12
+00:00:12,711 --> 00:00:14,752
+But this is a somewhat more advanced concept,
+
+13
+00:00:14,752 --> 00:00:17,982
+所以对这段可选材料大家放轻松吧
+
+14
+00:00:14,752 --> 00:00:17,982
+so feel free to consider this optional material
+
+15
+00:00:17,982 --> 00:00:22,413
+下面
+
+16
+00:00:17,982 --> 00:00:22,413
+There's a phenomenon that you may run into
+
+17
+00:00:22,413 --> 00:00:24,416
+举一个比较实用的例子
+
+18
+00:00:22,413 --> 00:00:24,416
+that's maybe for some of you useful to understand.
+
+19
+00:00:24,416 --> 00:00:26,619
+这是一个关于正规方程和线性回归的例子
+
+20
+00:00:24,416 --> 00:00:26,619
+But even if you don't understand it,
+
+21
+00:00:26,619 --> 00:00:28,450
+即使你没能理解
+
+22
+00:00:26,619 --> 00:00:28,450
+the normal equation and linear regression,
+
+23
+00:00:28,450 --> 00:00:30,539
+也没有关系
+
+24
+00:00:28,450 --> 00:00:30,539
+you should really get that to work okay.
+
+25
+00:00:30,539 --> 00:00:33,195
+问题如下
+
+26
+00:00:30,539 --> 00:00:33,195
+Here's the issue:
+
+27
+00:00:33,195 --> 00:00:35,691
+你或许可能对
+
+28
+00:00:33,195 --> 00:00:35,691
+For those of you that are maybe somewhat
+
+29
+00:00:35,691 --> 00:00:37,876
+线性代数比较熟悉
+
+30
+00:00:35,691 --> 00:00:37,876
+more familar with linear algebra,
+
+31
+00:00:37,876 --> 00:00:39,884
+有些同学曾经问过我
+
+32
+00:00:37,876 --> 00:00:39,884
+what some students have asked me is,
+
+33
+00:00:39,884 --> 00:00:42,542
+当计算
+
+34
+00:00:39,884 --> 00:00:42,542
+when computing this
+
+35
+00:00:42,542 --> 00:00:45,130
+θ等于inv(X'X ) X'y (注:X的转置翻译为X',下同)
+
+36
+00:00:42,542 --> 00:00:45,130
+theta equals ( Xtranspose X )inverse Xtranspose y
+
+37
+00:00:45,130 --> 00:00:49,476
+那对于矩阵X'X的结果是不可逆的情况呢?
+
+38
+00:00:45,130 --> 00:00:49,476
+what if the matrix Xtranspose X is non-invertible?
+
+39
+00:00:49,476 --> 00:00:52,336
+如果你懂一点线性代数的知识
+
+40
+00:00:49,476 --> 00:00:52,336
+So, for those of you that know a bit more linear algebra
+
+41
+00:00:52,336 --> 00:00:55,171
+你或许会知道
+
+42
+00:00:52,336 --> 00:00:55,171
+you may know that only some matrices
+
+43
+00:00:55,171 --> 00:00:58,598
+有些矩阵可逆 而有些矩阵不可逆
+
+44
+00:00:55,171 --> 00:00:58,598
+are invertible and some matrices do not have an inverse
+
+45
+00:00:58,598 --> 00:01:00,540
+我们称那些不可逆矩阵为
+
+46
+00:00:58,598 --> 00:01:00,540
+we call those non-invertible matrices,
+
+47
+00:01:00,540 --> 00:01:04,737
+奇异或退化矩阵
+
+48
+00:01:00,540 --> 00:01:04,737
+singular or degenerate matrices.
+
+49
+00:01:04,737 --> 00:01:08,893
+问题的重点在于X'X的不可逆的问题
+
+50
+00:01:04,737 --> 00:01:08,893
+The issue or the problem of Xtranpose X being non-invertible
+
+51
+00:01:08,893 --> 00:01:11,287
+很少发生
+
+52
+00:01:08,893 --> 00:01:11,287
+should happen pretty rarely.
+
+53
+00:01:11,287 --> 00:01:16,749
+在Octave里 如果你用它来实现θ的计算
+
+54
+00:01:11,287 --> 00:01:16,749
+And in Octave, if you implement this to compute theta,
+
+55
+00:01:16,749 --> 00:01:20,636
+你将会得到正解
+
+56
+00:01:16,749 --> 00:01:20,636
+it turns out that this will actually do the right thing.
+
+57
+00:01:20,636 --> 00:01:24,629
+在这里我不想赘述
+
+58
+00:01:20,636 --> 00:01:24,629
+I'm getting a little bit technical now and I don't want to go into details,
+
+59
+00:01:24,629 --> 00:01:28,207
+在Octave里 有两个函数可以求解矩阵的逆
+
+60
+00:01:24,629 --> 00:01:28,207
+but Octave has two functions for inverting matrices:
+
+61
+00:01:28,207 --> 00:01:32,146
+一个被称为pinv ( ) 另一个是inv ( )
+
+62
+00:01:28,207 --> 00:01:32,146
+One is called pinv(), and the other is called inv().
+
+63
+00:01:32,146 --> 00:01:36,089
+这两者之间的差异是些许计算过程上的
+
+64
+00:01:32,146 --> 00:01:36,089
+The differences between these two are somewhat technical.
+
+65
+00:01:36,089 --> 00:01:38,107
+一个是所谓的伪逆 另一个被称为逆
+
+66
+00:01:36,089 --> 00:01:38,107
+One's called the pseudo-inverse, one's called the inverse.
+
+67
+00:01:38,107 --> 00:01:42,658
+使用pinv ( ) 函数可以展现数学上的过程
+
+68
+00:01:38,107 --> 00:01:42,658
+You can show mathemically so as long as you use the pinv() function,
+
+69
+00:01:42,658 --> 00:01:47,145
+这将计算出θ的值
+
+70
+00:01:42,658 --> 00:01:47,145
+then this will actually compute the value of theta that you want,
+
+71
+00:01:47,145 --> 00:01:51,227
+即便矩阵X'X是不可逆的
+
+72
+00:01:47,145 --> 00:01:51,227
+even if Xtranspose X is non-invertible.
+
+73
+00:01:51,227 --> 00:01:54,095
+在pinv ( ) 和 inv ( ) 之间
+
+74
+00:01:51,227 --> 00:01:54,095
+The specific details between what is the difference between
+
+75
+00:01:54,095 --> 00:01:55,959
+又有哪些具体区别 ?
+
+76
+00:01:54,095 --> 00:01:55,959
+pinv() and what is inv()
+
+77
+00:01:55,959 --> 00:01:58,562
+其中inv ( ) 引入了先进的数值计算的概念
+
+78
+00:01:55,959 --> 00:01:58,562
+that is somewhat advanced numerical computing concepts,
+
+79
+00:01:58,562 --> 00:02:00,907
+我真的不希望讲那些
+
+80
+00:01:58,562 --> 00:02:00,907
+that I don't really want to get into.
+
+81
+00:02:00,907 --> 00:02:02,993
+因此 我认为
+
+82
+00:02:00,907 --> 00:02:02,993
+But I thought in this optional
+
+83
+00:02:02,993 --> 00:02:04,672
+可以试着给你一点点直观的参考
+
+84
+00:02:02,993 --> 00:02:04,672
+video I try to give you a little bit of intuition
+
+85
+00:02:04,672 --> 00:02:08,823
+关于矩阵X'X的不可逆的问题
+
+86
+00:02:04,672 --> 00:02:08,823
+about what it means that Xtranspose X to be non-invertible.
+
+87
+00:02:08,823 --> 00:02:12,108
+如果你懂一点线性代数
+
+88
+00:02:08,823 --> 00:02:12,108
+For those of you that know a bit more linear algebra
+
+89
+00:02:12,108 --> 00:02:13,556
+或许你可能会感兴趣
+
+90
+00:02:12,108 --> 00:02:13,556
+and might be interested.
+
+91
+00:02:13,556 --> 00:02:15,948
+我不会从数学的角度来证明它
+
+92
+00:02:13,556 --> 00:02:15,948
+I'm not going to proove this mathematically,
+
+93
+00:02:15,948 --> 00:02:18,684
+但如果矩阵X'X结果是不可逆的
+
+94
+00:02:15,948 --> 00:02:18,684
+but if Xtranspose X is non-invertible,
+
+95
+00:02:18,684 --> 00:02:22,596
+通常有两种最常见的原因
+
+96
+00:02:18,684 --> 00:02:22,596
+there are usually two most common causes:
+
+97
+00:02:22,596 --> 00:02:26,238
+第一个原因是 如果不知何故 在你的学习问题
+
+98
+00:02:22,596 --> 00:02:26,238
+The first cause is if somehow, in your learning problem,
+
+99
+00:02:26,238 --> 00:02:28,461
+你有多余的特征
+
+100
+00:02:26,238 --> 00:02:28,461
+you have redundant features,
+
+101
+00:02:28,461 --> 00:02:30,844
+例如 在预测住房价格时
+
+102
+00:02:28,461 --> 00:02:30,844
+concretely, if you try to predict housing prices
+
+103
+00:02:30,844 --> 00:02:34,877
+如果x1是以英尺为尺寸规格计算的房子
+
+104
+00:02:30,844 --> 00:02:34,877
+and if x1 is the size of a house in square-feet,
+
+105
+00:02:34,877 --> 00:02:37,792
+x2是以平方米为尺寸规格计算的房子
+
+106
+00:02:34,877 --> 00:02:37,792
+and x2 is the size of the house in square-meters,
+
+107
+00:02:37,792 --> 00:02:46,071
+同时 你也知道1米等于3.28英尺 ( 四舍五入到两位小数 )
+
+108
+00:02:37,792 --> 00:02:46,071
+then, you know, 1 meter is equal to 3.28 feet, rounded to two decimals,
+
+109
+00:02:46,071 --> 00:02:48,947
+这样 你的这两个特征值将始终满足约束
+
+110
+00:02:46,071 --> 00:02:48,947
+and so your two features will always satisfy the constraint
+
+111
+00:02:48,947 --> 00:02:55,378
+x1 等于 3.28的平方 乘以 x2
+
+112
+00:02:48,947 --> 00:02:55,378
+that x1 equals (3.28) squared times x2.
+
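+A small Octave sketch of this redundant-feature case; the numbers are made up, but they illustrate the behaviour described: because x1 is an exact multiple of x2, the columns of X are linearly dependent, X transpose X is singular, inv() would complain, and pinv() still returns a usable theta:
+
+x2 = [50; 80; 120; 200];            % size in square meters (made-up values)
+x1 = 3.28^2 * x2;                   % size in square feet: exactly (3.28)^2 times x2
+X  = [ones(4, 1), x1, x2];
+y  = [150; 240; 360; 600];          % made-up prices
+
+A = X' * X;                         % singular: rank(A) is 2, not 3
+theta = pinv(A) * X' * y;           % works: the pseudo-inverse picks a valid solution
+% inv(A) * X' * y would instead warn that the matrix is singular
+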
+113
+00:02:55,378 --> 00:02:59,107
+并且你可以将这过程显示出来 讲到这里 可能 或许对你来说有点难了
+
+114
+00:02:55,378 --> 00:02:59,107
+And you can show, for those of you - this is somewhat advanced linear algebra now,
+
+115
+00:02:59,107 --> 00:03:01,169
+但如果你在线性代数上非常熟练
+
+116
+00:02:59,107 --> 00:03:01,169
+but if you're an expert in linear algebra,
+
+117
+00:03:01,169 --> 00:03:05,275
+实际上 你可以用这样的一个线性方程 来展示那两个相关联的特征值
+
+118
+00:03:01,169 --> 00:03:05,275
+you can actually show that if your two features are related via a linear equation like this,
+
+119
+00:03:05,275 --> 00:03:09,095
+矩阵X'X将是不可逆的
+
+120
+00:03:05,275 --> 00:03:09,095
+then matrix Xtranspose X will be non-invertible.
+
+121
+00:03:09,095 --> 00:03:13,320
+第二个原因是 在你想用大量的特征值
+
+122
+00:03:09,095 --> 00:03:13,320
+The second thing that can cause Xtranspose X to be non-invertible
+
+123
+00:03:13,320 --> 00:03:17,043
+尝试实践你的学习算法的时候
+
+124
+00:03:13,320 --> 00:03:17,043
+is if you're trying to run a learning algorithm
+
+125
+00:03:17,043 --> 00:03:18,850
+可能会导致矩阵X'X的结果是不可逆的
+
+126
+00:03:17,043 --> 00:03:18,850
+with a lot of features.
+
+127
+00:03:18,850 --> 00:03:23,035
+具体地说 在m小于或等于n的时候
+
+128
+00:03:18,850 --> 00:03:23,035
+Concretely, if m is less than or equal to n.
+
+129
+00:03:23,035 --> 00:03:27,723
+例如 有m等于10个的训练实例
+
+130
+00:03:23,035 --> 00:03:27,723
+For example, if you imagine that you have m equals 10 training examples
+
+131
+00:03:27,723 --> 00:03:31,192
+也有n等于100的特征数量
+
+132
+00:03:27,723 --> 00:03:31,192
+and that you have n equals 100 features, then you're trying
+
+133
+00:03:31,192 --> 00:03:36,829
+要找到适合的 ( n+1 ) 维参数矢量θ
+
+134
+00:03:31,192 --> 00:03:36,829
+to fit a parameter vector theta, which is (n+1)-dimensional,
+
+135
+00:03:36,829 --> 00:03:39,308
+这将会变成一个101维的矢量
+
+136
+00:03:36,829 --> 00:03:39,308
+so it's a 101-dimensional
+
+137
+00:03:39,308 --> 00:03:43,602
+尝试从10个训练实例中找到满足101个参数的值
+
+138
+00:03:39,308 --> 00:03:43,602
+you're trying to fit a 101 parameters from just 10 training examples.
+
+139
+00:03:43,602 --> 00:03:46,899
+事实证明 这种做法有时行得通
+
+140
+00:03:43,602 --> 00:03:46,899
+And this turns out to sometimes work,
+
+141
+00:03:46,899 --> 00:03:49,078
+但这并不总是一个好主意
+
+142
+00:03:46,899 --> 00:03:49,078
+but to not always be a good idea.
+
+143
+00:03:49,078 --> 00:03:52,212
+因为 正如我们所看到 你只有10个例子 以适应这100或101个参数
+
+144
+00:03:49,078 --> 00:03:52,212
+Because, as we see later, you might not have enough data
+
+145
+00:03:52,212 --> 00:03:58,432
+数据还是有些少
+
+146
+00:03:52,212 --> 00:03:58,432
+if you only have 10 examples to fit 100 or 101 parameters.
+
+147
+00:03:58,432 --> 00:04:01,924
+稍后我们将看到
+
+148
+00:03:58,432 --> 00:04:01,924
+We'll see later in this course, why this might be too little data
+
+149
+00:04:01,924 --> 00:04:04,418
+如何使用小数据样本以得到这100或101个参数
+
+150
+00:04:01,924 --> 00:04:04,418
+to fit this many parameters.
+
+151
+00:04:04,418 --> 00:04:07,544
+通常 我们会使用一种叫做正则化的线性代数方法
+
+152
+00:04:04,418 --> 00:04:07,544
+But commonly, what we do then if m is less than n,
+
+153
+00:04:07,544 --> 00:04:12,513
+通过删除某些特征或者是使用某些技术
+
+154
+00:04:07,544 --> 00:04:12,513
+is to see if we can either delete some features or to use a technique
+
+155
+00:04:12,513 --> 00:04:14,689
+来解决当m比n小的时候的问题
+
+156
+00:04:12,513 --> 00:04:14,689
+called regularization,
+
+157
+00:04:14,689 --> 00:04:17,477
+这也是在本节课后面要讲到的内容
+
+158
+00:04:14,689 --> 00:04:17,477
+which is something that we will talk about a bit later in this course as well,
+
+159
+00:04:17,477 --> 00:04:21,905
+即使你有一个相对较小的训练集
+
+160
+00:04:17,477 --> 00:04:21,905
+that will kind of let you fit a lot of parameters using a lot of features
+
+161
+00:04:21,905 --> 00:04:24,117
+也可使用很多的特征来找到很多合适的参数
+
+162
+00:04:21,905 --> 00:04:24,117
+even if you have a relatively small training set.
+
+163
+00:04:24,117 --> 00:04:27,698
+有关正规化的内容将是本节之后课程的话题
+
+164
+00:04:24,117 --> 00:04:27,698
+But this regularization will be a later topic in this course.
+
+165
+00:04:27,698 --> 00:04:32,628
+总之当你发现的矩阵X'X的结果是奇异矩阵
+
+166
+00:04:27,698 --> 00:04:32,628
+But to summarize, if ever you find that Xtranspose X is singular
+
+167
+00:04:32,628 --> 00:04:35,877
+或者找到的其它矩阵是不可逆的
+
+168
+00:04:32,628 --> 00:04:35,877
+or alternatively find is non-invertible,
+
+169
+00:04:35,877 --> 00:04:38,380
+我会建议你这么做
+
+170
+00:04:35,877 --> 00:04:38,380
+what I would recommend you do is
+
+171
+00:04:38,380 --> 00:04:42,016
+首先 看特征值里是否有一些多余的特征
+
+172
+00:04:38,380 --> 00:04:42,016
+first: look at your features and see if you have redundant features
+
+173
+00:04:42,016 --> 00:04:45,304
+像这些x1和x2是线性相关的
+
+174
+00:04:42,016 --> 00:04:45,304
+like these x1 and x2 being linearly dependent,
+
+175
+00:04:45,304 --> 00:04:48,017
+或像这样 互为线性函数
+
+176
+00:04:45,304 --> 00:04:48,017
+or being a linear function of each other, like so
+
+177
+00:04:48,017 --> 00:04:49,841
+同时 当有一些多余的特征时
+
+178
+00:04:48,017 --> 00:04:49,841
+and if you do have redundant features and
+
+179
+00:04:49,841 --> 00:04:51,493
+可以删除这两个重复特征里的其中一个
+
+180
+00:04:49,841 --> 00:04:51,493
+if you just delete one of these features -
+
+181
+00:04:51,493 --> 00:04:53,724
+无须两个特征同时保留
+
+182
+00:04:51,493 --> 00:04:53,724
+you really don't need both of these features,
+
+183
+00:04:53,724 --> 00:04:55,601
+所以 发现多余的特征删除二者其一
+
+184
+00:04:53,724 --> 00:04:55,601
+so if you just delete one of these features
+
+185
+00:04:55,601 --> 00:04:58,586
+将解决不可逆性的问题
+
+186
+00:04:55,601 --> 00:04:58,586
+that will solve your non-invertibility problem
+
+187
+00:04:58,586 --> 00:05:02,655
+因此 首先应该通过观察所有特征检查是否有多余的特征
+
+188
+00:04:58,586 --> 00:05:02,655
+and, so first think through my features and check if any are redundant
+
+189
+00:05:02,655 --> 00:05:05,481
+如果有多余的就删除掉
+
+190
+00:05:02,655 --> 00:05:05,481
+and if so, then, you know, keep deleting the redundant features
+
+191
+00:05:05,481 --> 00:05:07,659
+直到他们不再是多余的为止
+
+192
+00:05:05,481 --> 00:05:07,659
+until they are no longer redundant.
+
+193
+00:05:07,659 --> 00:05:09,799
+如果特征里没有多余的
+
+194
+00:05:07,659 --> 00:05:09,799
+And if your features are non redundant,
+
+195
+00:05:09,799 --> 00:05:11,939
+我会检查是否有过多的特征
+
+196
+00:05:09,799 --> 00:05:11,939
+I would check if I might have too many features,
+
+197
+00:05:11,939 --> 00:05:13,638
+如果特征数量实在太多
+
+198
+00:05:11,939 --> 00:05:13,638
+and if that's the case I would either
+
+199
+00:05:13,638 --> 00:05:16,140
+我会删除些 用较少的特征来反映尽可能多内容
+
+200
+00:05:13,638 --> 00:05:16,140
+delete some features if I can bear to use fewer features,
+
+201
+00:05:16,140 --> 00:05:20,708
+否则我会考虑使用正规化方法
+
+202
+00:05:16,140 --> 00:05:20,708
+or else I would consider using regularization,
+
+203
+00:05:20,708 --> 00:05:22,821
+这也是我们将要谈论的话题
+
+204
+00:05:20,708 --> 00:05:22,821
+which is this topic that we will talk about later.
+
+205
+00:05:22,821 --> 00:05:27,877
+同时 这也是有关标准方程的内容
+
+206
+00:05:22,821 --> 00:05:27,877
+So, that's it for the normal equation and what it means
+
+207
+00:05:27,877 --> 00:05:31,885
+如果矩阵X'X是不可逆的
+
+208
+00:05:27,877 --> 00:05:31,885
+if the matrix Xtranspose X is non-invertible.
+
+209
+00:05:31,885 --> 00:05:35,710
+通常来说 不会出现这种情况
+
+210
+00:05:31,885 --> 00:05:35,710
+But this is a problem that hopefully you run into pretty rarely.
+
+211
+00:05:35,710 --> 00:05:40,554
+如果在Octave里
+
+212
+00:05:35,710 --> 00:05:40,554
+And if you just implement it in Octave using the pinv() function
+
+213
+00:05:40,554 --> 00:05:42,853
+可以用伪逆函数pinv ( ) 来实现
+
+214
+00:05:40,554 --> 00:05:42,853
+which is called the pseudo-inverse function
+
+215
+00:05:42,853 --> 00:05:46,700
+这种使用不同的线性代数库的方法被称为伪逆
+
+216
+00:05:42,853 --> 00:05:46,700
+so you use a different linear algebra library, that is called pseudo-inverse
+
+217
+00:05:46,700 --> 00:05:50,071
+即使X'X的结果是不可逆的
+
+218
+00:05:46,700 --> 00:05:50,071
+but that implementation should just do the right thing
+
+219
+00:05:50,071 --> 00:05:52,582
+但算法执行的流程是正确的
+
+220
+00:05:50,071 --> 00:05:52,582
+even if Xtranspose X is non-invertible
+
+221
+00:05:52,582 --> 00:05:55,198
+总之 出现不可逆矩阵的情况极少发生
+
+222
+00:05:52,582 --> 00:05:55,198
+which should happen pretty rarely anyway
+
+223
+00:05:55,198 --> 99:59:59,000
+所以在大多数实现线性回归中 出现不可逆的问题不应该过多的关注
+
+224
+00:05:55,198 --> 99:59:59,000
+so this should not be a problem for most implementations of linear regression.
+
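+As a minimal Octave sketch of the point above (assuming X and y are the usual design matrix and target vector from the course):
+
+theta = pinv(X' * X) * X' * y;  % pseudo-inverse: well-behaved even if X'X is singular
+theta = inv(X' * X) * X' * y;   % plain inverse: breaks down when X'X is non-invertible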
diff --git a/srt/5 - 1 - Basic Operations (14 min).srt b/srt/5 - 1 - Basic Operations (14 min).srt
new file mode 100644
index 00000000..2022d2c9
--- /dev/null
+++ b/srt/5 - 1 - Basic Operations (14 min).srt
@@ -0,0 +1,1871 @@
+1
+00:00:00,090 --> 00:00:02,346
+You now know a bunch about machine learning.
+你现在已经掌握不少机器学习知识了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,390 --> 00:00:03,635
+In this video, I like to
+在这段视频中
+
+3
+00:00:03,635 --> 00:00:05,448
+teach you a programing language,
+我将教你一种编程语言
+
+4
+00:00:05,470 --> 00:00:06,718
+Octave, in which you'll be
+Octave语言
+
+5
+00:00:06,760 --> 00:00:08,878
+able to very quickly implement
+你能够用它来非常迅速地
+
+6
+00:00:08,890 --> 00:00:10,259
+the learning algorithms we've
+实现这门课中我们已经学过
+
+7
+00:00:10,259 --> 00:00:11,770
+seen already, and the learning
+或者将要学的
+
+8
+00:00:11,770 --> 00:00:14,872
+algorithms we'll see later in this course.
+机器学习算法
+
+9
+00:00:14,900 --> 00:00:16,381
+In the past, I've tried to teach machine learning
+过去我一直尝试用不同的编程语言
+
+10
+00:00:16,381 --> 00:00:19,497
+using a large variety of different programming languages
+来教授机器学习
+
+11
+00:00:19,500 --> 00:00:22,046
+including C++ Java,
+包括C++、Java、
+
+12
+00:00:22,825 --> 00:00:25,379
+Python, NumPy, and also
+Python、Numpy
+
+13
+00:00:25,379 --> 00:00:27,128
+Octave, and what I
+和 Octave
+
+14
+00:00:27,160 --> 00:00:28,783
+found was that students were able
+我发现当使用像
+
+15
+00:00:28,790 --> 00:00:30,535
+to learn the most
+Octave这样的
+
+16
+00:00:30,570 --> 00:00:32,497
+productively learn the most quickly
+高级语言时
+
+17
+00:00:32,497 --> 00:00:33,780
+and prototype your algorithms most
+学生能够更快
+
+18
+00:00:33,780 --> 00:00:35,569
+quickly using a relatively
+更好地学习
+
+19
+00:00:35,569 --> 00:00:38,262
+high level language like octave.
+并掌握这些算法
+
+20
+00:00:38,290 --> 00:00:39,798
+In fact, what I often
+事实上 在硅谷
+
+21
+00:00:39,798 --> 00:00:41,516
+see in Silicon Valley is
+我经常看到的情况是
+
+22
+00:00:41,520 --> 00:00:43,655
+that even if you need to build,
+进行大规模的
+
+23
+00:00:43,655 --> 00:00:44,714
+If you want to build a large
+机器学习项目的人
+
+24
+00:00:44,740 --> 00:00:46,548
+scale deployment of a learning
+通常会使用的
+
+25
+00:00:46,610 --> 00:00:48,242
+algorithm, what people will often do
+程序语言
+
+26
+00:00:48,242 --> 00:00:50,637
+is prototype in the language Octave,
+就是Octave
+
+27
+00:00:50,660 --> 00:00:52,200
+Which is a great prototyping language.
+Octave是一种很好的原始语言(prototyping language)
+
+28
+00:00:52,210 --> 00:00:55,264
+So you can sort of get your learning algorithms working quickly.
+使用Octave 你能快速地实现你的算法
+
+29
+00:00:55,270 --> 00:00:56,629
+And then only if you need
+剩下的事情 你只需要
+
+30
+00:00:56,629 --> 00:00:58,459
+to a very large scale deployment of it.
+进行大规模的资源配置
+
+31
+00:00:58,480 --> 00:01:00,362
+Only then spend your time
+你只用再花时间
+
+32
+00:01:00,362 --> 00:01:03,059
+re-implementing the algorithm
+用C++或Java这些语言
+
+33
+00:01:03,059 --> 00:01:05,150
+to C++ Java or some of the language like that.
+把算法重新实现就行了
+
+34
+00:01:05,160 --> 00:01:06,273
+Because all the lessons we've learned is
+因为我们知道
+
+35
+00:01:06,300 --> 00:01:08,679
+that developer time,
+开发项目的时间
+
+36
+00:01:08,710 --> 00:01:09,848
+That is your time.
+或者说你的时间 是很宝贵的
+
+37
+00:01:09,870 --> 00:01:13,309
+The machine learning's time is incredibly valuable.
+机器学习的时间也是很宝贵的
+
+38
+00:01:13,320 --> 00:01:15,101
+And if you can
+所以 如果你能
+
+39
+00:01:15,101 --> 00:01:17,898
+get your learning algorithms to work more quickly in Octave.
+让你的学习算法在Octave上快速的实现
+
+40
+00:01:17,898 --> 00:01:18,932
+Then overall you have a
+基本的想法实现以后
+
+41
+00:01:18,932 --> 00:01:20,697
+huge time savings by first
+再用C++或者Java去改写
+
+42
+00:01:20,720 --> 00:01:22,143
+developing the algorithms in
+这样
+
+43
+00:01:22,150 --> 00:01:23,971
+Octave, and then implementing and
+ 你就能节省出
+
+44
+00:01:23,971 --> 00:01:28,145
+maybe C++ Java, only after we have the ideas working.
+大量的时间
+
+45
+00:01:28,160 --> 00:01:30,238
+The most common prototyping language I
+据我所见 人们使用最多的
+
+46
+00:01:30,238 --> 00:01:31,538
+see people use for machine
+用于机器学习的原始语言
+
+47
+00:01:31,560 --> 00:01:34,058
+learning are: Octave, MATLAB,
+是Octave、MATLAB
+
+48
+00:01:34,070 --> 00:01:37,230
+Python, NumPy, and R.
+Python、NumPy 和 R
+
+49
+00:01:38,150 --> 00:01:40,032
+Octave is nice because open sourced.
+Octave很好 因为它是开源的
+
+50
+00:01:40,032 --> 00:01:42,660
+And MATLAB works well
+当然 MATLAB也很好
+
+51
+00:01:42,670 --> 00:01:44,656
+too, but it is expensive for
+但它不是每个人都
+
+52
+00:01:44,656 --> 00:01:45,956
+many people.
+买得起的
+
+53
+00:01:45,960 --> 00:01:47,972
+But if you have access to a copy of MATLAB.
+但是 如果你能够使用MATLAB
+
+54
+00:01:47,988 --> 00:01:50,095
+You can also use MATLAB with this class.
+你也可以在这门课里面使用
+
+55
+00:01:50,110 --> 00:01:52,037
+If you know Python, NumPy,
+如果你会Python、NumPy
+
+56
+00:01:52,037 --> 00:01:54,853
+or if you know R. I do see some people use it.
+或者R语言 我也见过有人用 R 的
+
+57
+00:01:54,870 --> 00:01:56,353
+But, what I see is
+但是 据我所知
+
+58
+00:01:56,360 --> 00:01:57,739
+that people usually end up
+这些人的开发速度往往会慢一些
+
+59
+00:01:57,760 --> 00:02:00,041
+developing somewhat more slowly, and
+因为这些语言在开发上比较慢
+
+60
+00:02:00,050 --> 00:02:02,121
+you know, these languages.
+而且 因为这些语言
+
+61
+00:02:02,121 --> 00:02:04,048
+Because the Python, NumPy syntax
+Python、NumPy的语法
+
+62
+00:02:04,048 --> 00:02:08,391
+is just slightly clunkier than the Octave syntax.
+相较于Octave来说 还是更麻烦一点
+
+63
+00:02:08,410 --> 00:02:09,704
+And so because of that, and
+正因为这样
+
+64
+00:02:09,704 --> 00:02:11,372
+because we are releasing starter
+也因为我们最开始
+
+65
+00:02:11,380 --> 00:02:13,039
+code in Octave.
+用Octave来写程序
+
+66
+00:02:13,039 --> 00:02:14,363
+I strongly recommend that you
+所以我强烈建议你
+
+67
+00:02:14,363 --> 00:02:18,321
+not try to do the following exercises in this class in NumPy and R.
+不要用NumPy或者R来完成这门课的作业
+
+68
+00:02:18,330 --> 00:02:19,805
+But that I do recommend that
+我建议你
+
+69
+00:02:19,805 --> 00:02:21,498
+you instead do the programming exercises
+在这门课中
+
+70
+00:02:21,520 --> 00:02:24,292
+for this class in octave instead.
+用Octave来写程序
+
+71
+00:02:24,330 --> 00:02:25,428
+What I'm going to do in
+接下来
+
+72
+00:02:25,428 --> 00:02:26,708
+this video is go through
+本视频将快速地介绍
+
+73
+00:02:26,708 --> 00:02:28,667
+a list of commands very,
+一系列的命令
+
+74
+00:02:28,667 --> 00:02:29,879
+very quickly, and its goal
+目标是迅速地展示
+
+75
+00:02:29,879 --> 00:02:31,073
+is to quickly show you the
+通过这一系列Octave的命令
+
+76
+00:02:31,080 --> 00:02:34,807
+range of commands and the range of things you can do in Octave.
+让你知道Octave能用来做什么
+
+77
+00:02:34,807 --> 00:02:36,493
+The course website will have
+我们的网站会提供
+
+78
+00:02:36,520 --> 00:02:38,965
+a transcript of everything I
+所有我在视频中提到的
+
+79
+00:02:38,965 --> 00:02:42,095
+do, and so after
+内容的文本
+
+80
+00:02:42,095 --> 00:02:43,185
+watching this video you
+所以 当你看完这个视频
+
+81
+00:02:43,185 --> 00:02:44,905
+can refer to the transcript
+想查询一些命令时
+
+82
+00:02:44,905 --> 00:02:46,635
+posted on the course website
+你可以查看这些资料
+
+83
+00:02:46,635 --> 00:02:48,247
+when you want find a command.
+这些都放在网上了
+
+84
+00:02:48,247 --> 00:02:50,226
+Concretely, what I recommend
+总之 我建议你
+
+85
+00:02:50,226 --> 00:02:53,225
+you do is first watch the tutorial videos.
+先看教学视频
+
+86
+00:02:53,230 --> 00:02:55,118
+And after watching to the
+之后
+
+87
+00:02:55,120 --> 00:02:58,728
+end, then install Octave on your computer.
+把Octave安装到电脑上
+
+88
+00:02:58,728 --> 00:02:59,738
+And finally, it goes to
+最后 去这门课的网站上
+
+89
+00:02:59,738 --> 00:03:01,769
+the course website, download the transcripts
+下载这门课的
+
+90
+00:03:01,770 --> 00:03:02,983
+of the things you see in the
+相关文档和视频
+
+91
+00:03:02,983 --> 00:03:04,915
+session, and type in
+然后 你可以试着
+
+92
+00:03:04,930 --> 00:03:07,162
+whatever commands seem interesting
+在Octave中键入一些
+
+93
+00:03:07,200 --> 00:03:09,132
+to you into Octave, so that it's
+有趣的命令
+
+94
+00:03:09,132 --> 00:03:10,602
+running on your own computer, so
+让程序运行在你的电脑上
+
+95
+00:03:10,602 --> 00:03:12,962
+you can see it run for yourself.
+这样你可以看到程序是怎么运行的
+
+96
+00:03:12,970 --> 00:03:15,535
+And with that let's get started.
+让我们开始吧
+
+97
+00:03:15,920 --> 00:03:19,363
+Here's my Windows desktop, and I'm going to start up Octave.
+这里是我的Windows桌面 启动Octave
+
+98
+00:03:19,370 --> 00:03:20,977
+And I'm now in Octave.
+现在打开Octave
+
+99
+00:03:20,977 --> 00:03:22,522
+And that's my Octave prompt.
+这是Octave命令行
+
+100
+00:03:22,522 --> 00:03:24,475
+Let me first show the elementary
+现在让我示范
+
+101
+00:03:24,475 --> 00:03:27,291
+operations you can do in Octave.
+最基本的Octave代码
+
+102
+00:03:27,330 --> 00:03:28,505
+So you type in 5 + 6.
+输入5 + 6
+
+103
+00:03:28,505 --> 00:03:30,493
+That gives you the answer of 11.
+然后得到11
+
+104
+00:03:30,493 --> 00:03:31,516
+3 - 2.
+输入3 - 2
+
+105
+00:03:31,540 --> 00:03:33,710
+5 x 8, 1/2, 2^6
+5×8、1/2、2 ^ 6
+
+106
+00:03:35,733 --> 00:03:37,747
+is 64.
+得到64
+
+107
+00:03:37,810 --> 00:03:42,361
+So those are the elementary math operations.
+这些都是基本的数学运算
+
+108
+00:03:42,390 --> 00:03:44,495
+You can also do logical operations.
+你也可以做逻辑运算
+
+109
+00:03:44,550 --> 00:03:45,929
+So one equals two.
+例如 1==2
+
+110
+00:03:45,929 --> 00:03:47,722
+This evaluates to false.
+计算结果为 false ( 假 )
+
+111
+00:03:47,722 --> 00:03:51,658
+The percent command here means a comment.
+这里的百分号命令表示注释
+
+112
+00:03:51,658 --> 00:03:53,861
+So, one equals two, evaluates to false.
+1==2 计算结果为假
+
+113
+00:03:53,861 --> 00:03:55,622
+Which is represents by zero.
+这里用0表示
+
+114
+00:03:55,650 --> 00:03:58,028
+One not equals to two.
+1 ~= 2
+
+115
+00:03:58,028 --> 00:03:59,312
+This is true.
+这是真的
+
+116
+00:03:59,312 --> 00:04:00,718
+So that returns one.
+因此返回1
+
+117
+00:04:00,718 --> 00:04:02,146
+Note that a not equal sign
+请注意 不等于符号的写法
+
+118
+00:04:02,146 --> 00:04:05,478
+is this tilde equals symbol.
+是这个波浪线加上等于符号 ( ~= )
+
+119
+00:04:05,550 --> 00:04:07,336
+And not bang equals.
+而不是感叹号加等号 ( != )
+
+120
+00:04:07,336 --> 00:04:09,267
+Which is what some other
+这是和其他一些
+
+121
+00:04:09,267 --> 00:04:10,878
+programming languages use.
+编程语言中不太一样的地方
+
+122
+00:04:10,910 --> 00:04:13,616
+Lets see logical operations one and zero
+让我们看看逻辑运算 1 && 0
+
+123
+00:04:13,616 --> 00:04:15,545
+use a double ampersand sign to
+使用双&符号
+
+124
+00:04:15,545 --> 00:04:17,340
+the logical AND.
+表示逻辑与
+
+125
+00:04:18,120 --> 00:04:20,188
+And that evaluates false.
+1 && 0判断为假
+
+126
+00:04:20,188 --> 00:04:23,886
+One or zero is the OR operation.
+1和0的或运算 1 || 0
+
+127
+00:04:23,900 --> 00:04:25,736
+And that evaluates to true.
+其计算结果为真
+
+128
+00:04:25,736 --> 00:04:27,131
+And I can XOR one and
+还有异或运算 如XOR ( 1, 0 )
+
+129
+00:04:27,131 --> 00:04:30,333
+zero, and that evaluates to one.
+其返回值为1
+
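+A quick Octave sketch of the elementary and logical operations just shown:
+
+5 + 6      % 11
+5 * 8      % 40
+2^6        % 64
+1 == 2     % 0 (false)
+1 ~= 2     % 1 (true); note ~= rather than !=
+1 && 0     % logical AND, gives 0
+1 || 0     % logical OR, gives 1
+xor(1, 0)  % 1
+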
+130
+00:04:30,333 --> 00:04:32,928
+This thing over on the left, this 'Octave 3.2.4 ... :11',
+左边的这个 'Octave 3.2.4 ... :11'
+
+131
+00:04:32,928 --> 00:04:35,683
+this is the default Octave prompt.
+这是默认的Octave命令提示符
+
+132
+00:04:35,700 --> 00:04:37,513
+It shows the, what, the
+它显示了当前Octave的版本
+
+133
+00:04:37,520 --> 00:04:39,150
+version in Octave and so on.
+以及相关的其它信息
+
+134
+00:04:39,150 --> 00:04:40,423
+If you don't want that prompt,
+如果你不想看到那个提示
+
+135
+00:04:40,450 --> 00:04:43,025
+there's a somewhat cryptic command PS1
+这里有一个隐藏的命令
+
+136
+00:04:43,025 --> 00:04:44,670
+quote, greater than, greater
+输入命令
+
+137
+00:04:44,670 --> 00:04:46,602
+than and so on,
+PS1('>> ');
+
+138
+00:04:46,602 --> 00:04:48,800
+that you can use to change the prompt.
+现在你看到的就是等待命令的快捷提示
+
+139
+00:04:48,810 --> 00:04:51,272
+And I guess this quote a string in the middle.
+这句话在中间有一个字符串
+
+140
+00:04:51,272 --> 00:04:53,362
+Your quote, greater than, greater than, space.
+('>> ');
+
+141
+00:04:53,400 --> 00:04:55,592
+That's what I prefer my Octave prompt to look like.
+这是我喜欢的命令行样子
+
+142
+00:04:55,592 --> 00:04:57,722
+So if I hit enter.
+这里敲一个回车
+
+143
+00:04:57,920 --> 00:04:59,763
+Oops, excuse me.
+抱歉 写错了
+
+144
+00:04:59,763 --> 00:05:00,786
+Like so.
+这样才对
+
+145
+00:05:00,786 --> 00:05:02,622
+PS1 like so.
+要写成PS1这样
+
+146
+00:05:02,622 --> 00:05:05,420
+Now my Octave prompt has changed to the greater than, greater than sign.Which,
+现在命令提示已经变得简化了
+
+147
+00:05:05,500 --> 00:05:09,263
+you know, looks quite a bit better.
+这样看起来很棒
+
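+The prompt command described above, as one line of Octave (the '>> ' string is just the lecture's preference):
+
+PS1('>> ');   % change the Octave prompt to ">> "
+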
+148
+00:05:09,710 --> 00:05:12,384
+Next let's talk about Octave variables.
+接下来 我们将谈到Octave的变量
+
+149
+00:05:12,384 --> 00:05:13,865
+I can take the variable
+现在写一个变量
+
+150
+00:05:13,865 --> 00:05:16,165
+A and assign it to 3.
+对变量A赋值为3
+
+151
+00:05:16,165 --> 00:05:18,421
+And hit enter.
+并按下回车键
+
+152
+00:05:18,440 --> 00:05:20,043
+And now A is equal to 3.
+显示变量A等于3
+
+153
+00:05:20,070 --> 00:05:22,861
+You want to assign a variable, but you don't want to print out the result.
+如果你想分配一个变量 但不希望在屏幕上显示结果
+
+154
+00:05:22,861 --> 00:05:26,758
+If you put a semicolon, the semicolon
+你可以在命令后加一个分号
+
+155
+00:05:26,920 --> 00:05:30,824
+suppresses the print output.
+可以抑制打印输出
+
+156
+00:05:30,824 --> 00:05:33,160
+So to do that, enter, it doesn't print anything.
+敲入回车后 不打印任何东西。
+
+157
+00:05:33,160 --> 00:05:35,399
+Whereas A equals 3.
+A等于3
+
+158
+00:05:35,420 --> 00:05:36,719
+mix it, print it out,
+只是不显示出来
+
+159
+00:05:36,719 --> 00:05:39,845
+where A equals, 3 semicolon doesn't print anything.
+其中这句命令不打印任何东西
+
+160
+00:05:39,850 --> 00:05:41,845
+I can do string assignment.
+现在举一个字符串的例子
+
+161
+00:05:41,845 --> 00:05:43,473
+B equals hi
+变量b等于"hi"
+
+162
+00:05:43,520 --> 00:05:45,047
+Now if I just
+现在
+
+163
+00:05:45,047 --> 00:05:46,072
+enter B it prints out the
+如果我输入b
+
+164
+00:05:46,072 --> 00:05:48,338
+variable B. So B is the string hi
+则会显示字符串变量b的值"hi"
+
+165
+00:05:48,370 --> 00:05:51,118
+C equals 3 greater than or equal to 1.
+C等于3大于等于1
+
+166
+00:05:51,130 --> 00:05:54,538
+So, now C evaluates the true.
+所以 现在C变量的值是真
+
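+A short sketch of the variable assignments just demonstrated:
+
+a = 3          % assign and echo the value
+a = 3;         % the semicolon suppresses the output
+b = 'hi';      % string assignment
+c = (3 >= 1);  % c is 1, i.e. true
+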
+167
+00:05:55,710 --> 00:05:57,999
+If you want to print
+如果你想打印出变量
+
+168
+00:05:58,030 --> 00:06:00,832
+out or display a variable, here's how you go about it.
+或显示一个变量 你可以像下面这么做
+
+169
+00:06:00,832 --> 00:06:03,725
+Let me set A equals Pi.
+设置A等于圆周率π
+
+170
+00:06:03,760 --> 00:06:04,985
+And if I want to print
+如果我要打印该值
+
+171
+00:06:04,985 --> 00:06:08,545
+A I can just type A like so, and it will print it out.
+那么只需键入A 像这样 就打印出来了
+
+172
+00:06:08,545 --> 00:06:10,344
+For more complex printing there is
+对于更复杂的屏幕输出
+
+173
+00:06:10,344 --> 00:06:13,674
+also the DISP command which stands for Display.
+也可以用DISP命令显示
+
+174
+00:06:13,710 --> 00:06:15,858
+Display A just prints out A like so.
+Disp( A )就相当于像这样打印出A
+
+175
+00:06:15,890 --> 00:06:18,337
+You can also display strings
+你也可以用该命令来显示字符串
+
+176
+00:06:18,350 --> 00:06:21,392
+so: DISP, sprintf, two
+输入disp sprintf
+
+177
+00:06:21,460 --> 00:06:24,990
+decimals, percent 0.2,
+小数 0.2%
+
+178
+00:06:25,260 --> 00:06:28,273
+F, comma, A. Like so.
+逗号 A 像这样
+
+179
+00:06:28,273 --> 00:06:29,863
+And this will print out the string.
+通过这条命令将打印出字符串
+
+180
+00:06:29,863 --> 00:06:31,722
+Two decimals, colon, 3.14.
+打印显示为“两位小数:3.14”
+
+181
+00:06:31,722 --> 00:06:33,651
+This is kind of
+这是一种
+
+182
+00:06:33,670 --> 00:06:35,993
+an old style C syntax.
+旧风格的C语言语法
+
+183
+00:06:35,993 --> 00:06:37,404
+For those of you that
+对于之前
+
+184
+00:06:37,420 --> 00:06:39,073
+have programmed C before, this is
+就学过C语言的同学来说
+
+185
+00:06:39,073 --> 00:06:41,378
+essentially the syntax you use to print screen.
+你可以使用这种基本的语法来将结果打印到屏幕
+
+186
+00:06:41,380 --> 00:06:44,498
+So the Sprintf generates a
+Sprintf命令生成一个字符串
+
+187
+00:06:44,510 --> 00:06:46,021
+string, that is,
+也就是
+
+188
+00:06:46,021 --> 00:06:48,274
+the '2 decimals: 3.14' string.
+字符串“2 decimals: 3.14”
+
+189
+00:06:48,290 --> 00:06:50,644
+This percent 0.2 F means
+其中的“%0.2f”表示
+
+190
+00:06:50,644 --> 00:06:52,475
+substitute A into here,
+代替A放在这里
+
+191
+00:06:52,475 --> 00:06:55,926
+showing the two digits after the decimal points.
+并显示A值的小数点后两位数字
+
+192
+00:06:55,926 --> 00:06:58,104
+And DISP takes the string
+同时DISP 命令对字符串做出操作
+
+193
+00:06:58,130 --> 00:07:00,691
+DISP generates it by the Sprintf command.
+DISP命令输出
+
+194
+00:07:00,691 --> 00:07:01,683
+Sprintf.
+Sprintf产生的字符串
+
+195
+00:07:01,683 --> 00:07:03,091
+The Sprintf command.
+Sprintf命令
+
+196
+00:07:03,091 --> 00:07:05,835
+And DISP actually displays the string.
+和DISP命令显示字符串
+
+197
+00:07:05,870 --> 00:07:07,020
+And to show you another
+再说一个细节
+
+198
+00:07:07,020 --> 00:07:11,360
+example, Sprintf six decimals
+例如 sprintf命令的六个小数
+
+199
+00:07:11,361 --> 00:07:14,551
+percent 0.6 F comma A.
+%0.6f , A
+
+200
+00:07:14,930 --> 00:07:17,075
+And, this should print Pi
+这应该打印π
+
+201
+00:07:17,090 --> 00:07:21,100
+with six decimal places.
+的6位小数形式
+
+202
+00:07:22,060 --> 00:07:25,728
+Finally, I was saying, a like so, looks like this. There
+最后 看起来像这样
+
+203
+00:07:25,740 --> 00:07:28,633
+are useful shortcuts: if you type format long,
+也有一些控制输出长短格式的快捷命令
+
+204
+00:07:28,633 --> 00:07:31,759
+it causes numbers, by default, to
+它会使数值默认
+
+205
+00:07:31,760 --> 00:07:33,748
+be displayed to a lot more decimal places.
+显示为更多的小数位
+
+206
+00:07:33,748 --> 00:07:35,593
+And format short is a
+短 ( short ) 格式
+
+207
+00:07:35,593 --> 00:07:37,095
+command that restores the default
+是默认的输出格式
+
+208
+00:07:37,120 --> 00:07:40,113
+of just printing a small number of digits.
+即只打印少量的小数位
+
+209
+00:07:40,600 --> 00:07:43,934
+Okay, that's how you work with variables.
+好了 以上就是变量的基本用法
+
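+The printing commands above, collected into a small sketch:
+
+a = pi;
+disp(a)
+disp(sprintf('2 decimals: %0.2f', a))   % 2 decimals: 3.14
+disp(sprintf('6 decimals: %0.6f', a))   % 6 decimals: 3.141593
+format long    % show many more decimal places
+format short   % restore the default short display
+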
+210
+00:07:43,934 --> 00:07:47,047
+Now let's look at vectors and matrices.
+下面 让我们来看看向量和矩阵
+
+211
+00:07:47,070 --> 00:07:49,274
+Let's say I want to assign MAT A to the matrix.
+比方说 建立一个矩阵A
+
+212
+00:07:49,280 --> 00:07:50,974
+Let me show you an example: 1, 2,
+输入1 2
+
+213
+00:07:50,980 --> 00:07:54,593
+semicolon, 3, 4, semicolon, 5, 6.
+; 3 4 ; 5 6
+
+214
+00:07:54,600 --> 00:07:56,235
+This generates a three by
+这会产生一个
+
+215
+00:07:56,240 --> 00:07:58,572
+two matrix A whose first
+三行两列的矩阵A
+
+216
+00:07:58,580 --> 00:07:59,818
+row is 1, 2. Second row
+其第一行是1 2
+
+217
+00:07:59,820 --> 00:08:02,030
+3, 4. Third row is 5, 6.
+第二行是3 4 第三行是5 6
+
+218
+00:08:02,030 --> 00:08:04,385
+What the semicolon does is
+分号的作用
+
+219
+00:08:04,390 --> 00:08:05,818
+essentially say, go to
+从本质上来说
+
+220
+00:08:05,820 --> 00:08:07,915
+the next row of the matrix.
+就是在矩阵内换行到下一行
+
+221
+00:08:07,915 --> 00:08:09,016
+There are other ways to type this in.
+此外 还有其他的方法来建立矩阵A
+
+222
+00:08:09,016 --> 00:08:11,536
+Type A 1, 2 semicolon
+输入A矩阵的值 1 2 分号
+
+223
+00:08:11,536 --> 00:08:15,046
+3, 4, semicolon, 5, 6, like so.
+3 4 分号 5 6
+
+224
+00:08:15,046 --> 00:08:17,038
+And that's another equivalent way of
+这是另一种方法
+
+225
+00:08:17,038 --> 00:08:18,576
+assigning A to be
+对A矩阵进行赋值
+
+226
+00:08:18,576 --> 00:08:22,183
+the values of this three by two matrix.
+考虑到这是一个三行两列的矩阵
+
+227
+00:08:22,200 --> 00:08:23,568
+Similarly you can assign vectors.
+你同样可以用向量
+
+228
+00:08:23,568 --> 00:08:25,532
+So V equals 1, 2, 3.
+建立向量V并赋值1 2 3
+
+229
+00:08:25,560 --> 00:08:27,359
+This is actually a row vector.
+V是一个行向量
+
+230
+00:08:27,359 --> 00:08:29,915
+Or this is a 3 by 1 vector.
+或者说是一个3 ( 列 )×1 ( 行 ) 的向量
+
+231
+00:08:29,940 --> 00:08:32,016
+Where that is a fat Y vector,
+一个胖胖的Y向量
+
+232
+00:08:32,030 --> 00:08:34,375
+excuse me, not, this is
+或者说
+
+233
+00:08:34,380 --> 00:08:37,998
+a 1 by 3 matrix, right.
+一行三列的矩阵
+
+234
+00:08:37,998 --> 00:08:39,256
+Not 3 by 1.
+注意不是三行一列
+
+235
+00:08:39,256 --> 00:08:41,015
+If I want to assign
+如果我想
+
+236
+00:08:41,015 --> 00:08:43,975
+this to a column vector,
+分配一个列向量
+
+237
+00:08:43,975 --> 00:08:48,778
+what I would do instead is do v 1;2;3.
+我可以写“1;2;3”
+
+238
+00:08:48,830 --> 00:08:50,030
+And this will give me a 3 by 1.
+现在便有了一个
+
+239
+00:08:50,100 --> 00:08:51,797
+There's a 1 by 3 vector.
+3 行 1 列 的向量
+
+240
+00:08:51,797 --> 00:08:55,892
+So this will be a column vector.
+同时这是一个列向量
+
+241
+00:08:56,250 --> 00:08:57,968
+Here's some more useful notation.
+下面是一些更为有用的符号
+
+242
+00:08:57,968 --> 00:09:02,343
+V equals 1: 0.1: 2.
+V等于1:0.1:2
+
+243
+00:09:02,343 --> 00:09:03,598
+What this does is
+这个该如何理解呢
+
+244
+00:09:03,620 --> 00:09:05,716
+it sets V to the bunch
+这个集合V是一组值
+
+245
+00:09:05,716 --> 00:09:08,714
+of elements that start from 1.
+从数值1开始
+
+246
+00:09:08,714 --> 00:09:10,392
+And increments and steps
+增量或说是步长为0.1
+
+247
+00:09:10,410 --> 00:09:13,657
+of 0.1 until you get up to 2.
+直到增加到2
+
+248
+00:09:13,660 --> 00:09:19,168
+So if I do this, V is going to be this, you know, row vector.
+按照这样的方法对向量V操作 可以得到一个行向量
+
+249
+00:09:19,168 --> 00:09:23,022
+This is what one by eleven matrix really.
+这是一个1行11列的矩阵
+
+250
+00:09:23,022 --> 00:09:23,739
+That's 1, 1.1, 1.2, 1.3 and
+其矩阵的元素是1 1.1 1.2 1.3
+
+251
+00:09:23,739 --> 00:09:26,921
+so on until we
+依此类推
+
+252
+00:09:27,630 --> 00:09:30,141
+get up to two.
+直到数值2
+
+253
+00:09:31,440 --> 00:09:33,269
+Now, and I can also
+现在 我也可以
+
+254
+00:09:33,269 --> 00:09:35,049
+set V equals one colon six,
+建立一个集合V并用命令“1:6”进行赋值
+
+255
+00:09:35,060 --> 00:09:38,270
+and that sets V to be these numbers.
+这样V就被赋值了
+
+256
+00:09:38,270 --> 00:09:41,291
+1 through 6, okay.
+1至6的六个整数
+
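+A sketch of the matrix and vector notation covered so far:
+
+A = [1 2; 3 4; 5 6]   % 3x2 matrix; the semicolon starts a new row
+v = [1 2 3]           % 1x3 row vector
+v = [1; 2; 3]         % 3x1 column vector
+v = 1:0.1:2           % from 1 to 2 in steps of 0.1 (a 1x11 row vector)
+v = 1:6               % 1 2 3 4 5 6
+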
+257
+00:09:41,620 --> 00:09:44,254
+Now here are some other ways to generate matrices.
+这里还有一些其他的方法来生成矩阵
+
+258
+00:09:44,254 --> 00:09:47,426
+ones(2, 3) is a command
+例如“ones(2, 3)”
+
+259
+00:09:47,426 --> 00:09:49,134
+that generates a matrix that
+也可以用来生成矩阵
+
+260
+00:09:49,140 --> 00:09:50,790
+is a two by three matrix
+其结果为一个两行三列的矩阵
+
+261
+00:09:50,790 --> 00:09:52,712
+that is the matrix of all ones.
+不过矩阵中的所有元素都为1
+
+262
+00:09:52,712 --> 00:09:53,991
+So if I set C equals 2
+当我想生成一个
+
+263
+00:09:54,000 --> 00:09:56,845
+times ones two by
+元素都为2
+
+264
+00:09:56,845 --> 00:09:59,798
+three this generates a
+两行三列的矩阵
+
+265
+00:09:59,798 --> 00:10:03,061
+two by three matrix that is all two's.
+就可以使用这个命令
+
+266
+00:10:03,080 --> 00:10:04,258
+You can think of this as a
+你可以把这个方法当成一个
+
+267
+00:10:04,258 --> 00:10:05,513
+shorter way of writing this and
+生成矩阵的快速方法
+
+268
+00:10:05,550 --> 00:10:06,943
+C = [2 2 2; 2 2 2], and you can
+即 C = [2 2 2; 2 2 2]
+
+269
+00:10:06,943 --> 00:10:10,951
+write it out that way, which would also give you the same result.
+这样写也会得到同样的结果
+
+270
+00:10:11,450 --> 00:10:13,910
+Let's say W equals one's, one
+比方说
+
+271
+00:10:13,920 --> 00:10:15,485
+by three, so this is
+w是一个有三个1的
+
+272
+00:10:15,485 --> 00:10:17,937
+going to be a row vector
+行向量
+
+273
+00:10:17,940 --> 00:10:20,998
+or a row of
+或者说一行
+
+274
+00:10:20,998 --> 00:10:23,853
+three one's and similarly
+由三个同样的1组成的向量
+
+275
+00:10:23,853 --> 00:10:25,463
+you can also say w equals
+你也可以说
+
+276
+00:10:25,463 --> 00:10:27,469
+zeroes, one by
+w为一个
+
+277
+00:10:27,469 --> 00:10:30,209
+three, and this generates a matrix.
+一行三列的零矩阵
+
+278
+00:10:30,220 --> 00:10:34,732
+A one by three matrix of all zeros.
+一行三列的A矩阵里的元素全部是零
+
+279
+00:10:34,732 --> 00:10:36,910
+Just a couple more ways to generate matrices .
+还有很多的方式来生成矩阵
+
+280
+00:10:36,930 --> 00:10:39,175
+If I do W equals
+如果我对W进行赋值
+
+281
+00:10:39,175 --> 00:10:41,512
+Rand one by three,
+用Rand命令建立一个一行三列的矩阵
+
+282
+00:10:41,520 --> 00:10:43,050
+this gives me a one
+因为使用了Rand命令
+
+283
+00:10:43,050 --> 00:10:45,370
+by three matrix of all random numbers.
+则其一行三列的元素均为随机值
+
+284
+00:10:45,372 --> 00:10:47,118
+If I do Rand
+如果我使用
+
+285
+00:10:47,215 --> 00:10:49,008
+three by three.
+“rand(3, 3)”命令
+
+286
+00:10:49,050 --> 00:10:50,417
+This gives me a three by
+这就生成了一个
+
+287
+00:10:50,417 --> 00:10:51,918
+three matrix of all
+3×3的矩阵
+
+288
+00:10:51,930 --> 00:10:54,009
+random numbers drawn from the
+并且其所有元素均为随机
+
+289
+00:10:54,009 --> 00:10:55,830
+uniform distribution between zero and one.
+数值介于0和1之间
+
+290
+00:10:55,830 --> 00:10:56,937
+So every time I do
+所以
+
+291
+00:10:56,937 --> 00:10:58,608
+this, I get a different
+正是因为这一点
+
+292
+00:10:58,608 --> 00:11:00,510
+set of random numbers drawn
+我们可以得到
+
+293
+00:11:00,540 --> 00:11:02,573
+uniformly between zero and one.
+数值均匀介于0和1之间的元素
+
+294
+00:11:02,573 --> 00:11:03,718
+For those of you that
+如果
+
+295
+00:11:03,718 --> 00:11:05,375
+know what a Gaussian random variable
+你知道什么是高斯随机变量
+
+296
+00:11:05,410 --> 00:11:06,275
+is or for those of you that
+或者
+
+297
+00:11:06,275 --> 00:11:07,659
+know what a normal random variable
+你知道什么是正态分布的随机变量
+
+298
+00:11:07,660 --> 00:11:09,112
+is, you can also set W
+你可以设置集合W
+
+299
+00:11:09,112 --> 00:11:11,956
+equals Rand N, one by three.
+即 randn(1, 3)
+
+300
+00:11:11,990 --> 00:11:13,565
+And so these are going
+并且
+
+301
+00:11:13,570 --> 00:11:15,435
+to be three values drawn from
+来自三个值
+
+302
+00:11:15,435 --> 00:11:17,798
+a Gaussian distribution with mean
+一个平均值为0的高斯分布
+
+303
+00:11:17,798 --> 00:11:19,266
+zero and variance or
+方差
+
+304
+00:11:19,266 --> 00:11:21,642
+standard deviation equal to one.
+或者等于1的标准偏差
+
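+The matrix-generating commands above, as a short sketch:
+
+ones(2, 3)          % 2x3 matrix of all ones
+C = 2 * ones(2, 3)  % 2x3 matrix of all twos
+w = zeros(1, 3)     % 1x3 row vector of zeros
+w = rand(3, 3)      % uniform random numbers between 0 and 1
+w = randn(1, 3)     % Gaussian: mean 0, standard deviation 1
+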
+305
+00:11:21,642 --> 00:11:23,148
+And you can set more complex
+还可以设置地更复杂
+
+306
+00:11:23,150 --> 00:11:24,698
+things like W equals minus
+例如
+
+307
+00:11:24,698 --> 00:11:26,194
+six, plus the square root
+即 W 等于 -6 加上根号10
+
+308
+00:11:26,210 --> 00:11:28,656
+ten, times, lets say
+乘以
+
+309
+00:11:28,660 --> 00:11:31,978
+Rand N, one by ten thousand.
+randn生成的一个1行10000列的随机矩阵
+
+310
+00:11:31,978 --> 00:11:33,106
+And I'm going to put a semicolon at
+把分号放到末尾
+
+311
+00:11:33,106 --> 00:11:35,623
+the end because I don't really want this printed out.
+这样结果就打印不出来
+
+312
+00:11:35,623 --> 00:11:37,599
+This is going to be a what?
+那这样会得到什么呢
+
+313
+00:11:37,599 --> 00:11:38,905
+Well, it's going to
+这样就可以
+
+314
+00:11:38,910 --> 00:11:40,582
+be a vector of, with
+得到
+
+315
+00:11:40,610 --> 00:11:44,481
+a hundred thousand, excuse me, ten thousand elements.
+一个有10000元素的向量
+
+316
+00:11:44,490 --> 00:11:47,596
+So, well, actually, you know what?
+想知道具体是多少
+
+317
+00:11:47,596 --> 00:11:48,373
+Let's print it out.
+我们也可把它打印出来
+
+318
+00:11:48,373 --> 00:11:51,570
+So this will generate a matrix like this.
+这将产生一个这样的矩阵
+
+319
+00:11:51,570 --> 00:11:52,408
+Right?
+看
+
+320
+00:11:52,408 --> 00:11:53,978
+With 10,000 elements.
+这就是一个
+
+321
+00:11:53,978 --> 00:11:55,835
+So that's what W is.
+有着10000个元素的矩阵W
+
+322
+00:11:55,835 --> 00:11:57,392
+And if I now
+如果我现在
+
+323
+00:11:57,392 --> 00:11:59,442
+plot a histogram of W
+用绘制直方图命令
+
+324
+00:11:59,442 --> 00:12:01,818
+with a hist command, I can
+绘制出一个直方图
+
+325
+00:12:01,820 --> 00:12:04,752
+now. And Octave's print hist
+使用Octave的
+
+326
+00:12:04,752 --> 00:12:06,130
+command, you know, takes a
+打印直方图命令
+
+327
+00:12:06,130 --> 00:12:07,297
+couple seconds to bring this up,
+你只需要数秒钟就可以将它绘制出来
+
+328
+00:12:07,297 --> 00:12:08,965
+but this is a histogram of
+这是一个对随机变量W
+
+329
+00:12:08,970 --> 00:12:10,646
+my random variable for W.
+绘制出的直方图
+
+330
+00:12:10,650 --> 00:12:12,732
+That was minus 6 plus square root
+这是 -6 加上根号10
+
+331
+00:12:12,732 --> 00:12:15,537
+ten times this Gaussian random variable.
+乘以这个高斯随机变量
+
+332
+00:12:15,537 --> 00:12:17,537
+And I can plot a histogram with
+这样 可以绘制出一个
+
+333
+00:12:17,560 --> 00:12:21,032
+more buckets, with more bins, with say, 50 bins.
+有着更多条的 乃至50个条的直方图来
+
+334
+00:12:21,032 --> 00:12:22,578
+And this is my
+这样 就有一个
+
+335
+00:12:22,578 --> 00:12:25,735
+histogram of a Gaussian with mean minus 6.
+均值减去6的高斯直方图
+
+336
+00:12:25,735 --> 00:12:27,285
+Because I have a minus
+因为这里是
+
+337
+00:12:27,285 --> 00:12:29,208
+6 there plus square root 10 times this.
+-6加10的平方根并与这项相乘
+
+338
+00:12:29,230 --> 00:12:32,952
+So the variance of
+因此
+
+339
+00:12:32,952 --> 00:12:34,961
+this Gaussian random variable
+这个高斯随机变量的方差
+
+340
+00:12:34,961 --> 00:12:36,696
+is 10 and the standard deviation is
+是10
+
+341
+00:12:36,700 --> 00:12:38,935
+square root of 10, which is about what?
+且其标准偏差为10的平方根
+
+342
+00:12:38,950 --> 00:12:41,063
+Three point one.
+3.1
+
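+The histogram example above, as a sketch:
+
+w = -6 + sqrt(10) * randn(1, 10000);  % mean -6, variance 10
+hist(w)        % histogram with the default number of bins
+hist(w, 50)    % histogram with 50 bins
+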
+343
+00:12:41,780 --> 00:12:43,857
+Finally, one special command
+最后 说一个生成矩阵的
+
+344
+00:12:43,857 --> 00:12:46,208
+for generating matrices, which is the eye command.
+特殊命令 eye
+
+345
+00:12:46,208 --> 00:12:48,394
+So I stands for this
+其实
+
+346
+00:12:48,394 --> 00:12:51,028
+is maybe a pun on the word identity.
+I也可说是一个双关语字标识
+
+347
+00:12:51,050 --> 00:12:52,650
+It's server set eye 4.
+设置一个4阶单位矩阵
+
+348
+00:12:52,720 --> 00:12:56,004
+This is the 4 by 4 identity matrix.
+这是一个4×4矩阵
+
+349
+00:12:56,004 --> 00:12:57,681
+So I equals eye 4.
+所以I为“eye(4)”
+
+350
+00:12:57,681 --> 00:13:00,458
+This gives me a 4 by 4 identity matrix.
+通过上面的命令得到4×4矩阵
+
+351
+00:13:00,458 --> 00:13:04,475
+And I equals eye 5, eye 6.
+I可以等于5阶单位阵 6阶单位阵
+
+352
+00:13:04,475 --> 00:13:05,611
+That gives me a 6 by
+那么就有
+
+353
+00:13:05,611 --> 00:13:08,089
+6 identity matrix, i3
+6阶单位阵
+
+354
+00:13:08,120 --> 00:13:09,134
+is the 3 by 3 identity matrix.
+eye( 3 )是一个3阶方阵
+
+355
+00:13:09,134 --> 00:13:12,064
+Lastly, to
+在本节视频的最后
+
+356
+00:13:12,064 --> 00:13:14,263
+wrap up this video, there's one more useful command.
+还有一个比较有用的命令
+
+357
+00:13:14,280 --> 00:13:15,479
+Which is the help command.
+那就是帮助命令
+
+358
+00:13:15,479 --> 00:13:17,454
+So you can type help i and
+例如 你可以键入help i
+
+359
+00:13:17,454 --> 00:13:21,181
+this brings up the help function for the identity matrix.
+它就会将矩阵的相关信息显示出来
+
+360
+00:13:21,190 --> 00:13:22,803
+Hit Q to quit.
+按Q键即可退出帮助页面
+
+361
+00:13:22,803 --> 00:13:25,375
+And you can also type help rand.
+你也可以键入help rand
+
+362
+00:13:25,380 --> 00:13:27,793
+Brings up documentation for the rand or the
+将会显示出有关rand函数的相关帮助文档
+
+363
+00:13:27,793 --> 00:13:29,734
+random number generation function.
+以及相关的随机数生成函数
+
+364
+00:13:29,734 --> 00:13:31,898
+Or even help help, which
+甚至可以使用命令help help
+
+365
+00:13:31,900 --> 00:13:35,615
+shows you, you know help on the help function.
+将会显示出help命令的使用方法
+
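+The identity-matrix and help commands, in a couple of lines:
+
+I = eye(4)   % 4x4 identity matrix
+help eye     % documentation for eye; press q to leave the pager
+help rand
+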
+366
+00:13:36,455 --> 00:13:39,022
+So, those are the
+以上讲解的内容
+
+367
+00:13:39,022 --> 00:13:41,612
+basic operations in Octave.
+都是Octave的基本操作
+
+368
+00:13:41,612 --> 00:13:42,699
+And with this you should be
+希望你能通过上面的讲解
+
+369
+00:13:42,699 --> 00:13:47,131
+able to generate a few matrices, multiply, add things.
+自己练习一些矩阵、乘、加等操作
+
+370
+00:13:47,131 --> 00:13:50,553
+And use the basic operations in Octave.
+将这些操作在Octave中熟练
+
+371
+00:13:50,560 --> 00:13:51,893
+In the next video, I'd like
+在接下来的视频中
+
+372
+00:13:51,920 --> 00:13:53,818
+to start talking about more
+将会涉及
+
+373
+00:13:53,818 --> 00:13:55,700
+sophisticated commands and how
+更多复杂的命令
+
+374
+00:13:55,750 --> 00:13:59,180
+to use data around and start to process data in Octave.
+并使用它们在Octave中对数据进行更多的操作
+
diff --git a/srt/5 - 2 - Moving Data Around (16 min).srt b/srt/5 - 2 - Moving Data Around (16 min).srt
new file mode 100644
index 00000000..694bd084
--- /dev/null
+++ b/srt/5 - 2 - Moving Data Around (16 min).srt
@@ -0,0 +1,2136 @@
+1
+00:00:00,111 --> 00:00:02,628
+In this second tutorial video on
+在第二段关于 Octave的
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,630 --> 00:00:03,904
+Octave, I'd like to start
+辅导课视频中 我将开始介绍
+
+3
+00:00:03,930 --> 00:00:07,322
+to tell you how to move data around in Octave.
+如何在 Octave 中移动数据
+
+4
+00:00:07,340 --> 00:00:08,783
+So, if you have data for
+具体来说
+
+5
+00:00:08,783 --> 00:00:12,125
+a machine learning problem, how do you load that data in Octave?
+如果你有一个机器学习问题 你怎样把数据加载到 Octave 中?
+
+6
+00:00:12,125 --> 00:00:13,693
+How do you put it into matrix?
+怎样把数据存入一个矩阵?
+
+7
+00:00:13,693 --> 00:00:15,284
+How do you manipulate these matrices?
+如何对矩阵进行相乘?
+
+8
+00:00:15,290 --> 00:00:16,982
+How do you save the results?
+如何保存计算结果?
+
+9
+00:00:17,000 --> 00:00:22,185
+How do you move data around and operate with data?
+如何移动这些数据 并用数据进行操作?
+
+10
+00:00:22,900 --> 00:00:25,044
+Here's my Octave window as
+和之前一样 这是我的 Octave 窗口
+
+11
+00:00:25,044 --> 00:00:29,256
+before, picking up from where we left off in the last video.
+我们继续沿用上次的窗口
+
+12
+00:00:29,290 --> 00:00:31,132
+If I type A, that's
+我键入 A
+
+13
+00:00:31,140 --> 00:00:32,258
+the matrix so we generate it, right,
+得到我们之前构建的矩阵 A
+
+14
+00:00:32,258 --> 00:00:35,197
+with this command equals one, two,
+也就是用这个命令生成的
+
+15
+00:00:35,197 --> 00:00:38,152
+three, four, five, six, and
+A = [1 2; 3 4; 5 6]
+
+16
+00:00:38,190 --> 00:00:40,696
+this is a three by two matrix.
+这是一个三行二列的矩阵
+
+17
+00:00:40,710 --> 00:00:42,415
+The size command in Octave
+Octave 中的 size() 命令
+
+18
+00:00:42,430 --> 00:00:46,361
+lets you, tells you what is the size of a matrix.
+返回矩阵的尺寸
+
+19
+00:00:46,361 --> 00:00:48,207
+So size A returns three, two.
+所以 size(A) 命令返回3 2
+
+20
+00:00:48,207 --> 00:00:50,160
+It turns out that
+实际上
+
+21
+00:00:50,180 --> 00:00:52,155
+this size command itself is actually
+ size() 命令返回的
+
+22
+00:00:52,155 --> 00:00:54,591
+returning a one by two matrix.
+是一个 1×2 的矩阵
+
+23
+00:00:54,591 --> 00:00:56,598
+So you can actually set SZ equals
+我们可以用 sz 来存放
+
+24
+00:00:56,598 --> 00:00:58,370
+size of A and SZ
+设置 sz = size(A)
+
+25
+00:00:58,380 --> 00:00:59,597
+is now a one by two
+因此 sz 就是一个1×2的矩阵
+
+26
+00:00:59,597 --> 00:01:01,627
+matrix where the first element
+第一个元素是3
+
+27
+00:01:01,640 --> 00:01:04,689
+of this is three, and the second element of this is two.
+第二个元素是2
+
+28
+00:01:04,700 --> 00:01:07,494
+So, if you just type size of SZ. Does SZ
+所以如果键入 size(sz) 看看 sz 的尺寸
+
+29
+00:01:07,494 --> 00:01:08,898
+is a one by
+返回的是1 2
+
+30
+00:01:08,898 --> 00:01:10,862
+two matrix whose two elements
+表示是一个1×2的矩阵
+
+31
+00:01:10,862 --> 00:01:13,721
+contain the dimensions of the
+1 和 2 分别表示
+
+32
+00:01:13,721 --> 00:01:15,279
+matrix A. You can
+矩阵 A 的维度 (此处口误 应为 sz 的维度 译者注)
+
+33
+00:01:15,279 --> 00:01:17,787
+also type size A one
+你也可以键入 size(A, 1)
+
+34
+00:01:17,787 --> 00:01:19,505
+to give you back the first
+这个命令会返回
+
+35
+00:01:19,510 --> 00:01:21,542
+dimension of A, size
+ A 矩阵的第一个元素
+
+36
+00:01:21,542 --> 00:01:22,662
+of the first dimension of A.
+A 矩阵的第一个维度的尺寸
+
+37
+00:01:22,680 --> 00:01:24,108
+So that's the number
+也就是 A 矩阵的行数
+
+38
+00:01:24,110 --> 00:01:26,307
+of rows and size A two
+同样 命令 size(A, 2)
+
+39
+00:01:26,320 --> 00:01:28,361
+to give you back two, which
+将返回2
+
+40
+00:01:28,361 --> 00:01:29,598
+is the number of columns in
+也就是 A 矩阵的列数
+
+41
+00:01:29,598 --> 00:01:31,942
+the matrix A. If you
+也就是 A 矩阵的列数
+
+42
+00:01:31,950 --> 00:01:34,034
+have a vector V, so
+如果你有一个向量 v
+
+43
+00:01:34,034 --> 00:01:36,016
+let's say V equals one, two,
+假如 v = [1 2 3 4]
+
+44
+00:01:36,030 --> 00:01:38,089
+three, four, and you
+假如 v = [1 2 3 4]
+
+45
+00:01:38,089 --> 00:01:40,830
+type length V. What
+然后键入 length(v)
+
+46
+00:01:40,830 --> 00:01:42,097
+this does is it gives you
+这个命令将返回
+
+47
+00:01:42,097 --> 00:01:44,123
+the size of the longest dimension.
+最大维度的大小
+
+48
+00:01:44,170 --> 00:01:45,609
+So you can also type
+你也可以键入 length(A)
+
+49
+00:01:45,609 --> 00:01:48,487
+length A and because
+由于矩阵 A
+
+50
+00:01:48,500 --> 00:01:49,856
+A is a three by
+是一个3×2的矩阵
+
+51
+00:01:49,860 --> 00:01:52,305
+two matrix, the longer
+因此最大的维度
+
+52
+00:01:52,330 --> 00:01:53,825
+dimension is of size
+应该是3
+
+53
+00:01:53,825 --> 00:01:56,145
+three, so this should print out three.
+因此该命令会返回3
+
+54
+00:01:56,145 --> 00:01:58,805
+But usually we apply length only to vectors.
+但通常我们还是对向量使用 length 命令
+
+55
+00:01:58,810 --> 00:02:00,194
+So you know, length one, two,
+比如 length([1;2;3;4;5])
+
+56
+00:02:00,200 --> 00:02:02,222
+three, four, five, rather
+比如 length([1;2;3;4;5])
+
+57
+00:02:02,230 --> 00:02:04,010
+than apply length to matrices
+而不是对矩阵使用 length 命令
+
+58
+00:02:04,010 --> 00:02:07,205
+because that's a little more confusing.
+因为毕竟有点容易让人弄混
+
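+A small sketch of the size and length commands just covered:
+
+A = [1 2; 3 4; 5 6];
+size(A)      % ans = 3 2, itself a 1x2 matrix
+size(A, 1)   % 3, the number of rows
+size(A, 2)   % 2, the number of columns
+v = [1 2 3 4];
+length(v)    % 4, the size of the longest dimension
+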
+59
+00:02:07,620 --> 00:02:10,122
+Now, let's look
+下面让我们来看看
+
+60
+00:02:10,122 --> 00:02:11,843
+at how the load data and
+如何在系统中
+
+61
+00:02:11,860 --> 00:02:13,732
+find data on the file system.
+加载数据和寻找数据
+
+62
+00:02:13,732 --> 00:02:15,254
+When we start an Octave
+当我们打开 Octave 时
+
+63
+00:02:15,254 --> 00:02:16,882
+we're usually, we're often in
+我们通常已经在一个
+
+64
+00:02:16,920 --> 00:02:19,098
+a path that
+默认路径中
+
+65
+00:02:19,098 --> 00:02:21,738
+is, you know, the location of where the Octave location is.
+这个路径是 Octave 的安装位置
+
+66
+00:02:21,750 --> 00:02:24,042
+So the PWD command shows
+pwd 命令可以显示出
+
+67
+00:02:24,060 --> 00:02:25,619
+the current directory, or the
+Octave 当前所处路径
+
+68
+00:02:25,640 --> 00:02:28,738
+current path that Octave is in.
+Octave 当前所处路径
+
+69
+00:02:28,738 --> 00:02:31,932
+So right now we're in this maybe somewhat off scale directory.
+所以现在我们就在这个目录下
+
+70
+00:02:31,932 --> 00:02:33,999
+The CD command stands
+cd 命令
+
+71
+00:02:34,000 --> 00:02:35,322
+for change directory, so I
+意思是改变路径
+
+72
+00:02:35,330 --> 00:02:40,681
+can go to C:/Users/Ang/Desktop, and
+我可以把路径改为C:\Users\ang\Desktop
+
+73
+00:02:40,681 --> 00:02:43,657
+now I'm in, you know, in my Desktop
+这样当前目录就变为了桌面
+
+74
+00:02:43,657 --> 00:02:45,925
+and if I type ls,
+如果键入 ls
+
+75
+00:02:45,925 --> 00:02:49,447
+ls is, it comes from a Unix or a Linux command.
+ls 来自于一个 Unix 或者 Linux 命令
+
+76
+00:02:49,447 --> 00:02:50,648
+But, ls will list the
+ls 命令将列出
+
+77
+00:02:50,648 --> 00:02:52,435
+directories on my desktop and
+我桌面上的所有路径
+
+78
+00:02:52,435 --> 00:02:54,137
+so these are the files
+因此这些就是
+
+79
+00:02:54,140 --> 00:02:58,184
+that are on my Desktop right now.
+我桌面上的所有文件了
+
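+The navigation commands above, as a sketch (the desktop path is the lecture's; substitute your own):
+
+pwd                          % show the current directory
+cd 'C:\Users\ang\Desktop'    % change directory
+ls                           % list the files there
+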
+80
+00:03:15,850 --> 00:03:17,838
+In fact, on my desktop are
+事实上 我的桌面上
+
+81
+00:03:17,838 --> 00:03:19,920
+two files: Features X and
+有两个文件
+
+82
+00:03:19,920 --> 00:03:21,689
+Price Y that's maybe come
+featuresX.dat 和 priceY.dat
+
+83
+00:03:21,689 --> 00:03:23,596
+from a machine learning problem I want to solve.
+是两个我想解决的机器学习问题
+
+84
+00:03:23,620 --> 00:03:25,830
+So, here's my desktop.
+这是我的桌面
+
+85
+00:03:25,830 --> 00:03:29,144
+Here's Features X, and
+这是 featuresX 文件
+
+86
+00:03:29,144 --> 00:03:31,598
+Features X is this window,
+featuresX 文件如这个窗口所示
+
+87
+00:03:31,630 --> 00:03:34,492
+excuse me, is this file with two columns of data.
+是一个含有两列数据的文件
+
+88
+00:03:34,492 --> 00:03:36,702
+This is actually my housing prices data.
+这其实就是我的房屋价格数据
+
+89
+00:03:36,750 --> 00:03:38,374
+So I think, you know, I
+我想应该是
+
+90
+00:03:38,374 --> 00:03:40,652
+think I have forty-seven rows in this data set.
+数据集中有47行
+
+91
+00:03:40,652 --> 00:03:42,344
+And so the first house
+第一个房子样本
+
+92
+00:03:42,350 --> 00:03:43,966
+has size two hundred four
+面积是2104平方英尺
+
+93
+00:03:43,970 --> 00:03:46,172
+square feet, has three bedrooms; second
+有3个卧室
+
+94
+00:03:46,190 --> 00:03:47,367
+house has sixteen hundred square
+第二套房子面积为1600
+
+95
+00:03:47,367 --> 00:03:49,862
+feet, has three bedrooms; and so on.
+有3个卧室 等等
+
+96
+00:03:49,880 --> 00:03:52,302
+And Price Y is this
+priceY 是这个文件
+
+97
+00:03:52,302 --> 00:03:55,020
+file that has
+也就是
+
+98
+00:03:55,040 --> 00:03:57,575
+the prices of the data in my training set.
+训练集中的价格数据
+
+99
+00:03:57,575 --> 00:03:59,735
+So, Features X and
+所以 featuresX 和
+
+100
+00:03:59,735 --> 00:04:03,061
+Price Y are just text files with my data.
+priceY 就是两个存放数据的文档
+
+101
+00:04:03,061 --> 00:04:04,770
+How do I load this data into Octave?
+那么应该怎样把数据读入 Octave 呢?
+
+102
+00:04:04,770 --> 00:04:06,050
+Well, I just type
+好的 我们只需要键
+
+103
+00:04:06,090 --> 00:04:08,163
+the command load Features X dot
+键入命令 load featuresX.dat
+
+104
+00:04:08,163 --> 00:04:10,069
+dat and if I
+这样
+
+105
+00:04:10,069 --> 00:04:11,991
+do that, I load the Features X
+我将加载了 featuresX 文件
+
+106
+00:04:11,991 --> 00:04:15,772
+and can load Price Y dot dat. And
+同样地我可以加载 priceY.dat
+
+107
+00:04:15,772 --> 00:04:17,323
+by the way, there are multiple ways to do this.
+其实有好多种办法可以完成
+
+108
+00:04:17,323 --> 00:04:19,245
+This command if you put
+如果你把命令写成
+
+109
+00:04:19,245 --> 00:04:20,916
+Features X dot dat on that
+字符串的形式 load('featureX.dat')
+
+110
+00:04:20,916 --> 00:04:22,533
+in strings and load it like so.
+也是可以的
+
+111
+00:04:22,550 --> 00:04:25,477
+This is a typo there.
+这里打错了
+
+112
+00:04:25,490 --> 00:04:27,317
+This is an equivalent command.
+这跟刚才的命令效果是相同的
+
+113
+00:04:27,317 --> 00:04:29,334
+So you can, this
+只不过是把文件名
+
+114
+00:04:29,360 --> 00:04:31,985
+way I'm just putting the file name of the string
+写成了一个字符串的形式
+
+115
+00:04:32,000 --> 00:04:34,148
+in the founding in a
+现在文件名被存在一个
+
+116
+00:04:34,148 --> 00:04:35,716
+string and in an
+字符串中
+
+117
+00:04:35,716 --> 00:04:38,902
+Octave use single quotes to
+Octave 中使用引号
+
+118
+00:04:38,930 --> 00:04:41,876
+represent strings, like so.
+来表示字符串 就像这样
+
+119
+00:04:41,910 --> 00:04:42,837
+So that's a string, and we
+这就是一个字符串
+
+120
+00:04:42,860 --> 00:04:45,517
+can load the file
+因此我们读取的文件
+
+121
+00:04:45,517 --> 00:04:48,324
+whose name is given by that string.
+文件名由这个字符串给出
+
+122
+00:04:48,324 --> 00:04:50,919
+Now the WHO command now
+另外 who 命令
+
+123
+00:04:50,960 --> 00:04:52,538
+shows me what variables I
+能显示出 在我的 Octave
+
+124
+00:04:52,538 --> 00:04:54,605
+have in my Octave workspace.
+工作空间中的所有变量
+
+125
+00:04:54,605 --> 00:04:56,310
+So Who shows me whether
+因此 who 命令显示出
+
+126
+00:04:56,330 --> 00:04:59,952
+the variables that Octave has in memory currently.
+当前 Octave 储存的变量
+
+127
+00:04:59,952 --> 00:05:01,367
+Features X and Price Y
+包括 featureX 和 priceY
+
+128
+00:05:01,370 --> 00:05:02,991
+are among them, as well as
+同样还包括
+
+129
+00:05:02,991 --> 00:05:04,120
+the variables that, you know,
+在此之前你创建的
+
+130
+00:05:04,170 --> 00:05:06,311
+we created earlier in this session.
+那些变量
+
+131
+00:05:06,311 --> 00:05:09,198
+So I can type Features X
+所以我可以键入
+
+132
+00:05:09,198 --> 00:05:11,062
+to display features X. And
+featuresX 回车 来显示 featuresX
+
+133
+00:05:11,062 --> 00:05:14,164
+there's my data.
+这些就是存在里面的数据
+
+134
+00:05:14,200 --> 00:05:16,419
+And I can type size features
+还可以键入 size(featuresX)
+
+135
+00:05:16,419 --> 00:05:18,022
+X and that's my
+得出的结果是 47 2
+
+136
+00:05:18,022 --> 00:05:20,519
+47 by two matrix.
+代表这是一个47×2的矩阵
+
+137
+00:05:20,519 --> 00:05:22,307
+And some of these size, press
+类似地
+
+138
+00:05:22,320 --> 00:05:23,729
+Y, that gives me
+输入 size(priceY)
+
+139
+00:05:23,729 --> 00:05:26,753
+my 47 by one vector.
+结果是 47 1
+
+140
+00:05:26,753 --> 00:05:30,125
+This is a 47 dimensional vector.
+表示这是一个47维的向量
+
+141
+00:05:30,125 --> 00:05:32,080
+This is all common vector that
+是一个列矩阵
+
+142
+00:05:32,080 --> 00:05:35,231
+has all the prices Y in my training set.
+存放的是训练集中的所有价格 Y 的值
+
+143
+00:05:35,240 --> 00:05:37,584
+Now the who function shows
+who 函数能让你看到
+
+144
+00:05:37,600 --> 00:05:40,086
+you one of the variables that, in the current workspace.
+当前工作空间中的所有变量
+
+145
+00:05:40,086 --> 00:05:42,195
+There's also the who S
+同样还有另一个 whos 命令
+
+146
+00:05:42,195 --> 00:05:45,369
+variable that gives you the detailed view.
+能更详细地进行查看
+
+147
+00:05:45,369 --> 00:05:47,252
+And so this also, with
+因此
+
+148
+00:05:47,270 --> 00:05:48,574
+an S at the end this also
+在 who 后面加一个 s
+
+149
+00:05:48,574 --> 00:05:49,979
+lists my variables except that it
+同样也列出我所有的变量
+
+150
+00:05:49,979 --> 00:05:51,782
+now lists the sizes as well.
+不仅如此 还列出了变量的维度
+
+151
+00:05:51,790 --> 00:05:52,759
+So A is a three by
+我们看到 A 是一个
+
+152
+00:05:52,759 --> 00:05:54,764
+two matrix and features
+3×2的矩阵
+
+153
+00:05:54,764 --> 00:05:56,545
+X as a 47 by 2 matrix.
+X 是一个47×2的矩阵
+
+154
+00:05:56,545 --> 00:05:59,327
+Price Y is a 47 by one matrix.
+priceY 是一个47×1的矩阵
+
+155
+00:05:59,327 --> 00:06:01,098
+Meaning this is just a vector.
+也就是一个向量
+
+156
+00:06:01,130 --> 00:06:03,438
+And it shows, you know, how many bytes of memory it's taking up.
+同时还显示出 需要占用多少内存空间
+
+157
+00:06:03,438 --> 00:06:06,020
+As well as what type of data this is.
+以及数据类型是什么
+
+158
+00:06:06,020 --> 00:06:07,765
+Double means double position floating
+double 意思是双精度浮点型
+
+159
+00:06:07,765 --> 00:06:08,915
+point so that just means that
+这也就是说
+
+160
+00:06:08,915 --> 00:06:13,148
+these are real values, the floating point numbers.
+这些数都是实数 是浮点数
+
+161
+00:06:13,148 --> 00:06:14,190
+Now if you want to get
+如果你想删除某个变量
+
+162
+00:06:14,190 --> 00:06:17,316
+rid of a variable you can use the clear command.
+你可以使用 clear 命令
+
+163
+00:06:17,340 --> 00:06:21,124
+So clear features X and type whose again.
+因此 我们键入 clear featuresX
+
+164
+00:06:21,130 --> 00:06:23,448
+You notice that the features X
+然后再输入 whos 命令
+
+165
+00:06:23,448 --> 00:06:26,465
+variable has now disappeared.
+你会发现 featuresX 消失了
+
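+A sketch of loading and inspecting data as described above (featuresX.dat and priceY.dat are the lecture's example files):
+
+load featuresX.dat      % load a plain-text data file
+load('priceY.dat')      % equivalent form with the name as a string
+who                     % variables currently in memory
+whos                    % the same, with sizes and types
+clear featuresX         % remove a single variable
+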
+166
+00:06:27,270 --> 00:06:28,567
+And how do we save data?
+另外 我们怎么储存数据呢?
+
+167
+00:06:28,567 --> 00:06:29,221
+Let's see.
+我们来看
+
+168
+00:06:29,221 --> 00:06:30,411
+Let's take the variable V and
+我们设变量 v
+
+169
+00:06:30,411 --> 00:06:33,075
+say that it's a price Y 1 colon 10.
+为 priceY(1:10)
+
+170
+00:06:33,075 --> 00:06:34,826
+This sets V to be
+这表示的是将向量 Y 的
+
+171
+00:06:34,826 --> 00:06:38,574
+the first 10 elements of
+前10个元素存入 v 中
+
+172
+00:06:38,860 --> 00:06:43,215
+vector Y. So let's type who or whose.
+我们输入 who 或者 whos
+
+173
+00:06:43,220 --> 00:06:46,612
+Whereas Y was a 47 by 1 vector.
+Y 是一个47×1的向量
+
+174
+00:06:46,612 --> 00:06:48,474
+V is now 10 by 1.
+因此现在 v 就是10×1的向量
+
+175
+00:06:48,474 --> 00:06:50,809
+V equals price Y, one
+因为刚才设置了
+
+176
+00:06:50,809 --> 00:06:52,451
+colon ten, that sets it
+v = priceY(1:10)
+
+177
+00:06:52,451 --> 00:06:53,520
+to the just the first ten
+这便将 v 的值
+
+178
+00:06:53,520 --> 00:06:55,705
+elements of Y. Let's say
+设为了 Y 的前十个元素
+
+179
+00:06:55,705 --> 00:06:57,398
+I wanna save this data to disk,
+假如我们想把它存入硬盘
+
+180
+00:06:57,398 --> 00:07:00,129
+the command save, hello.mat
+那么用 save hello.mat v 命令
+
+181
+00:07:00,129 --> 00:07:02,302
+V. This will
+这个命令
+
+182
+00:07:02,310 --> 00:07:04,357
+save the variable V into
+会将变量 v
+
+183
+00:07:04,370 --> 00:07:05,690
+a file called hello.mat.
+存成一个叫 hello.mat 的文件
+
+184
+00:07:05,720 --> 00:07:08,490
+So let's do that.
+让我们回车
+
+185
+00:07:08,640 --> 00:07:10,965
+And now a file
+现在我的桌面上
+
+186
+00:07:11,030 --> 00:07:13,181
+has appeared on my Desktop, you
+就出现了一个新文件
+
+187
+00:07:13,181 --> 00:07:15,066
+know, called Hello.mat.
+名为 hello.mat
+
+188
+00:07:15,066 --> 00:07:16,509
+I happen to have MATLAB installed
+由于我的电脑里
+
+189
+00:07:16,530 --> 00:07:17,962
+in this window, which is why,
+也同时安装了 MATLAB
+
+190
+00:07:17,962 --> 00:07:19,711
+you know, this icon looks
+所以这个图标
+
+191
+00:07:19,711 --> 00:07:21,621
+like this because Windows is recognized
+上面有 MATLAB 的标识
+
+192
+00:07:21,621 --> 00:07:23,559
+as it's a MATLAB file,but don't
+因为操作系统把文件识别为 MATLAB 文件
+
+193
+00:07:23,559 --> 00:07:24,882
+worry about it if this file
+所以如果在你的电脑上
+
+194
+00:07:24,890 --> 00:07:26,051
+looks like it has a different
+图标显示的不一样的话
+
+195
+00:07:26,051 --> 00:07:28,778
+icon on your machine and
+也没有关系
+
+196
+00:07:28,778 --> 00:07:31,017
+let's say I clear all my variables.
+现在我们清除所有变量
+
+197
+00:07:31,020 --> 00:07:32,602
+So, if you type clear without
+直接键入 clear
+
+198
+00:07:32,602 --> 00:07:36,061
+anything then this actually deletes all of the variables in your workspace.
+这样将删除工作空间中的所有变量
+
+199
+00:07:36,080 --> 00:07:39,078
+So there's now nothing left in the workspace.
+所以现在工作空间中啥都没了
+
+200
+00:07:39,078 --> 00:07:41,856
+And if I load hello.mat,
+但如果我载入 hello.mat 文件
+
+201
+00:07:41,856 --> 00:07:44,388
+I can now load back my
+我又重新读取了变量 v
+
+202
+00:07:44,388 --> 00:07:46,054
+variable v, which is
+因为我之前
+
+203
+00:07:46,054 --> 00:07:47,830
+the data that I
+把变量 v存入了
+
+204
+00:07:47,830 --> 00:07:51,035
+previously saved into the hello.mat file.
+ hello.mat 文件中
+
+205
+00:07:51,035 --> 00:07:54,636
+So, hello.mat, what we did just now to save hello.mat
+所以我们刚才用 save 命令做了什么
+
+206
+00:07:54,636 --> 00:07:55,877
+to view, this save the
+这个命令把数据
+
+207
+00:07:55,877 --> 00:07:57,811
+data in a binary format,
+按照二进制形式储存
+
+208
+00:07:57,850 --> 00:07:59,702
+a somewhat more compressed binary format.
+或者说是更压缩的二进制形式
+
+209
+00:07:59,702 --> 00:08:01,077
+So if v is a lot
+因此 如果 v 是很大的数据
+
+210
+00:08:01,077 --> 00:08:03,899
+of data, this, you know, will be somewhat more compressing.
+那么压缩幅度也更大
+
+211
+00:08:03,899 --> 00:08:05,645
+Will take off less the space.
+占用空间也更小
+
+212
+00:08:05,650 --> 00:08:06,784
+If you want to save your
+如果你想把数据
+
+213
+00:08:06,784 --> 00:08:08,959
+data in a human readable
+存成一个人能看懂的形式
+
+214
+00:08:08,959 --> 00:08:11,870
+format, then you type save hello.txt
+那么可以键入
+
+215
+00:08:11,870 --> 00:08:14,055
+the variable v and then -ascii.
+save hello.txt v -ascii
+
+216
+00:08:14,110 --> 00:08:16,083
+So, this will save
+这样就会把数据
+
+217
+00:08:16,083 --> 00:08:18,787
+it as a text
+存成一个文本文档
+
+218
+00:08:18,840 --> 00:08:21,352
+or as ascii format of text.
+或者将数据的 ascii 码存成文本文档
+
+219
+00:08:21,352 --> 00:08:22,802
+And now, once I've done
+现在 我键入了这个命令以后
+
+220
+00:08:22,802 --> 00:08:24,973
+that, I have this file.
+我的桌面上
+
+221
+00:08:24,973 --> 00:08:26,115
+hello.txt has just
+就有了 hello.txt 文件
+
+222
+00:08:26,130 --> 00:08:28,463
+appeared on my desktop, and
+就有了 hello.txt 文件
+
+223
+00:08:28,463 --> 00:08:29,951
+if I open this up, we
+如果打开它
+
+224
+00:08:29,951 --> 00:08:31,016
+see that this is a text
+我们可以发现
+
+225
+00:08:31,016 --> 00:08:33,958
+file with my data saved away.
+这个文本文档存放着我们的数据
+
+226
+00:08:33,958 --> 00:08:36,698
+So that's how you load and save data.
+这就是读取和储存数据的方法
+
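+The save and load commands above, collected into a sketch (assumes priceY has already been loaded):
+
+v = priceY(1:10)          % first ten elements of priceY
+save hello.mat v          % save v in binary (compressed) format
+clear                     % wipe the whole workspace
+load hello.mat            % v is back
+save hello.txt v -ascii   % save v as human-readable text
+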
+227
+00:08:36,698 --> 00:08:38,832
+Now let's talk a bit about how to manipulate data.
+接下来我们再来讲讲操作数据的方法
+
+228
+00:08:38,832 --> 00:08:40,526
+Let's set a equals to that
+假如 A 还是那个矩阵
+
+229
+00:08:40,526 --> 00:08:44,910
+matrix again so is my three by two matrix.
+跟刚才一样还是那个 3×2 的矩阵
+
+230
+00:08:45,710 --> 00:08:46,778
+So let's talk about indexing.
+现在我们加上索引值
+
+231
+00:08:46,778 --> 00:08:48,493
+So type A 3, 2.
+比如键入 A(3,2)
+
+232
+00:08:48,493 --> 00:08:51,219
+This indexes into
+这将索引到
+
+233
+00:08:51,219 --> 00:08:52,917
+the 3, 2 elements of
+A 矩阵的 (3,2) 元素
+
+234
+00:08:52,917 --> 00:08:54,308
+the matrix A. So, this
+A 矩阵的 (3,2) 元素
+
+235
+00:08:54,370 --> 00:08:56,320
+is what, you know,
+这就是我们通常
+
+236
+00:08:56,400 --> 00:08:57,488
+in normally, we will write this
+书写矩阵的形式
+
+237
+00:08:57,510 --> 00:09:00,421
+as a subscript 3, 2
+写成 A 下标32
+
+238
+00:09:00,430 --> 00:09:02,280
+or A subscript,
+下标32
+
+239
+00:09:03,570 --> 00:09:05,320
+you know, 3, 2
+3和2分别表示
+
+240
+00:09:05,380 --> 00:09:07,028
+and so that's the element and
+矩阵的第三行
+
+241
+00:09:07,028 --> 00:09:08,664
+third row and second column
+和第二列对应的元素
+
+242
+00:09:08,664 --> 00:09:11,539
+of A which is the element of six.
+因此也就对应 6
+
+243
+00:09:11,590 --> 00:09:13,820
+I can also type A 2
+我也可以键入
+
+244
+00:09:14,550 --> 00:09:16,770
+comma colon to fetch
+A(2,:) 来返回
+
+245
+00:09:16,770 --> 00:09:18,851
+everything in the second row.
+第二行的所有元素
+
+246
+00:09:18,851 --> 00:09:22,806
+So, the colon means every
+因此 冒号表示
+
+247
+00:09:22,810 --> 00:09:27,381
+element along that row or column.
+该行或该列的所有元素
+
+248
+00:09:27,420 --> 00:09:29,274
+So, a of 2 comma
+因此 A(2,:)
+
+249
+00:09:29,274 --> 00:09:32,425
+colon is this second row of a. Right.
+表示 A 矩阵的第二行的所有元素
+
+250
+00:09:32,470 --> 00:09:35,662
+And similarly, if I do a colon comma 2
+类似地 如果我键入 A(:,2)
+
+251
+00:09:35,680 --> 00:09:38,262
+then this means get everything in
+这将返回 A 矩阵第二列的所有元素
+
+252
+00:09:38,262 --> 00:09:41,328
+the second column of A. So, this gives me 2 4 6.
+这将得到 2 4 6
+
+253
+00:09:41,328 --> 00:09:42,921
+Right this means of
+这表示返回
+
+254
+00:09:42,930 --> 00:09:45,467
+A. everything, second column.
+A 矩阵的第二列的所有元素
+
+255
+00:09:45,500 --> 00:09:46,967
+So, this is my second
+因此这就是
+
+256
+00:09:46,970 --> 00:09:49,636
+column A, which is 2 4 6.
+矩阵 A 的第二列 就是 2 4 6
+
+257
+00:09:49,650 --> 00:09:51,267
+Now, you can also
+你也可以在运算中
+
+258
+00:09:51,280 --> 00:09:54,148
+use somewhat more sophisticated indexing operations.
+使用这些较为复杂的索引
+
+259
+00:09:54,148 --> 00:09:56,575
+So let me just show you a quick example.
+我再给你展示几个例子
+
+260
+00:09:56,575 --> 00:09:58,537
+You do this maybe less often,
+可能你也不会经常使用
+
+261
+00:09:58,550 --> 00:10:02,231
+but let me do this A 1 3 comma colon.
+但我还是输入给你看 A([1 3],:)
+
+262
+00:10:02,231 --> 00:10:03,471
+This means get all of
+这个命令意思是
+
+263
+00:10:03,500 --> 00:10:07,444
+the elements of A whose first index is one or three.
+取 A 矩阵第一个索引值为1或3的元素
+
+264
+00:10:07,450 --> 00:10:08,765
+This means I get everything from
+也就是说我取的是
+
+265
+00:10:08,765 --> 00:10:10,588
+the first and third rows of
+A 矩阵的第一行和
+
+266
+00:10:10,603 --> 00:10:12,780
+A and from all
+第三行的每一列
+
+267
+00:10:13,240 --> 00:10:13,240
+columns.
+第三行的每一列
+
+268
+00:10:14,163 --> 00:10:16,430
+So, this was the
+这是 A 矩阵
+
+269
+00:10:16,800 --> 00:10:18,260
+matrix A and so A
+因此
+
+270
+00:10:18,440 --> 00:10:21,872
+1 3 comma colon means get
+输入 A([1 3], :)
+
+271
+00:10:21,900 --> 00:10:23,222
+everything from the first row
+返回第一行
+
+272
+00:10:23,250 --> 00:10:25,023
+and from the second row and
+返回第三行
+
+273
+00:10:25,023 --> 00:10:27,172
+from the third row and the
+冒号表示的是
+
+274
+00:10:27,172 --> 00:10:28,313
+colon means, you know, one both
+取这两行的每一列元素
+
+275
+00:10:28,313 --> 00:10:29,585
+of first and the second
+也就是第一行
+
+276
+00:10:29,585 --> 00:10:31,045
+columns and so this
+和第二行的所有元素(此处口误 应为第三行 译者注)
+
+277
+00:10:31,045 --> 00:10:32,842
+gives me this 1 2 5 6.
+因此返回结果为 1 2 5 6
+
+278
+00:10:32,842 --> 00:10:34,353
+Although, you'll use these sorts
+可能这些比较复杂一点的
+
+279
+00:10:34,353 --> 00:10:37,182
+of more sophisticated index
+索引操作
+
+280
+00:10:37,182 --> 00:10:39,819
+operations maybe somewhat less often.
+你不会经常用到
+
+281
+00:10:40,210 --> 00:10:41,453
+To show you what else we can do.
+我们还能做什么呢
+
+282
+00:10:41,453 --> 00:10:43,617
+Here's the A matrix and this
+这依然是 A 矩阵
+
+283
+00:10:43,617 --> 00:10:47,276
+is A colon comma 2, to give me the second column.
+A(:,2) 命令返回第二列
+
+284
+00:10:47,276 --> 00:10:49,773
+You can also use this to do assignments.
+你也可以为它赋值
+
+285
+00:10:49,773 --> 00:10:51,178
+So I can take the second column of
+所以我可以取 A 矩阵的第二列
+
+286
+00:10:51,190 --> 00:10:52,949
+A and assign that to
+然后将它赋值为
+
+287
+00:10:52,950 --> 00:10:55,605
+10, 11, 12, and
+10 11 12
+
+288
+00:10:55,670 --> 00:10:58,084
+if I do that I'm now, you
+如果我这样做的话
+
+289
+00:10:58,120 --> 00:10:59,220
+know, taking the second column of
+我实际上是取出了 A 的第二列
+
+290
+00:10:59,290 --> 00:11:02,768
+a and I'm assigning this column vector 10, 11, 12 to it.
+然后把一个列向量[10;11;12]赋给了它
+
+291
+00:11:02,768 --> 00:11:05,440
+So, now a is this matrix that's 1, 3, 5.
+因此现在 A 矩阵的第一列还是 1 3 5
+
+292
+00:11:05,480 --> 00:11:08,760
+And the second column has been replaced by 10, 11, 12.
+第二列就被替换为 10 11 12
+
+293
+00:11:08,760 --> 00:11:14,513
+And here's another operation.
+接下来一个操作
+
+294
+00:11:14,680 --> 00:11:15,917
+Let's set A to be equal
+让我们把 A 设为
+
+295
+00:11:15,917 --> 00:11:17,738
+to A comma 100, 101,
+A = [A, [100, 101, 102]]
+
+296
+00:11:17,750 --> 00:11:21,605
+102 like so and what
+这样做的结果是
+
+297
+00:11:21,605 --> 00:11:24,109
+this will do is
+在原矩阵的右边
+
+298
+00:11:24,120 --> 00:11:28,025
+append another column vector
+附加了一个新的列矩阵
+
+299
+00:11:28,047 --> 00:11:29,855
+to the right.
+附加了一个新的列矩阵
+
+300
+00:11:29,890 --> 00:11:33,230
+So, now, oops.
+现在 见证奇迹的时刻...
+
+301
+00:11:33,260 --> 00:11:36,798
+I think I made a little mistake.
+噢 我又犯错了
+
+302
+00:11:36,800 --> 00:11:41,065
+Should have put semicolons there
+应该放分号的
+
+303
+00:11:41,700 --> 00:11:43,910
+and now A is equals to this.
+现在 A 矩阵就是这样了
+
+304
+00:11:43,910 --> 00:11:44,564
+Okay?
+对吧?
+
+305
+00:11:44,564 --> 00:11:45,479
+I hope that makes sense.
+我希望你听懂了
+
+306
+00:11:45,479 --> 00:11:46,480
+So this 100, 101, 102.
+所以 [100;101;102]
+
+307
+00:11:46,480 --> 00:11:48,804
+This is a column vector
+这是个列矩阵
+
+308
+00:11:48,820 --> 00:11:51,668
+and what we did
+而我们所做的
+
+309
+00:11:51,668 --> 00:11:53,386
+was we set A, take
+就是把 A 矩阵设置为
+
+310
+00:11:53,386 --> 00:11:56,156
+A and set it to the original definition.
+原来的 A 矩阵
+
+311
+00:11:56,156 --> 00:11:57,368
+And then we put that column
+再在右边附上一个
+
+312
+00:11:57,380 --> 00:11:59,192
+vector to the right
+新添加的列矩阵
+
+313
+00:11:59,192 --> 00:12:00,217
+and so, we ended up taking
+我们的原矩阵 A
+
+314
+00:12:00,217 --> 00:12:04,288
+the matrix A and--which was
+就是左边的这6个元素
+
+315
+00:12:04,288 --> 00:12:05,405
+these six elements on the left.
+就是左边的这6个元素
+
+316
+00:12:05,405 --> 00:12:06,785
+So we took matrix
+所以我们就是把 A 矩阵
+
+317
+00:12:06,810 --> 00:12:08,564
+A and we appended another
+右边加上了一个
+
+318
+00:12:08,564 --> 00:12:09,793
+column vector to the right;
+新的列向量
+
+319
+00:12:09,793 --> 00:12:11,814
+which is now why A is
+所以现在 A 矩阵
+
+320
+00:12:11,814 --> 00:12:16,083
+a three by three matrix that looks like that.
+变成这样一个 3×3 的矩阵
+
+321
+00:12:16,200 --> 00:12:18,005
+And finally, one neat
+最后 还有一个小技巧
+
+322
+00:12:18,010 --> 00:12:19,802
+trick that I sometimes use
+我也经常使用
+
+323
+00:12:19,810 --> 00:12:22,022
+if you do just A and then a colon like so.
+如果你就输入 A(:)
+
+324
+00:12:22,022 --> 00:12:25,585
+This is a somewhat special case syntax.
+这是一个很特别的语法结构
+
+325
+00:12:25,590 --> 00:12:28,695
+What this means is that put all elements with A into
+意思是把 A 中的所有元素
+
+326
+00:12:28,695 --> 00:12:30,751
+a single column vector
+放入一个单独的列向量
+
+327
+00:12:30,850 --> 00:12:34,513
+and this gives me a 9 by 1 vector.
+这样我们就得到了一个 9×1 的向量
+
+328
+00:12:34,513 --> 00:12:38,584
+These are just all the elements of A put together.
+这些元素都是 A 中的元素排列起来的
+
+329
+00:12:39,700 --> 00:12:45,258
+Just a couple more examples. Let's see. Let's
+再来几个例子好了
+
+330
+00:12:45,300 --> 00:12:52,073
+say I set A to be equal to 123456, okay?
+我还是把 A 重新设为 [1 2; 3 4; 5 6]
+
+331
+00:12:52,181 --> 00:12:54,035
+And let's say
+假如说
+
+332
+00:12:54,060 --> 00:12:55,674
+I set a matrix B
+我再设一个 B
+
+333
+00:12:55,680 --> 00:12:58,984
+equal to 11, 12, 13, 14, 15, 16.
+为[11 12; 13 14; 15 16]
+
+334
+00:12:58,984 --> 00:13:00,346
+I can create a new
+我可以新建一个矩阵 C
+
+335
+00:13:00,346 --> 00:13:03,161
+matrix C as A B.
+C = [A B]
+
+336
+00:13:03,200 --> 00:13:05,010
+This just means my
+这个意思就是
+
+337
+00:13:05,080 --> 00:13:06,666
+Matrix A. Here's my Matrix
+这是我的矩阵 A
+
+338
+00:13:06,666 --> 00:13:08,426
+B and I've set C
+这是我的矩阵 B
+
+339
+00:13:08,426 --> 00:13:11,053
+to be equal to AB.
+我设 C = [A B]
+
+340
+00:13:11,070 --> 00:13:12,225
+What I'm doing is I'm taking
+这样做的结果就是
+
+341
+00:13:12,225 --> 00:13:15,438
+these two matrices and just concatenating onto each other.
+把这两个矩阵直接连在一起
+
+342
+00:13:15,438 --> 00:13:18,408
+So the left, matrix A on the left.
+矩阵 A 在左边
+
+343
+00:13:18,420 --> 00:13:20,786
+And I have the matrix B on the right.
+矩阵 B 在右边
+
+344
+00:13:20,800 --> 00:13:23,738
+And that's how I formed
+这样组成了 C 矩阵
+
+345
+00:13:23,830 --> 00:13:27,145
+this matrix C by putting them together.
+就是直接把 A 和 B 合起来
+
+346
+00:13:27,145 --> 00:13:28,927
+I can also do C equals
+我还可以设
+
+347
+00:13:28,927 --> 00:13:31,975
+A semicolon B. The semi
+C = [A; B]
+
+348
+00:13:32,000 --> 00:13:35,552
+colon notation means that
+这里的分号表示
+
+349
+00:13:35,552 --> 00:13:38,881
+I go put the next thing at the bottom.
+把分号后面的东西放到下面
+
+350
+00:13:38,881 --> 00:13:39,880
+So, what I do is A
+所以
+
+351
+00:13:39,910 --> 00:13:41,169
+semicolon B. This also
+[A; B]的作用
+
+352
+00:13:41,170 --> 00:13:42,408
+puts the matrices A
+依然还是把两个矩阵
+
+353
+00:13:42,460 --> 00:13:44,048
+and B together except that it
+放在一起
+
+354
+00:13:44,048 --> 00:13:46,408
+now puts them on top of each other.
+只不过现在是上下排列
+
+355
+00:13:46,408 --> 00:13:49,675
+so now I have A on top and B at the bottom and C here
+所以现在 A 在上面 B 在下面
+
+356
+00:13:49,675 --> 00:13:52,038
+is now in 6 by 2 matrix.
+C 就是一个 6×2 矩阵
+
+357
+00:13:52,038 --> 00:13:54,263
+So, just say the semicolon
+简单地说
+
+358
+00:13:54,270 --> 00:13:56,705
+thing usually means, you know, go to the next line.
+分号的意思就是换到下一行
+
+359
+00:13:56,705 --> 00:13:58,463
+So, C is comprised by a
+所以 C 就包括上面的 A
+
+360
+00:13:58,463 --> 00:13:59,598
+and then go to the bottom
+然后换行到下面
+
+361
+00:13:59,598 --> 00:14:00,610
+of that and then put b
+然后在下面放上一个 B
+
+362
+00:14:00,690 --> 00:14:02,320
+in the bottom and by the
+另外顺便说一下
+
+363
+00:14:02,390 --> 00:14:04,225
+way, this A B is
+这个[A B]命令
+
+364
+00:14:04,225 --> 00:14:05,734
+the same as A, B and
+跟 [A, B] 是一样的
+
+365
+00:14:05,750 --> 00:14:09,106
+so you know, either of these gives you the same result.
+这两种写法的结果是相同的
+
+366
+00:14:10,310 --> 00:14:11,916
+So, with that, hopefully you
+好了 通过以上这些操作
+
+367
+00:14:11,916 --> 00:14:14,256
+now know how to construct
+希望你现在掌握了
+
+368
+00:14:14,260 --> 00:14:17,207
+matrices and hopefully starts
+怎样构建矩阵
+
+369
+00:14:17,207 --> 00:14:18,223
+to show you some of the
+也希望我展示的这些命令
+
+370
+00:14:18,223 --> 00:14:19,822
+commands that you use
+能让你很快地学会
+
+371
+00:14:19,850 --> 00:14:21,913
+to quickly put together matrices and
+怎样把矩阵放到一起
+
+372
+00:14:21,940 --> 00:14:23,390
+take matrices and, you know,
+怎样取出矩阵
+
+373
+00:14:23,390 --> 00:14:24,984
+slam them together to form
+并且把它们放到一起
+
+374
+00:14:25,000 --> 00:14:27,009
+bigger matrices, and with
+组成更大的矩阵
+
+375
+00:14:27,009 --> 00:14:28,962
+just a few lines of code, Octave
+通过几句简单的代码
+
+376
+00:14:28,962 --> 00:14:30,770
+is very convenient in terms
+Octave 能够很方便地
+
+377
+00:14:30,770 --> 00:14:32,683
+of how quickly we can assemble
+很快速地帮助我们
+
+378
+00:14:32,683 --> 00:14:36,033
+complex matrices and move data around.
+组合复杂的矩阵以及对数据进行移动
+
+379
+00:14:36,050 --> 00:14:38,027
+So that's it for moving data around.
+这就是移动数据这一节课
+
+380
+00:14:38,027 --> 00:14:39,347
+In the next video we'll start
+在下一段视频中
+
+381
+00:14:39,347 --> 00:14:40,783
+to talk about how to actually
+我们将一起来谈谈
+
+382
+00:14:40,860 --> 00:14:46,232
+do complex computations on this, on our data.
+怎样利用数据进行更为复杂的计算
+
+383
+00:14:46,830 --> 00:14:48,256
+So, hopefully that gives you
+希望这节课的内容
+
+384
+00:14:48,256 --> 00:14:49,961
+a sense of how, with
+能让你明白
+
+385
+00:14:49,961 --> 00:14:51,049
+just a few commands, you can
+在 Octave 中 怎样用几句简单的命令
+
+386
+00:14:51,049 --> 00:14:54,573
+very quickly move data around in Octave.
+很快地对数据进行移动
+
+387
+00:14:54,590 --> 00:14:56,164
+You know, you load and save vectors and
+包括加载和储存一个向量
+
+388
+00:14:56,180 --> 00:14:58,059
+matrices, load and save data,
+或矩阵 加载和存储数据
+
+389
+00:14:58,090 --> 00:15:00,201
+put together matrices to create
+把矩阵放在一起
+
+390
+00:15:00,201 --> 00:15:02,990
+bigger matrices, index into or select
+构建更大的矩阵
+
+391
+00:15:02,990 --> 00:15:05,021
+specific elements on the matrices.
+用索引对矩阵某个特定元素进行操作等等
+
+392
+00:15:05,021 --> 00:15:06,015
+I know I went through a lot
+我知道可能我一下子
+
+393
+00:15:06,015 --> 00:15:06,944
+of commands, so I think
+讲了很多命令
+
+394
+00:15:06,980 --> 00:15:08,244
+the best thing for you to do
+所以我认为对你来讲
+
+395
+00:15:08,244 --> 00:15:09,741
+is afterward, to look
+最好的学习方法是
+
+396
+00:15:09,741 --> 00:15:12,248
+at the transcript of the things I was typing.
+下课后复习一下我键入的这些代码
+
+397
+00:15:12,248 --> 00:15:13,286
+You know, look at it.
+好好地看一看
+
+398
+00:15:13,286 --> 00:15:14,661
+Look at the coursework site and download
+从课程的网上
+
+399
+00:15:14,661 --> 00:15:15,927
+the transcript of the session
+把代码的副本下载下来
+
+400
+00:15:15,950 --> 00:15:17,479
+from there and look through
+重新好好看看这些副本
+
+401
+00:15:17,479 --> 00:15:18,820
+the transcript and type some
+然后自己在 Octave 中
+
+402
+00:15:18,820 --> 00:15:21,942
+of those commands into Octave yourself
+把这些命令重新输一遍
+
+403
+00:15:21,942 --> 00:15:24,752
+and start to play with these commands and get it to work.
+慢慢开始学会使用这些命令
+
+404
+00:15:24,752 --> 00:15:28,113
+And obviously, you know, there's no point at all to try to memorize all these commands.
+当然 没有必要把这些命令都记住
+
+405
+00:15:28,113 --> 00:15:30,030
+It's just, but what you
+你也不可能记得住
+
+406
+00:15:30,030 --> 00:15:31,852
+should do is, hopefully from
+你要做的就是
+
+407
+00:15:31,852 --> 00:15:32,910
+this video you have gotten a
+从这段视频里
+
+408
+00:15:32,910 --> 00:15:35,065
+sense of the sorts of things you can do.
+了解一下你可以用哪些命令 做哪些事
+
+409
+00:15:35,100 --> 00:15:36,519
+So that when later on when
+这样在你今后需要
+
+410
+00:15:36,520 --> 00:15:37,902
+you are trying to program a learning
+编写学习算法时
+
+411
+00:15:37,902 --> 00:15:39,630
+algorithms yourself, if you
+如果你要找到某个
+
+412
+00:15:39,630 --> 00:15:40,921
+are trying to find a specific
+Octave 中的命令
+
+413
+00:15:40,930 --> 00:15:42,455
+command that maybe you think
+你可能回想起
+
+414
+00:15:42,455 --> 00:15:43,878
+Octave can do because you think
+你之前在这里学到过
+
+415
+00:15:43,878 --> 00:15:45,325
+you might have seen it here, you
+然后你就可以查找
+
+416
+00:15:45,325 --> 00:15:47,300
+should refer to the transcript
+课程中提供的程序副本
+
+417
+00:15:47,300 --> 00:15:48,545
+of the session and look through
+这样就能很轻松地找到
+
+418
+00:15:48,560 --> 00:15:51,693
+that in order to find the commands you wanna use.
+你想使用的命令了
+
+419
+00:15:51,693 --> 00:15:53,069
+So, that's it for
+好了 这就是
+
+420
+00:15:53,069 --> 00:15:54,841
+moving data around and in
+移动数据这节课的全部内容
+
+421
+00:15:54,841 --> 00:15:56,060
+the next video what I'd like
+在下一段视频中
+
+422
+00:15:56,120 --> 00:15:57,699
+to do is start to tell
+我将开始向你介绍
+
+423
+00:15:57,740 --> 00:15:59,257
+you how to actually do
+怎样进行一些
+
+424
+00:15:59,257 --> 00:16:01,404
+complex computations on our
+更复杂的计算
+
+425
+00:16:01,410 --> 00:16:03,548
+data, and how to
+怎样对数据进行计算
+
+426
+00:16:03,550 --> 00:16:04,866
+compute on the data, and
+怎样对数据进行计算
+
+427
+00:16:04,866 --> 00:16:06,560
+actually start to implement learning algorithms.
+同时开始实现学习算法
+
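+As a quick recap of this lecture, here is a short runnable sketch of the commands demonstrated above; the matrix A and the file names hello.mat and hello.txt match the demo, while the vector v is just a placeholder value standing in for whatever data you want to save.
+
+v = (1:0.1:2)';            % placeholder column vector to save
+save hello.mat v;          % save v in a (compressed) binary format
+save hello.txt v -ascii;   % save v in a human-readable text format
+clear;                     % delete all variables in the workspace
+load hello.mat;            % loads the variable v back
+A = [1 2; 3 4; 5 6];       % 3 by 2 matrix
+A(3,2)                     % element in row 3, column 2 (here: 6)
+A(2,:)                     % everything in the second row
+A(:,2)                     % everything in the second column
+A([1 3],:)                 % rows 1 and 3, all columns
+A(:,2) = [10; 11; 12];     % replace the second column
+A = [A, [100; 101; 102]];  % append a column vector on the right
+A(:)                       % all elements stacked into one column vector
+A = [1 2; 3 4; 5 6]; B = [11 12; 13 14; 15 16];
+C = [A B];                 % concatenate side by side (3 by 4)
+C = [A; B];                % concatenate on top of each other (6 by 2)
+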
diff --git a/srt/5 - 3 - Computing on Data (13 min).srt b/srt/5 - 3 - Computing on Data (13 min).srt
new file mode 100644
index 00000000..63d69330
--- /dev/null
+++ b/srt/5 - 3 - Computing on Data (13 min).srt
@@ -0,0 +1,1886 @@
+1
+00:00:00,220 --> 00:00:01,128
+Now that you know how to load
+现在 你已经学会了在Octave中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,128 --> 00:00:03,062
+and save data in Octave, put
+如何加载或存储数据
+
+3
+00:00:03,062 --> 00:00:04,743
+your data into matrices and so
+如何把数据存入矩阵 等等
+
+4
+00:00:04,743 --> 00:00:06,301
+on. In this video I'd like
+在这段视频中
+
+5
+00:00:06,301 --> 00:00:08,252
+to show you how to do computational
+我将向你介绍
+
+6
+00:00:08,252 --> 00:00:10,343
+operations on data and
+如何对数据进行运算
+
+7
+00:00:10,343 --> 00:00:12,296
+later on we'll be using this
+稍后我们将使用这些
+
+8
+00:00:12,320 --> 00:00:16,860
+sorts of computation operations to implement our learning algorithms.
+运算操作来实现我们的学习算法
+
+9
+00:00:16,860 --> 00:00:19,360
+Let's get started.
+现在我们开始吧
+
+10
+00:00:19,610 --> 00:00:21,031
+Here's my Octave window.
+这是我的 Octave 窗口
+
+11
+00:00:21,031 --> 00:00:22,737
+Let me just quickly initialize some
+我现在快速地
+
+12
+00:00:22,737 --> 00:00:24,939
+variables to use
+初始化一些变量
+
+13
+00:00:24,940 --> 00:00:26,679
+for examples and set A
+比如设置A
+
+14
+00:00:26,679 --> 00:00:29,185
+to be a 3 by 2 matrix.
+为一个3×2的矩阵
+
+15
+00:00:29,820 --> 00:00:31,495
+and set B to a
+设置B为
+
+16
+00:00:31,510 --> 00:00:33,319
+3 by 2 matrix and let's
+一个3 × 2矩阵
+
+17
+00:00:33,330 --> 00:00:35,106
+set C to a
+设置C为
+
+18
+00:00:35,120 --> 00:00:38,419
+2 by 2 matrix, like so.
+2 × 2矩阵
+
+19
+00:00:39,150 --> 00:00:41,948
+Now, let's say I want to multiply 2 of my matrices.
+现在 我想算两个矩阵的乘积
+
+20
+00:00:41,960 --> 00:00:44,121
+So, let's say I wanna compute A*C.
+比如说 A × C
+
+21
+00:00:44,121 --> 00:00:45,713
+I just type A*C.
+我只需键入A×C
+
+22
+00:00:45,740 --> 00:00:48,848
+So, it's a 3 by 2 matrix times a 2 by 2 matrix.
+这是一个 3×2 矩阵乘以 2×2 矩阵
+
+23
+00:00:48,860 --> 00:00:52,135
+This gives me this 3 by 2 matrix.
+得到这样一个3×2矩阵
+
+24
+00:00:52,160 --> 00:00:53,736
+You can also do elements wise
+你也可以对每一个元素
+
+25
+00:00:53,740 --> 00:00:56,472
+operations and do A .* B
+做运算 方法是做点乘运算A .*B
+
+26
+00:00:56,500 --> 00:00:57,615
+and what this would do is
+这么做
+
+27
+00:00:57,615 --> 00:00:59,138
+they'll take each elements of A
+Octave将矩阵 A 中的每一个元素
+
+28
+00:00:59,138 --> 00:01:00,584
+and multiply it by
+与矩阵 B 中的
+
+29
+00:01:00,590 --> 00:01:02,558
+the corresponding elements of B.
+对应元素相乘
+
+30
+00:01:02,560 --> 00:01:06,390
+So, that's A, that's B, that's A .* B.
+这是A 这是B 这是A .* B
+
+31
+00:01:06,700 --> 00:01:09,412
+So, for example, the first element
+比如说 这里第一个元素
+
+32
+00:01:09,420 --> 00:01:10,940
+gives 1 times 11 which gives 11.
+1乘以11得到11
+
+33
+00:01:10,950 --> 00:01:14,045
+The second element gives
+第二个元素是
+
+34
+00:01:14,045 --> 00:01:16,752
+2 x 12 which gives 24 and so on.
+2乘以12得到24
+
+35
+00:01:16,760 --> 00:01:18,196
+So it is the element
+这就是两个矩阵的
+
+36
+00:01:18,196 --> 00:01:19,673
+wise multiplication of two
+元素位运算
+
+37
+00:01:19,673 --> 00:01:21,500
+matrices, and in general
+通常来说
+
+38
+00:01:21,520 --> 00:01:23,359
+the period, the dot, tends to,
+在Octave中
+
+39
+00:01:23,380 --> 00:01:25,132
+it's usually used, to denote
+点号一般
+
+40
+00:01:25,132 --> 00:01:27,435
+element wise operations in octave.
+用来表示元素位运算
+
+41
+00:01:27,435 --> 00:01:28,882
+So, here's a matrix
+这里是一个矩阵A
+
+42
+00:01:28,882 --> 00:01:31,735
+A and I'll do A dot caret 2.
+这里我输入A .^ 2
+
+43
+00:01:31,735 --> 00:01:33,001
+This gives me the multi,
+这将对矩阵A中
+
+44
+00:01:33,010 --> 00:01:35,671
+the element wise squaring of
+每一个元素平方
+
+45
+00:01:35,690 --> 00:01:37,411
+A, so 1 squared
+所以 1的平方是1
+
+46
+00:01:37,411 --> 00:01:40,813
+is 1, 2 squared is 4 and so on.
+2的平方是4 等等
+
+47
+00:01:40,870 --> 00:01:42,215
+Let's set V to a vector,
+我们设V是一个向量
+
+48
+00:01:42,260 --> 00:01:46,085
+we'll set V as 123 as a column vector.
+设V为 [1; 2; 3] 是列向量
+
+49
+00:01:46,180 --> 00:01:47,848
+You can also do 1.
+你也可以输入
+
+50
+00:01:47,860 --> 00:01:49,675
+over V to do
+1 ./ V
+
+51
+00:01:49,675 --> 00:01:51,533
+the element wise reciprocal of
+得到每一个元素的倒数
+
+52
+00:01:51,533 --> 00:01:53,176
+V so this gives me
+所以这样一来
+
+53
+00:01:53,210 --> 00:01:55,600
+one over one, one over two and one over three.
+就会分别算出 1/1 1/2 1/3
+
+54
+00:01:55,600 --> 00:01:56,898
+This works too for matrices so
+矩阵也可以这样操作
+
+55
+00:01:56,898 --> 00:01:58,436
+one dot over A, gives me
+1 ./ A 得到
+
+56
+00:01:58,470 --> 00:02:00,464
+that element wise inverse of
+A中每一个元素的倒数
+
+57
+00:02:00,520 --> 00:02:03,342
+A. and once
+同样地
+
+58
+00:02:03,342 --> 00:02:04,813
+again the period gives us
+这里的点号
+
+59
+00:02:04,830 --> 00:02:08,193
+a clue that this is an elements wise operation.
+还是表示对每一个元素进行操作
+
+60
+00:02:08,193 --> 00:02:09,663
+To also do things like log
+我们还可以进行求对数运算
+
+61
+00:02:09,663 --> 00:02:11,591
+V This is an element wise
+也就是对每个元素
+
+62
+00:02:11,600 --> 00:02:14,257
+logarithm of, the
+进行求对数运算
+
+63
+00:02:14,257 --> 00:02:15,418
+V, E to the
+还有自然数e的幂次运算
+
+64
+00:02:15,420 --> 00:02:17,394
+V, is the base E
+就是以e为底
+
+65
+00:02:17,394 --> 00:02:20,288
+exponentiation of these elements
+以这些元素为幂的运算
+
+66
+00:02:20,330 --> 00:02:21,432
+of V. So this is e, this is e
+所以这是e 这是e的平方
+
+67
+00:02:21,432 --> 00:02:23,105
+squared, this is e cubed, this is
+这是e的立方
+
+68
+00:02:23,105 --> 00:02:26,010
+V. And I
+v 矩阵是这样的
+
+69
+00:02:26,120 --> 00:02:28,187
+can also do abs of V to
+我还可以用 abs
+
+70
+00:02:28,230 --> 00:02:30,172
+take the element wise absolute
+来对 v 的每一个元素
+
+71
+00:02:30,172 --> 00:02:32,056
+value of V. So here,
+求绝对值
+
+72
+00:02:32,056 --> 00:02:34,418
+V was all positive, abs, say
+当然这里 v 都是正数
+
+73
+00:02:34,430 --> 00:02:36,503
+minus 1, 2, minus 3,
+我们换成另一个
+
+74
+00:02:36,503 --> 00:02:38,543
+the element wise Absolute
+这样对每个元素求绝对值
+
+75
+00:02:38,543 --> 00:02:40,428
+value gives me back these
+得到的结果就是
+
+76
+00:02:40,430 --> 00:02:43,929
+non-negative values and negative
+这些非负的元素
+
+77
+00:02:43,929 --> 00:02:45,465
+V gives me the minus
+还有 -v
+
+78
+00:02:45,465 --> 00:02:46,715
+of V. This is the same
+给出V中每个元素的相反数
+
+79
+00:02:46,730 --> 00:02:49,085
+as -1xV but usually
+这等价于 -1 乘以 v
+
+80
+00:02:49,085 --> 00:02:50,653
+you just write negative V and
+不过一般就直接用 -v 就好了
+
+81
+00:02:50,653 --> 00:02:55,340
+so that negative 1xV and what else can you do?
+其实就等于 -1*v
+
+82
+00:02:55,990 --> 00:02:57,185
+Here's another neat trick.
+还有什么呢?
+
+83
+00:02:57,185 --> 00:02:58,343
+So Let's see.
+还有一个技巧
+
+84
+00:02:58,343 --> 00:03:01,424
+Let's say I want to take V and increment each of these elements by 1.
+比如说 我们想对v中的每个元素都加1
+
+85
+00:03:01,424 --> 00:03:02,520
+Well, one way to do
+那么我们可以这么做
+
+86
+00:03:02,520 --> 00:03:05,407
+it is by constructing a
+首先构造一个
+
+87
+00:03:05,420 --> 00:03:09,010
+3 by 1 vector
+3行1列的1向量
+
+88
+00:03:09,660 --> 00:03:12,666
+this all ones and adding that to V. So, they do that.
+然后把这个1向量跟原来的向量相加
+
+89
+00:03:12,666 --> 00:03:15,373
+This increments V by for 123 to 234.
+因此 v 向量从[1 2 3] 增至 [2 3 4]
+
+90
+00:03:15,373 --> 00:03:16,804
+The way I did
+我用了一个
+
+91
+00:03:16,804 --> 00:03:21,439
+that was length of V, is three.
+length(v) 命令
+
+92
+00:03:21,890 --> 00:03:23,790
+So ones, length of
+因此这样一来
+
+93
+00:03:23,790 --> 00:03:25,792
+V by one, this is ones
+ones(length(v) ,1) 就相当于
+
+94
+00:03:25,820 --> 00:03:27,055
+of three by one.
+ones(3,1)
+
+95
+00:03:27,055 --> 00:03:29,525
+So that's ones, three by one.
+所以这是ones(3,1)
+
+96
+00:03:29,580 --> 00:03:31,150
+On the right and what I
+对吧 然后我做的是
+
+97
+00:03:31,230 --> 00:03:33,198
+did was v plus ones,
+v + ones(3,1)
+
+98
+00:03:33,198 --> 00:03:35,139
+V by one, which is adding
+也就是将 v 的各元素
+
+99
+00:03:35,150 --> 00:03:36,605
+this vector of all ones
+都加上这些1
+
+100
+00:03:36,610 --> 00:03:38,112
+to v. And so this increments
+这样就将 v 的每个元素
+
+101
+00:03:38,112 --> 00:03:40,340
+V by one.
+增加了1
+
+102
+00:03:40,340 --> 00:03:41,984
+And you, another simpler
+另一种更简单的方法是
+
+103
+00:03:41,984 --> 00:03:44,472
+way to do that is to type V+ one, right?
+直接用 v+1
+
+104
+00:03:44,472 --> 00:03:45,600
+So that's V and
+所以这是 v
+
+105
+00:03:45,650 --> 00:03:46,989
+V+ one also means to
+v + 1 也就等于
+
+106
+00:03:47,000 --> 00:03:49,257
+add one element wise to
+把 v 中的每一个元素
+
+107
+00:03:49,280 --> 00:03:52,458
+each of my elements of V.
+都加上1
+
+108
+00:03:52,458 --> 00:03:55,422
+Now, let's talk about more operations.
+现在 让我们来谈谈更多的操作
+
+109
+00:03:55,450 --> 00:03:58,848
+So, here's my matrix A. If you want to write A transpose.
+这是我的矩阵A 如果你想要求它的转置
+
+110
+00:03:58,848 --> 00:04:00,841
+The way to do that is to write A prime.
+那么方法是用A‘
+
+111
+00:04:00,900 --> 00:04:02,653
+That's the apostrophe symbol.
+这是单引号符号
+
+112
+00:04:02,660 --> 00:04:03,770
+It's the left quote.
+并且是左引号
+
+113
+00:04:03,770 --> 00:04:05,355
+So, on your keyboard
+可能你的键盘上
+
+114
+00:04:05,355 --> 00:04:06,975
+you probably have a left
+有一个左引号
+
+115
+00:04:06,975 --> 00:04:08,106
+quote and a right quote.
+和一个右引号
+
+116
+00:04:08,106 --> 00:04:09,901
+So this is a at the
+这里用的是左引号
+
+117
+00:04:09,950 --> 00:04:12,304
+standard quotation mark is a,
+也就是标准的引号
+
+118
+00:04:12,304 --> 00:04:14,765
+what to say, a transpose
+因此 A’
+
+119
+00:04:14,765 --> 00:04:16,172
+to excuse me the, you
+将得出 A 的转置矩阵
+
+120
+00:04:16,172 --> 00:04:17,228
+know, a transpose of my
+当然
+
+121
+00:04:17,228 --> 00:04:18,919
+matrix, and of course
+如果我写 (A‘)’
+
+122
+00:04:18,919 --> 00:04:20,405
+a transpose if I transpose
+也就是 A 转置两次
+
+123
+00:04:20,405 --> 00:04:21,650
+that again then I should
+那么我又重新得到矩阵 A
+
+124
+00:04:21,650 --> 00:04:26,509
+get back my matrix A. Some more useful functions.
+还有一些有用的函数
+
+125
+00:04:26,540 --> 00:04:28,646
+Let's say lowercase a is
+假如说 小写a
+
+126
+00:04:28,646 --> 00:04:30,546
+1, 15, 2, 0.5.
+是[1 15 2 0.5]
+
+127
+00:04:30,546 --> 00:04:34,266
+So, it's a, you know, 1 by 4 matrix.
+这是一个1行4列矩阵
+
+128
+00:04:34,266 --> 00:04:36,239
+Let's say set val equals max
+假如说 val=max(a)
+
+129
+00:04:36,239 --> 00:04:37,833
+of A. This returns the
+这将返回
+
+130
+00:04:37,833 --> 00:04:39,328
+maximum value of A, which
+A矩阵中的最大值
+
+131
+00:04:39,328 --> 00:04:41,481
+in this case is 15 and
+在这里是15
+
+132
+00:04:41,500 --> 00:04:44,465
+I can do val, ind equals max of
+我还可以写 [val, ind] = max(a)
+
+133
+00:04:44,490 --> 00:04:47,115
+A. And this returns
+这将返回
+
+134
+00:04:47,120 --> 00:04:49,634
+val and ind, which are
+a矩阵中的最大值
+
+135
+00:04:49,634 --> 00:04:51,289
+the maximum value of A
+存入val
+
+136
+00:04:51,289 --> 00:04:52,943
+which is 15, as well as the index.
+以及该值对应的索引
+
+137
+00:04:52,943 --> 00:04:56,028
+So element number two of A is that 15.
+ 因此元素15对应的索引值为2 存入ind
+
+138
+00:04:56,028 --> 00:04:58,766
+So, ind is my index into this.
+所以 ind 等于2
+
+139
+00:04:58,766 --> 00:05:00,148
+Just as a warning: if
+特别注意一下
+
+140
+00:05:00,148 --> 00:05:03,155
+you do max A where A is a matrix.
+如果你用命令 max(A) A是一个矩阵的话
+
+141
+00:05:03,180 --> 00:05:04,746
+What this does is this actually
+这样做就是对每一列
+
+142
+00:05:04,780 --> 00:05:07,848
+does the column wise maximum,
+求最大值
+
+143
+00:05:07,860 --> 00:05:11,525
+but say a little bit more about this in a second.
+等下再仔细讲讲
+
+144
+00:05:11,570 --> 00:05:13,305
+So, using this example of the
+我们还是用这个例子
+
+145
+00:05:13,305 --> 00:05:17,008
+variable lowercase A. If I do A less than three.
+这个 小a 矩阵
+
+146
+00:05:17,040 --> 00:05:19,548
+This does the element wise operation.
+如果输入 a<3
+
+147
+00:05:19,590 --> 00:05:21,063
+Element wise comparison.
+这将进行逐元素的运算
+
+148
+00:05:21,063 --> 00:05:22,624
+So, the first element
+所以 第一个元素
+
+149
+00:05:22,624 --> 00:05:24,855
+Of A is less than three equals to one.
+是小于3的 因此返回1
+
+150
+00:05:24,855 --> 00:05:26,315
+Second elements of A is
+a的第二个元素
+
+151
+00:05:26,315 --> 00:05:27,435
+not less than three, so
+不小于3 所以
+
+152
+00:05:27,435 --> 00:05:29,948
+this value is zero, because it is also.
+这个值是0 表示"非"
+
+153
+00:05:29,950 --> 00:05:31,258
+The third and fourth numbers of
+第三个和第四个数字
+
+154
+00:05:31,300 --> 00:05:32,866
+A are less than,
+仍然是小于3
+
+155
+00:05:32,870 --> 00:05:35,667
+I meant less than three, third and fourth elements are less than three.
+2和0.5都小于3
+
+156
+00:05:35,667 --> 00:05:36,826
+So this is one, one, so
+因此 这返回[1 1 0 1]
+
+157
+00:05:36,826 --> 00:05:38,441
+this is just the element wide
+也就是说
+
+158
+00:05:38,460 --> 00:05:40,241
+comparison of all four
+对a矩阵的每一个元素
+
+159
+00:05:40,280 --> 00:05:42,504
+element variable lower case
+与3进行比较
+
+160
+00:05:42,520 --> 00:05:44,008
+three and it returns true
+然后根据每一个元素与3的大小关系
+
+161
+00:05:44,020 --> 00:05:47,382
+or false depending on whether or not it's less than three.
+返回1和0表示真与假
+
+162
+00:05:47,400 --> 00:05:48,843
+Now, if I do find
+现在 如果我写 find(a<3)
+
+163
+00:05:48,880 --> 00:05:50,708
+A less than three, this would
+这将告诉我
+
+164
+00:05:50,710 --> 00:05:52,149
+tell me which are the
+a 中的哪些元素
+
+165
+00:05:52,190 --> 00:05:53,805
+elements of A that the
+是小于3的
+
+166
+00:05:53,860 --> 00:05:55,202
+variable A of less than three
+是小于3的
+
+167
+00:05:55,202 --> 00:05:56,964
+and in this case the 1st, 3rd
+在这里就是第一 第三和第四个元素
+
+168
+00:05:56,964 --> 00:06:00,244
+and 4th elements are lesson three.
+是小于3的
+
+169
+00:06:00,244 --> 00:06:01,465
+For my next example Oh, let
+下一个例子
+
+170
+00:06:01,465 --> 00:06:03,335
+me set A equal to
+设A = magic(3)
+
+171
+00:06:03,340 --> 00:06:05,765
+magic three. The magic
+magic 函数返回什么呢
+
+172
+00:06:05,765 --> 00:06:07,409
+function returns. Let's type help magic. Functions called
+让我们查看 magic 函数的帮助文件
+
+173
+00:06:09,390 --> 00:06:12,581
+The magic function returns.
+magic 函数将返回
+
+174
+00:06:12,581 --> 00:06:15,362
+It returns these matrices called magic squares.
+一个矩阵 称为魔方阵或幻方 (magic squares)
+
+175
+00:06:15,362 --> 00:06:17,722
+They have this, you know,
+它们具有以下
+
+176
+00:06:17,740 --> 00:06:20,012
+mathematical property that all
+这样的数学性质
+
+177
+00:06:20,030 --> 00:06:21,590
+of their rows and columns and
+它们所有的行和列和对角线
+
+178
+00:06:21,590 --> 00:06:23,730
+diagonals sum up to the same thing.
+加起来都等于相同的值
+
+179
+00:06:23,730 --> 00:06:25,535
+So, you know, it's
+当然据我所知
+
+180
+00:06:25,580 --> 00:06:27,378
+not actually useful for machine
+这在机器学习里
+
+181
+00:06:27,378 --> 00:06:28,385
+learning as far as I
+基本用不上
+
+182
+00:06:28,385 --> 00:06:29,688
+know, but I'm just using
+但我可以用这个方法
+
+183
+00:06:29,688 --> 00:06:31,720
+this as a convenient way,
+很方便地生成一个
+
+184
+00:06:31,720 --> 00:06:33,058
+you know, to generate a 3
+3行3列的矩阵
+
+185
+00:06:33,058 --> 00:06:36,206
+by 3 matrix, and this magic square here
+而这个魔方矩阵
+
+186
+00:06:36,220 --> 00:06:37,228
+has the property that
+每一行 每一列
+
+187
+00:06:37,228 --> 00:06:39,500
+each row, each column and
+每一个对角线
+
+188
+00:06:39,510 --> 00:06:41,055
+the diagonals all add up
+三个数字加起来
+
+189
+00:06:41,055 --> 00:06:44,487
+to the same thing, so it's kind of a mathematical construct.
+都是等于同一个数
+
+190
+00:06:44,510 --> 00:06:45,789
+I use magic, I use this
+我只有在演示功能
+
+191
+00:06:45,800 --> 00:06:47,110
+magic function only when I'm
+或者上课教 Octave 的时候
+
+192
+00:06:47,110 --> 00:06:48,118
+doing demos, or when I'm
+会用到这个矩阵
+
+193
+00:06:48,140 --> 00:06:49,571
+teaching Octave like this and
+在其他有用的机器学习应用中
+
+194
+00:06:49,580 --> 00:06:51,103
+I don't actually use it for
+这个矩阵其实没多大作用
+
+195
+00:06:51,103 --> 00:06:53,846
+any, you know, useful machine learning application.
+让我来看看别的
+
+196
+00:06:53,860 --> 00:06:59,356
+But, let's see, if I type RC equals find A greater than or equals 7.
+如果我输入 [r,c] = find( A>=7 )
+
+197
+00:06:59,390 --> 00:07:02,657
+This finds all the elements
+这将找出所有A矩阵中
+
+198
+00:07:02,657 --> 00:07:03,797
+of a that are greater than
+大于等于7的元素
+
+199
+00:07:03,797 --> 00:07:05,246
+and equals to 7 and
+因此
+
+200
+00:07:05,246 --> 00:07:07,044
+so, R, C stand for row and column.
+r 和 c 分别表示行和列
+
+201
+00:07:07,100 --> 00:07:09,392
+So, the 1, 1 element is greater than 7.
+这就表示 第一行第一列的元素大于等于7
+
+202
+00:07:09,400 --> 00:07:10,973
+The 3, 2 element is
+第三行第二列的元素大于等于7
+
+203
+00:07:10,980 --> 00:07:13,178
+greater than 7 and the 2, 3 element is greater than 7.
+第二行第三列的元素大于等于7
+
+204
+00:07:13,200 --> 00:07:14,788
+So let's see, the two, three
+我们来看看 第二行第三列的元素
+
+205
+00:07:14,800 --> 00:07:18,803
+element for example, is A two, three.
+就是 A(2,3)
+
+206
+00:07:18,850 --> 00:07:21,102
+Is seven, is this element
+是等于7的
+
+207
+00:07:21,120 --> 00:07:24,248
+out here, and that is indeed greater than or equal seven.
+就是这个元素 确实是大于等于7的
+
+208
+00:07:24,248 --> 00:07:26,005
+By the way, I actually don't even
+顺便说一句 其实我从来都
+
+209
+00:07:26,030 --> 00:07:27,613
+memorize myself what these
+不去刻意记住这个 find 函数
+
+210
+00:07:27,613 --> 00:07:28,944
+find functions do in the
+到底是怎么用的
+
+211
+00:07:28,960 --> 00:07:30,323
+all these things do myself and
+我只需要会用 help 函数就可以了
+
+212
+00:07:30,323 --> 00:07:31,399
+whenever I use a find
+每当我在使用这个函数
+
+213
+00:07:31,399 --> 00:07:33,042
+function, sometimes I forget
+忘记怎么用的时候
+
+214
+00:07:33,070 --> 00:07:34,791
+myself exactly what does, and
+我就可以用 help 函数
+
+215
+00:07:34,791 --> 00:07:37,952
+you know, type help find to look up the document.
+键入 help find 来找到帮助文档
+
+216
+00:07:37,970 --> 00:07:40,042
+Okay, just two more things, if it's okay, to show you.
+好吧 最后再讲两个内容
+
+217
+00:07:40,042 --> 00:07:41,549
+One is the sum function.
+一个是求和函数
+
+218
+00:07:41,549 --> 00:07:43,452
+So here's my A and
+这是 a 矩阵
+
+219
+00:07:43,452 --> 00:07:44,755
+I type sum A. This adds
+键入 sum(a)
+
+220
+00:07:44,800 --> 00:07:46,500
+up all the elements of A.
+就把 a 中所有元素加起来了
+
+221
+00:07:46,510 --> 00:07:47,660
+And if I want to multiply them
+如果我想把它们都乘起来
+
+222
+00:07:47,660 --> 00:07:49,404
+together, I type prod A.
+键入 prod(a)
+
+223
+00:07:49,410 --> 00:07:50,795
+Prod stands for product,
+prod 意思是 product(乘积)
+
+224
+00:07:50,800 --> 00:07:53,022
+and it returns the products of
+它将返回
+
+225
+00:07:53,022 --> 00:07:55,773
+these four elements of A.
+这四个元素的乘积
+
+226
+00:07:56,040 --> 00:07:58,215
+Floor A rounds down,
+floor(a) 是向下四舍五入
+
+227
+00:07:58,215 --> 00:07:59,465
+these elements of A, so zero
+因此对于 a 中的元素
+
+228
+00:07:59,470 --> 00:08:01,766
+point five gets rounded down to zero.
+0.5将被下舍入变成0
+
+229
+00:08:01,766 --> 00:08:03,352
+And ceil, or ceiling A,
+还有 ceil(A)
+
+230
+00:08:03,380 --> 00:08:04,815
+gets rounded up, so zero
+表示向上四舍五入
+
+231
+00:08:04,815 --> 00:08:06,212
+point five, rounded up to
+所以0.5将上舍入变为
+
+232
+00:08:06,220 --> 00:08:10,735
+the nearest integer, so zero point five gets rounded up to one.
+最接近的整数 也就是1
+
+233
+00:08:10,735 --> 00:08:12,143
+You can also.
+还有
+
+234
+00:08:12,143 --> 00:08:13,322
+Let's see.
+我们来看
+
+235
+00:08:13,322 --> 00:08:14,418
+Let me type rand 3.
+键入 rand(3)
+
+236
+00:08:14,418 --> 00:08:16,643
+This generates a random 3 by 3 matrix.
+这将生成一个随机的3×3矩阵
+
+237
+00:08:16,680 --> 00:08:20,444
+If I type max of rand 3, rand 3.
+如果键入 max(rand(3), rand(3))
+
+238
+00:08:20,460 --> 00:08:21,848
+What this does is it takes
+这样做的结果是
+
+239
+00:08:21,848 --> 00:08:24,963
+the element wise maximum of
+返回两个3×3的随机矩阵
+
+240
+00:08:24,963 --> 00:08:26,897
+2 random 3 by 3 matrices.
+并且逐元素比较 取最大值
+
+241
+00:08:26,900 --> 00:08:28,017
+So, you'll notice all these
+所以 你会发现所有这些
+
+242
+00:08:28,017 --> 00:08:29,063
+numbers tend to be a bit on the
+数字几乎都比较大
+
+243
+00:08:29,063 --> 00:08:30,948
+large side because each of
+因为这里的每个元素
+
+244
+00:08:30,948 --> 00:08:32,581
+these is actually the max of
+都实际上是
+
+245
+00:08:32,581 --> 00:08:35,093
+the element-wise
+两个随机生成的矩阵
+
+246
+00:08:35,110 --> 00:08:38,269
+max of two randomly generated matrices.
+逐元素进行比较 取最大的那个值
+
+247
+00:08:38,269 --> 00:08:40,316
+This is my magic number.
+这是刚才生成的
+
+248
+00:08:40,316 --> 00:08:43,258
+This was my magic square 3x3a.
+3×3魔方阵 A
+
+249
+00:08:43,258 --> 00:08:47,704
+Let's say I type max A and then this will be it.
+假如我输入
+
+250
+00:08:47,730 --> 00:08:49,955
+Open, close, square brackets comma 1.
+max(A,[],1)
+
+251
+00:08:49,955 --> 00:08:51,344
+What this does is
+这样做会得到
+
+252
+00:08:51,360 --> 00:08:53,584
+this takes the column wise maximum.
+每一列的最大值
+
+253
+00:08:53,600 --> 00:08:54,892
+So, the maximum of the
+所以第一列的最大值
+
+254
+00:08:54,910 --> 00:08:56,517
+first column is eight, max
+就是8
+
+255
+00:08:56,517 --> 00:08:58,335
+of the second column is nine,
+第二列是9
+
+256
+00:08:58,335 --> 00:09:00,695
+the max of the third column is seven.
+第三列的最大值是7
+
+257
+00:09:00,695 --> 00:09:02,064
+This 1 means to take the
+这里的1表示
+
+258
+00:09:02,100 --> 00:09:03,665
+max along the first dimension of
+取A矩阵第一个维度的最大值
+
+259
+00:09:03,700 --> 00:09:05,860
+A. In contrast, if
+相对地
+
+260
+00:09:05,940 --> 00:09:07,874
+I were to type max a, this
+如果我键入
+
+261
+00:09:07,910 --> 00:09:10,033
+funny notation 2 then this
+max(A,[],2)
+
+262
+00:09:10,033 --> 00:09:12,433
+takes the per row maximum.
+这将得到每一行的最大值
+
+263
+00:09:12,460 --> 00:09:13,449
+So, the maximum for the first
+所以 第一行的最大值
+
+264
+00:09:13,449 --> 00:09:14,525
+row is 8, max of
+是等于8
+
+265
+00:09:14,560 --> 00:09:16,561
+second row is 7, max
+第二行最大值是7
+
+266
+00:09:16,580 --> 00:09:18,105
+of the third row is 9
+第三行是9
+
+267
+00:09:18,105 --> 00:09:21,605
+and so this allows you to take maxes.
+所以你可以用这个方法
+
+268
+00:09:21,605 --> 00:09:24,771
+You know, per row or per column.
+来求得每一行或每一列的最值
+
+269
+00:09:24,780 --> 00:09:26,988
+And if you want to, and
+另外
+
+270
+00:09:26,988 --> 00:09:29,019
+remember it defaults to column
+你要知道 默认情况下
+
+271
+00:09:29,020 --> 00:09:30,091
+wise maximums,
+max(A)返回的是
+
+272
+00:09:30,091 --> 00:09:31,628
+so if you want to find
+每一列的最大值
+
+273
+00:09:31,630 --> 00:09:33,395
+the maximum element in
+如果你想要
+
+274
+00:09:33,395 --> 00:09:35,040
+the entire matrix A, you
+找出整个矩阵A的最大值
+
+275
+00:09:35,040 --> 00:09:36,985
+can type max of max
+你可以输入
+
+276
+00:09:36,985 --> 00:09:39,558
+of A, like so, which is nine.
+max(max(A)) 像这样
+
+277
+00:09:39,558 --> 00:09:40,640
+Or you can turn A into
+或者你可以将 A 矩阵转成
+
+278
+00:09:40,670 --> 00:09:42,507
+a vector and type max
+一个向量
+
+279
+00:09:42,507 --> 00:09:44,739
+of A colon, like
+然后键入 max(A(:))
+
+280
+00:09:44,750 --> 00:09:46,912
+so, this treats this as a vector
+这样做就是把 A 当做一个向量
+
+281
+00:09:46,912 --> 00:09:51,539
+and takes the max element of vector.
+并返回 A 向量中的最大值
+
+282
+00:09:51,572 --> 00:09:54,288
+Finally, let's set A
+最后 让我们把 A 设为一个
+
+283
+00:09:54,288 --> 00:09:56,234
+to be a nine by nine magic square.
+9行9列的魔方阵
+
+284
+00:09:56,234 --> 00:09:57,853
+So remember, the magic square
+别忘了
+
+285
+00:09:57,853 --> 00:09:59,969
+has this property that every
+魔方阵具有的特性是
+
+286
+00:09:59,969 --> 00:10:03,535
+column in every row sums the same thing and also the diagonals.
+每行每列和对角线的求和都是相等的
+
+287
+00:10:03,535 --> 00:10:06,209
+So here is 9X9 magic square.
+这是一个9×9的魔方阵
+
+288
+00:10:06,240 --> 00:10:07,715
+So let me just sum A one
+我们来求一个 sum(A,1)
+
+289
+00:10:07,715 --> 00:10:10,169
+so this does a per column sum.
+这样就得到每一列的总和
+
+290
+00:10:10,190 --> 00:10:11,104
+And so I'm going to take each
+所以这样做就是
+
+291
+00:10:11,104 --> 00:10:12,194
+column of A and add
+把 A 的每一列进行求和
+
+292
+00:10:12,194 --> 00:10:13,698
+them up and this, you
+从这里我们也可以看出
+
+293
+00:10:13,700 --> 00:10:15,365
+know, lets us verify that indeed
+这也验证了
+
+294
+00:10:15,365 --> 00:10:16,935
+for 9 by 9 magic square.
+一个9×9的魔方阵
+
+295
+00:10:16,935 --> 00:10:20,124
+Every column adds up to 369, which is the same thing.
+确实每一列加起来都相等 都为369
+
+296
+00:10:20,124 --> 00:10:22,020
+Now, let's do the row wise sum.
+现在我们来求每一行的和
+
+297
+00:10:22,020 --> 00:10:24,643
+So, the sum A comma 2
+键入sum(A,2)
+
+298
+00:10:24,643 --> 00:10:27,967
+and this sums
+这样就得到了
+
+299
+00:10:28,030 --> 00:10:29,269
+up each row of A
+A 中每一行的和
+
+300
+00:10:29,269 --> 00:10:30,522
+and each row of A
+A 中每一行的和
+
+301
+00:10:30,522 --> 00:10:32,113
+also sums up to 369.
+加起来还是369
+
+302
+00:10:32,113 --> 00:10:34,485
+Now let's sum the
+现在我们来算
+
+303
+00:10:34,500 --> 00:10:35,934
+diagonal elements of A
+A 的对角线元素的和
+
+304
+00:10:35,990 --> 00:10:37,362
+and make sure that they, that
+看看它们的和
+
+305
+00:10:37,370 --> 00:10:39,696
+that also sums up to the same thing.
+是不是也相等
+
+306
+00:10:39,730 --> 00:10:40,924
+So what I'm going to
+我们现在构造一个
+
+307
+00:10:40,924 --> 00:10:42,613
+do is, construct a nine
+9×9 的单位矩阵
+
+308
+00:10:42,613 --> 00:10:44,325
+by nine identity matrix, that's
+键入 eye(9)
+
+309
+00:10:44,360 --> 00:10:46,018
+I9, and let me
+设为I9
+
+310
+00:10:46,018 --> 00:10:49,326
+take A and construct, multiply
+然后我们要用 A
+
+311
+00:10:49,326 --> 00:10:51,272
+A elements wise.
+逐点乘以这个单位矩阵
+
+312
+00:10:51,300 --> 00:10:52,812
+So here's my matrix of A.
+这是矩阵A
+
+313
+00:10:52,812 --> 00:10:56,350
+I'm gonna do A .* I9 and what
+我现在用 A 逐点乘以 eye(9)
+
+314
+00:10:56,490 --> 00:10:58,018
+this will do is take the
+这样做的结果是
+
+315
+00:10:58,020 --> 00:11:00,035
+element wise product of these
+两个矩阵对应元素
+
+316
+00:11:00,035 --> 00:11:01,150
+2 matrices, and so this
+将进行相乘
+
+317
+00:11:01,150 --> 00:11:03,605
+should wipe out everything except
+除了对角线元素外
+
+318
+00:11:03,680 --> 00:11:06,421
+for the diagonal entries and now
+其他元素都会得到0
+
+319
+00:11:06,421 --> 00:11:08,761
+I'm going to sum, sum of
+然后我对刚才求到的结果
+
+320
+00:11:08,780 --> 00:11:11,179
+A of that and this
+键入 sum(sum(A.*eye(9)))
+
+321
+00:11:11,180 --> 00:11:14,512
+gives me the sum of
+这实际上是求得了
+
+322
+00:11:14,512 --> 00:11:16,684
+these diagonal elements, and indeed it is 369.
+这个矩阵对角线元素的和 确实是369
+
+323
+00:11:16,684 --> 00:11:20,218
+You can sum up the other diagonal as well.
+你也可以求另一条对角线的和
+
+324
+00:11:20,240 --> 00:11:22,385
+So this top left to bottom right.
+这个是从左上角到右下角的
+
+325
+00:11:22,400 --> 00:11:24,158
+You can sum up the opposite diagonal
+你也可以求另一条对角线
+
+326
+00:11:24,180 --> 00:11:26,832
+from bottom left to top right.
+从左下角到右上角
+
+327
+00:11:26,832 --> 00:11:30,199
+The sum, the commands for this is somewhat more cryptic.
+这个和 这个命令会有点麻烦
+
+328
+00:11:30,200 --> 00:11:31,535
+You don't really need to know this.
+其实你不需要知道这个
+
+329
+00:11:31,540 --> 00:11:33,122
+I'm just showing you just in
+我只是想给你看
+
+330
+00:11:33,122 --> 00:11:34,779
+case any of you are curious,
+如果你感兴趣的话可以听听
+
+331
+00:11:34,779 --> 00:11:37,543
+but let's see.
+让我们来看看
+
+332
+00:11:37,600 --> 00:11:41,235
+Flip UD stands for flip up/down.
+flipud 表示上下翻转 (flip up/down)
+
+333
+00:11:41,235 --> 00:11:42,622
+If you do that, that turns out
+如果你用这个命令的话
+
+334
+00:11:42,622 --> 00:11:44,376
+to sum up the
+计算的就是副对角线上
+
+335
+00:11:44,376 --> 00:11:46,055
+elements in the opposites of,
+所有元素的和
+
+336
+00:11:46,055 --> 00:11:49,387
+the other diagonal that also sums up to 369.
+还是会得到369
+
+337
+00:11:49,390 --> 00:11:51,116
+Here, let me show you,
+我来给你演示一下
+
+338
+00:11:51,120 --> 00:11:53,055
+whereas i9 is this
+eye(9) 矩阵是这样
+
+339
+00:11:53,070 --> 00:11:57,300
+matrix, flip up/down of
+那么 flipud(eye(9))
+
+340
+00:11:57,370 --> 00:11:58,986
+i9, you know, takes the identity
+将得到一个单位矩阵
+
+341
+00:11:58,986 --> 00:12:00,832
+matrix and flips it vertically
+并且将它翻转
+
+342
+00:12:00,832 --> 00:12:01,822
+so you end up with, excuse me,
+不好意思打错了
+
+343
+00:12:01,822 --> 00:12:04,394
+flip UD, end up
+应该是flipud
+
+344
+00:12:04,400 --> 00:12:08,742
+with ones on this opposite diagonal as well.
+翻转以后所有的1就变成副对角线了
+
+345
+00:12:08,770 --> 00:12:10,430
+Just one last command and then
+最后再说一个命令
+
+346
+00:12:10,490 --> 00:12:12,706
+that's it, and then that will be it for this video.
+然后就下课
+
+347
+00:12:12,760 --> 00:12:13,730
+Let's say A to be the
+假如 A 是一个
+
+348
+00:12:13,730 --> 00:12:16,112
+3x3 magic square
+3×3的魔方阵
+
+349
+00:12:16,112 --> 00:12:17,221
+again. If you want
+同样地 如果你想
+
+350
+00:12:17,221 --> 00:12:18,493
+to invert the matrix, you
+这个矩阵的逆矩阵
+
+351
+00:12:18,493 --> 00:12:20,668
+type P inv A, this
+键入 pinv(A)
+
+352
+00:12:20,668 --> 00:12:23,612
+is typically called a pseudo-inverse, but it doesn't matter.
+通常称为伪逆矩阵 但这个名字不重要
+
+353
+00:12:23,612 --> 00:12:24,991
+Think of it as basically the inverse
+你就把它看成是
+
+354
+00:12:24,991 --> 00:12:26,927
+of A and that's the
+矩阵 A 求逆
+
+355
+00:12:26,960 --> 00:12:28,313
+inverse of A and second
+因此这就是 A 矩阵的逆矩阵
+
+356
+00:12:28,313 --> 00:12:31,721
+set, you know, temp equals pinv
+设 temp = pinv(A)
+
+357
+00:12:31,740 --> 00:12:33,596
+of A, and then temp times
+然后再用temp 乘以 A
+
+358
+00:12:33,596 --> 00:12:35,362
+A. This is indeed the
+这实际上得到的就是
+
+359
+00:12:35,362 --> 00:12:37,252
+identity matrix with essentially ones
+单位矩阵
+
+360
+00:12:37,260 --> 00:12:38,753
+on the diagonals and zeros on
+对角线为1 其他元素为0
+
+361
+00:12:38,753 --> 00:12:43,322
+the off-diagonals, up to a numerical round-off.
+稍微圆整一下就是
+
+362
+00:12:44,120 --> 00:12:45,746
+So, that's it for how
+好了 这样我们就介绍了
+
+363
+00:12:45,750 --> 00:12:48,430
+to do different computational operations
+如何对矩阵中的数字
+
+364
+00:12:48,430 --> 00:12:50,865
+on the data in matrices.
+进行各种操作
+
+365
+00:12:50,890 --> 00:12:53,055
+And after running a
+在运行完某个
+
+366
+00:12:53,055 --> 00:12:54,350
+learning algorithm, often one of
+学习算法之后
+
+367
+00:12:54,380 --> 00:12:55,876
+the most useful things is to
+通常一件最有用的事情
+
+368
+00:12:55,900 --> 00:12:57,223
+be able to look at your
+是看看你的结果
+
+369
+00:12:57,230 --> 00:13:00,013
+results, or to plot, or visualize your result.
+或者说让你的结果可视化
+
+370
+00:13:00,020 --> 00:13:01,675
+And in the next video I'm
+在接下来的视频中
+
+371
+00:13:01,675 --> 00:13:03,233
+going to very quickly show you
+我会非常迅速地告诉你
+
+372
+00:13:03,233 --> 00:13:04,230
+how, again, with one or
+如何很快地画出
+
+373
+00:13:04,300 --> 00:13:06,261
+two lines of code using Octave
+如何只用一两行代码
+
+374
+00:13:06,270 --> 00:13:07,814
+you can quickly visualize your
+你就可以快速地可视化你的数据
+
+375
+00:13:07,850 --> 00:13:09,901
+data, or plot your data
+画出你的数据
+
+376
+00:13:09,901 --> 00:13:11,101
+and use that to better
+这样你就能更好地理解
+
+377
+00:13:11,101 --> 00:13:14,880
+understand, you know, what your learning algorithms are doing.
+你使用的学习算法
+
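+As a quick recap of this lecture, here is a runnable sketch of the computation commands shown above; A, v and a mirror the lecture's example values, while B and C are just placeholder matrices of the sizes used in the demo, and the results noted in comments follow from these inputs.
+
+A = [1 2; 3 4; 5 6]; B = [11 12; 13 14; 15 16]; C = [1 1; 2 2];
+A * C                  % matrix-matrix product (3 by 2)
+A .* B                 % element-wise product
+A .^ 2                 % element-wise squaring
+v = [1; 2; 3];
+1 ./ v                 % element-wise reciprocal
+log(v); exp(v); abs(v); -v     % element-wise log, exponentiation, absolute value, negation
+v + ones(length(v),1)  % add 1 to each element; same as v + 1
+A'                     % transpose of A
+a = [1 15 2 0.5];
+[val, ind] = max(a)    % val = 15, ind = 2
+a < 3                  % element-wise comparison: [1 0 1 1]
+find(a < 3)            % indices of the elements less than 3
+sum(a), prod(a), floor(a), ceil(a)
+A = magic(3);          % 3 by 3 magic square
+[r, c] = find(A >= 7)  % row/column indices of elements >= 7
+max(A, [], 1)          % per-column maximum
+max(A, [], 2)          % per-row maximum
+max(max(A))            % largest element in the whole matrix; same as max(A(:))
+sum(A, 1), sum(A, 2)   % per-column and per-row sums
+sum(sum(A .* eye(3)))          % sum of the main diagonal
+sum(sum(A .* flipud(eye(3))))  % sum of the opposite diagonal
+pinv(A)                % (pseudo-)inverse; pinv(A) * A is the identity up to round-off
+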
diff --git a/srt/5 - 4 - Plotting Data (10 min).srt b/srt/5 - 4 - Plotting Data (10 min).srt
new file mode 100644
index 00000000..1addf299
--- /dev/null
+++ b/srt/5 - 4 - Plotting Data (10 min).srt
@@ -0,0 +1,1311 @@
+1
+00:00:00,180 --> 00:00:02,402
+When developing learning algorithms, very
+当开发学习算法时
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,402 --> 00:00:04,066
+often a few simple plots
+往往几个简单的图
+
+3
+00:00:04,066 --> 00:00:05,279
+can give you a better
+可以让你更好地
+
+4
+00:00:05,279 --> 00:00:06,593
+sense of what the algorithm
+理解算法的内容
+
+5
+00:00:06,593 --> 00:00:08,423
+is doing and just sanity check
+并且可以完整地检查下
+
+6
+00:00:08,423 --> 00:00:09,503
+that everything is going okay
+算法是否正常运行
+
+7
+00:00:09,510 --> 00:00:12,405
+and the algorithms doing what is supposed to.
+是否达到了算法的目的
+
+8
+00:00:12,410 --> 00:00:13,924
+For example, in an earlier
+例如在之前的
+
+9
+00:00:13,924 --> 00:00:15,244
+video, I talked about how
+视频中 我谈到了
+
+10
+00:00:15,244 --> 00:00:16,826
+plotting the cost function J
+绘制成本函数J(θ)
+
+11
+00:00:16,826 --> 00:00:18,433
+of theta can help you
+可以帮助
+
+12
+00:00:18,433 --> 00:00:21,383
+make sure that gradient descent is converging.
+确认梯度下降算法是否收敛
+
+13
+00:00:21,383 --> 00:00:23,084
+Often, plots of the data
+通常情况下 绘制数据
+
+14
+00:00:23,084 --> 00:00:24,795
+or of all the learning algorithm outputs
+或学习算法所有输出
+
+15
+00:00:24,810 --> 00:00:26,422
+will also give you ideas
+也会启发你
+
+16
+00:00:26,422 --> 00:00:29,391
+for how to improve your learning algorithm.
+如何改进你的学习算法
+
+17
+00:00:29,391 --> 00:00:31,312
+Fortunately, Octave has very
+幸运的是 Octave有非常
+
+18
+00:00:31,330 --> 00:00:33,045
+simple tools to generate lots
+简单的工具用来生成大量
+
+19
+00:00:33,070 --> 00:00:34,534
+of different plots and when
+不同的图
+
+20
+00:00:34,534 --> 00:00:36,518
+I use learning algorithms, I find
+当我用学习算法时 我发现
+
+21
+00:00:36,518 --> 00:00:38,082
+that plotting the data, plotting
+绘制数据
+
+22
+00:00:38,082 --> 00:00:40,175
+the learning algorithm and so
+绘制学习算法等
+
+23
+00:00:40,175 --> 00:00:42,057
+on are often an important
+往往是
+
+24
+00:00:42,060 --> 00:00:43,165
+part of how I get
+我获得想法
+
+25
+00:00:43,165 --> 00:00:45,937
+ideas for improving the
+来改进算法的重要部分
+
+26
+00:00:45,980 --> 00:00:47,199
+algorithms and in this video,
+在这段视频中
+
+27
+00:00:47,199 --> 00:00:48,482
+I'd like to show you some
+我想告诉你一些
+
+28
+00:00:48,482 --> 00:00:52,773
+of these Octave tools for plotting and visualizing your data.
+Octave的工具来绘制和可视化你的数据
+
+29
+00:00:53,700 --> 00:00:55,301
+Here's my Octave window.
+这是我的Octave窗口
+
+30
+00:00:55,301 --> 00:00:57,471
+Let's quickly generate some data
+我们先来快速生成一些数据
+
+31
+00:00:57,471 --> 00:00:58,646
+for us to plot.
+用来绘图
+
+32
+00:00:58,646 --> 00:00:59,724
+So I'm going to set T
+我先设置t
+
+33
+00:00:59,740 --> 00:01:02,181
+to be equal to, you know, this array of numbers.
+等于这个数列
+
+34
+00:01:02,210 --> 00:01:03,828
+Here's T, set of
+这是t
+
+35
+00:01:03,828 --> 00:01:06,685
+numbers going from 0 up to .98.
+是从0到0.98的集合
+
+36
+00:01:06,700 --> 00:01:09,048
+Let's set y1 equals sine
+让我们设置y1等于sin
+
+37
+00:01:09,060 --> 00:01:11,340
+of 2 pi 4 t and
+2*pi*4*t (此处pi表示π)
+
+38
+00:01:12,540 --> 00:01:16,102
+if I want to plot the sine function, it's very easy.
+如果我想绘制正弦函数 这是很容易的
+
+39
+00:01:16,102 --> 00:01:17,918
+I just type plot T comma Y
+我只需要输入plot(t, y1)
+
+40
+00:01:17,918 --> 00:01:20,304
+1 and hit enter.
+并回车
+
+41
+00:01:20,320 --> 00:01:22,233
+And up comes this plot
+就出现了这个图
+
+42
+00:01:22,233 --> 00:01:24,270
+where the horizontal axis is
+横轴是
+
+43
+00:01:24,270 --> 00:01:25,515
+the T variable and the vertical
+t变量 纵轴是y1
+
+44
+00:01:25,515 --> 00:01:26,946
+axis is y1, which
+也就是我们
+
+45
+00:01:26,960 --> 00:01:30,577
+is the sine you saw in the function that we just computed.
+刚刚所输出的正弦函数
+
+46
+00:01:30,990 --> 00:01:32,281
+Let's set y2 to be
+让我们设置y2
+
+47
+00:01:32,281 --> 00:01:34,401
+equal to the cosine
+等于cos
+
+48
+00:01:34,410 --> 00:01:38,735
+of two pi, four T, like so.
+2*pi*4*t
+
+49
+00:01:38,750 --> 00:01:41,403
+And if I plot
+而如果我输入plot
+
+50
+00:01:41,403 --> 00:01:43,835
+T comma y2, what octave
+t逗号y2
+
+51
+00:01:43,835 --> 00:01:45,045
+will do is it'll take my
+Octave将会
+
+52
+00:01:45,060 --> 00:01:46,988
+sine plot and it
+消除之前的正弦图
+
+53
+00:01:46,988 --> 00:01:48,681
+will replace with this cosine
+并且用这个余弦图来代替它
+
+54
+00:01:48,690 --> 00:01:51,322
+function. And now, you know, cosine of x starts off at 1.
+这里纵轴cos(x)从1开始
+
+55
+00:01:51,330 --> 00:01:53,006
+Now, what if I
+如果我
+
+56
+00:01:53,010 --> 00:01:54,581
+want to have both
+要同时表示
+
+57
+00:01:54,610 --> 00:01:56,981
+the sine and the cosine plots on top of each other?
+正弦和余弦曲线
+
+58
+00:01:56,990 --> 00:01:59,702
+What I'm going to do is I'm
+我要做的就是
+
+59
+00:01:59,702 --> 00:02:01,164
+going to type plot t,y1.
+输入plot(t, y1)
+
+60
+00:02:01,164 --> 00:02:03,332
+So here's my sine function, and then
+这是我的正弦函数
+
+61
+00:02:03,332 --> 00:02:06,958
+I'm going to use the function hold on.
+我使用函数hold on
+
+62
+00:02:06,958 --> 00:02:08,908
+And what hold on does is it causes
+hold on函数
+
+63
+00:02:08,920 --> 00:02:10,247
+Octave to now plot new
+的功能是将
+
+64
+00:02:10,270 --> 00:02:11,490
+figures on top of the
+新的图像绘制在
+
+65
+00:02:11,490 --> 00:02:13,772
+old one and let
+旧的之上
+
+66
+00:02:13,772 --> 00:02:15,249
+me now plot t y2.
+我现在绘制t y2
+
+67
+00:02:15,249 --> 00:02:19,812
+I'm going to plot the cosine function in a different color.
+我要以不同的颜色绘制余弦函数
+
+68
+00:02:19,850 --> 00:02:22,166
+So, let me put there
+所以我在这里输入
+
+69
+00:02:22,180 --> 00:02:24,093
+r in quotation marks there
+带引号的r
+
+70
+00:02:24,093 --> 00:02:25,339
+and instead of replacing
+我将绘制余弦函数
+
+71
+00:02:25,339 --> 00:02:26,615
+the current figure, I'll plot the
+在这之上
+
+72
+00:02:26,620 --> 00:02:28,499
+cosine function on top and
+而不是替换了现有的图
+
+73
+00:02:28,499 --> 00:02:32,915
+the r indicates the color red.
+r表示所使用的颜色
+
+74
+00:02:32,915 --> 00:02:35,166
+And here additional commands - x
+再加上命令xlabel('time')
+
+75
+00:02:35,166 --> 00:02:39,157
+label times, to label the X axis, or the horizontal axis.
+来标记X轴即水平轴
+
+76
+00:02:39,160 --> 00:02:41,451
+And ylabel value,
+输入ylabel('value')
+
+77
+00:02:41,451 --> 00:02:44,688
+to label the vertical axis value,
+来标记垂直轴的值
+
+78
+00:02:44,688 --> 00:02:47,032
+and I can also
+同时我也可以
+
+79
+00:02:54,532 --> 00:02:57,616
+label my two lines
+来标记我的两条函数曲线
+
+80
+00:02:57,620 --> 00:03:01,514
+with this command: legend sine cosine
+用这个命令 legend('sin', 'cos')
+
+81
+00:03:01,514 --> 00:03:02,860
+and this puts this
+将这个
+
+82
+00:03:02,890 --> 00:03:04,125
+legend up on the upper
+图例放在右上方
+
+83
+00:03:04,125 --> 00:03:05,122
+right showing what the 2
+表示这两条曲线表示的内容
+
+84
+00:03:05,122 --> 00:03:08,285
+lines are, and finally title
+最后输入title('myplot')
+
+85
+00:03:08,290 --> 00:03:12,753
+my plot is the title at the top of this figure.
+在图像的顶部显示这幅图的标题
+
+86
+00:03:12,753 --> 00:03:13,835
+Lastly, if you want to save
+如果你想保存
+
+87
+00:03:13,835 --> 00:03:18,197
+this figure, you type print -dpng
+这幅图像,你输入print -dpng
+
+88
+00:03:18,197 --> 00:03:20,128
+myplot
+'myplot.png'
+
+89
+00:03:20,128 --> 00:03:21,505
+.png.
+png是一个图像
+
+90
+00:03:21,505 --> 00:03:23,292
+So PNG is a graphics
+文件格式
+
+91
+00:03:23,292 --> 00:03:25,170
+file format, and if you
+如果你
+
+92
+00:03:25,170 --> 00:03:27,612
+do this it will let you save this as a file.
+这样做了 它可以让你保存为一个文件
+
+93
+00:03:27,612 --> 00:03:28,902
+If I do that,
+如果我这样做
+
+94
+00:03:28,920 --> 00:03:31,287
+let me actually change directory to,
+让我先改一下路径
+
+95
+00:03:31,320 --> 00:03:35,114
+let's see, like
+像这样
+
+96
+00:03:35,130 --> 00:03:39,180
+that, and then I will print that out.
+然后我将它打出来
+
+97
+00:03:39,230 --> 00:03:41,692
+So this will take a
+这需要一点时间
+
+98
+00:03:41,700 --> 00:03:43,869
+while depending on how
+而这取决于你的
+
+99
+00:03:43,890 --> 00:03:46,193
+your Octave configuration is setup,
+Octave的配置设置
+
+100
+00:03:46,230 --> 00:03:48,891
+may take a few seconds, but change
+可能需要几秒钟 但改变
+
+101
+00:03:48,900 --> 00:03:50,730
+directory to my desktop and Octave
+路径到我的桌面
+
+102
+00:03:50,730 --> 00:03:53,943
+is now taking a few seconds to save this.
+现在Octave需要几秒钟的时间来保存它
+
+103
+00:03:54,750 --> 00:03:57,635
+If I now go to my desktop, Let's hide these windows.
+如果我现在去到我的桌面 先最小化这些窗口
+
+104
+00:03:57,670 --> 00:03:59,358
+Here's myplot.png
+这就是
+
+105
+00:03:59,370 --> 00:04:00,720
+which Octave has saved, and you
+Octave所保存的myplot.png
+
+106
+00:04:00,740 --> 00:04:03,481
+know, there's the figure saved as the PNG file.
+这就是保存为PNG的文件
+
+107
+00:04:03,481 --> 00:04:05,530
+Octave can save thousand other formats as well.
+Octave也可以保存为很多其他的格式
+
+108
+00:04:05,530 --> 00:04:07,468
+So, you can type help plot,
+你可以键入help plot
+
+109
+00:04:07,468 --> 00:04:09,497
+if you want to see the
+如果你想试试
+
+110
+00:04:09,510 --> 00:04:11,512
+other file formats, rather than
+其他格式的文件 而不是
+
+111
+00:04:11,530 --> 00:04:13,377
+PNG, that you can save
+PNG 你可以把图片
+
+112
+00:04:13,377 --> 00:04:15,149
+figures in.
+保存为其他格式
+
+113
+00:04:15,149 --> 00:04:16,471
+And lastly, if you want
+最后如果你想
+
+114
+00:04:16,471 --> 00:04:18,507
+to get rid of the plot, the
+删掉这个图像
+
+115
+00:04:18,540 --> 00:04:23,867
+close command causes the figure to go away.
+命令close会让这个图像关掉
+
+116
+00:04:23,867 --> 00:04:24,963
+As I figure if I type
+如果我键入
+
+117
+00:04:24,963 --> 00:04:26,628
+close, that figure just
+close 这个图像
+
+118
+00:04:26,628 --> 00:04:30,153
+disappeared from my desktop.
+就从我的桌面消失了
+
+119
+00:04:30,640 --> 00:04:33,372
+Octave also lets you specify figure numbers.
+Octave也可以让你为图像标号
+
+120
+00:04:33,372 --> 00:04:36,935
+You type figure 1 plots t, y1.
+你键入figure(1); plot(t, y1);
+
+121
+00:04:36,935 --> 00:04:39,582
+That starts up
+将显示
+
+122
+00:04:39,670 --> 00:04:41,959
+first figure, and that plots t, y1.
+第一张图 绘制了变量t y1
+
+123
+00:04:41,970 --> 00:04:45,075
+And then if you want a second figure, you specify a different figure number.
+如果你想绘制第二个图 你可以指定一个不同的数字编号
+
+124
+00:04:45,075 --> 00:04:47,765
+So figure two, plot t,
+键入figure(2); plot(t, y2);
+
+125
+00:04:47,780 --> 00:04:49,924
+y2 like so, and
+正如这样
+
+126
+00:04:49,924 --> 00:04:53,084
+now on my desktop, I actually have 2 figures.
+现在我的桌面上 其实有2个图
+
+127
+00:04:53,084 --> 00:04:54,625
+So, figure 1 and figure
+图1和图2
+
+128
+00:04:54,625 --> 00:04:55,874
+2 thus 1 plotting the sine
+此时一个绘制正弦
+
+129
+00:04:55,874 --> 00:04:59,169
+function, 1 plotting the cosine function.
+函数 另一个绘制了余弦函数
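+
+In command form, the figure-numbering example is roughly:
+
+    figure(1); plot(t, y1);   % first window: the sine function
+    figure(2); plot(t, y2);   % second window: the cosine function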
+
+130
+00:04:59,170 --> 00:05:00,498
+Here's one other neat command that
+这是另一个我经常使用的命令
+
+131
+00:05:00,498 --> 00:05:02,825
+I often use, which is the subplot command.
+subplot命令
+
+132
+00:05:02,825 --> 00:05:05,401
+So, we're going to use subplot 1 2 1.
+我们要使用subplot(1,2,1)
+
+133
+00:05:05,401 --> 00:05:07,958
+What it does is it sub-divides
+它将图像
+
+134
+00:05:07,958 --> 00:05:11,200
+the plot into a
+分为一个
+
+135
+00:05:11,780 --> 00:05:13,760
+one-by-two grid, which is what the
+1*2的格子
+
+136
+00:05:13,820 --> 00:05:16,010
+first 2 parameters are, and
+也就是前两个参数
+
+137
+00:05:16,010 --> 00:05:17,607
+it starts to access the
+然后它使用
+
+138
+00:05:17,620 --> 00:05:19,335
+first element. That's
+第一个格子
+
+139
+00:05:19,340 --> 00:05:21,714
+what the final parameter 1 is, right?
+也就是最后一个参数1的意思
+
+140
+00:05:21,714 --> 00:05:23,568
+So, divide my figure into a
+所以,将我的图像分成
+
+141
+00:05:23,568 --> 00:05:24,913
+one by two grid, and I
+1*2的格子
+
+142
+00:05:24,913 --> 00:05:26,585
+want to access the first
+我现在使用
+
+143
+00:05:26,585 --> 00:05:27,948
+element right now.
+第一个格子
+
+144
+00:05:27,970 --> 00:05:30,435
+And so, if I type that
+如果我键入这个
+
+145
+00:05:30,435 --> 00:05:32,722
+in, it puts this figure on the left.
+那么这个图像显示在左边
+
+146
+00:05:32,760 --> 00:05:35,291
+And if I plot t,
+如果键入plot(t, y1)
+
+147
+00:05:35,350 --> 00:05:37,682
+y1, it now fills
+现在这个图
+
+148
+00:05:37,682 --> 00:05:40,462
+up this first element.
+显示在第一个格子
+
+149
+00:05:40,462 --> 00:05:42,565
+And if I do subplot(1,2,2).
+如果我键入subplot(1,2,2)
+
+150
+00:05:42,565 --> 00:05:44,456
+I'm going to start to
+那么我就要
+
+151
+00:05:44,456 --> 00:05:48,724
+access the second element and plot t, y2.
+使用第二个格子 键入plot(t, y2);
+
+152
+00:05:49,270 --> 00:05:51,323
+Well, throw in y2 in
+现在y2显示在右边
+
+153
+00:05:51,323 --> 00:05:54,875
+the right hand side, or in the second element.
+也就是第二个格子
+
+154
+00:05:54,910 --> 00:05:56,114
+And last command, you can
+最后一个命令 你可以
+
+155
+00:05:56,114 --> 00:05:58,165
+also change the axis scales
+改变轴的刻度
+
+156
+00:05:58,165 --> 00:06:00,308
+and change the axis to 0.5 1
+比如改成
+
+157
+00:06:00,330 --> 00:06:02,892
+minus 1 1 and this
+[0.5 1 -1 1]
+
+158
+00:06:02,892 --> 00:06:05,071
+sets the x range
+也就是设置了
+
+159
+00:06:05,071 --> 00:06:07,448
+and y range for the
+右边图的x轴
+
+160
+00:06:07,448 --> 00:06:09,874
+figure on the right,
+和y轴的范围
+
+161
+00:06:09,890 --> 00:06:12,381
+and concretely, it sets the horizontal
+具体而言 它将
+
+162
+00:06:12,381 --> 00:06:13,668
+axis values in the figure
+右图中的横轴
+
+163
+00:06:13,670 --> 00:06:14,856
+on the right to range from 0.5
+的范围调整至0.5到1
+
+164
+00:06:14,856 --> 00:06:16,334
+to 1, and the vertical
+竖轴的范围为
+
+165
+00:06:16,340 --> 00:06:19,572
+axis values use the range from minus one to one.
+-1到1
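+
+A short sketch of the subplot and axis commands just described:
+
+    subplot(1,2,1);        % divide the figure into a 1x2 grid, use the 1st cell
+    plot(t, y1);           % sine goes in the left cell
+    subplot(1,2,2);        % switch to the 2nd cell
+    plot(t, y2);           % cosine goes in the right cell
+    axis([0.5 1 -1 1])     % x range 0.5..1 and y range -1..1 for the current plot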
+
+166
+00:06:19,572 --> 00:06:21,736
+And, you know, you don't need to memorize all these commands.
+而且 你不需要记住所有这些命令
+
+167
+00:06:21,736 --> 00:06:23,178
+If you ever need to
+如果你需要
+
+168
+00:06:23,178 --> 00:06:24,773
+change the axis or you
+改变坐标轴或者
+
+169
+00:06:24,780 --> 00:06:25,703
+need to know is that, you know, there's an
+需要知道axis命令
+
+170
+00:06:25,703 --> 00:06:26,628
+axis command and you can
+你可以
+
+171
+00:06:26,628 --> 00:06:28,364
+always get the details
+用Octave中
+
+172
+00:06:28,364 --> 00:06:31,590
+from the usual octave help command.
+help命令了解细节
+
+173
+00:06:31,600 --> 00:06:32,861
+Finally, just a couple last
+最后 还有几个命令
+
+174
+00:06:32,861 --> 00:06:35,449
+commands: clf clears
+clf清除
+
+175
+00:06:35,450 --> 00:06:38,362
+a figure, and here's one neat trick.
+一幅图像 这里有一个独特的特点
+
+176
+00:06:38,362 --> 00:06:39,772
+Let's set a to be equal
+让我们设置A等于
+
+177
+00:06:39,772 --> 00:06:42,076
+to a 5 by 5
+一个5×5
+
+178
+00:06:42,076 --> 00:06:43,375
+magic square A. So, A
+magic方阵
+
+179
+00:06:43,380 --> 00:06:45,290
+is now this 5 by 5
+现在A是这个5*5
+
+180
+00:06:45,310 --> 00:06:47,581
+matrix. Here's a neat
+的矩阵
+
+181
+00:06:47,581 --> 00:06:49,341
+trick that I sometimes use to
+我有时用一个巧妙的方法
+
+182
+00:06:49,350 --> 00:06:51,582
+visualize the matrix, which is
+来可视化矩阵
+
+183
+00:06:51,582 --> 00:06:54,792
+I can use image sc
+也就是imagesc(A)
+
+184
+00:06:54,800 --> 00:06:56,362
+of A. What this will
+它将会
+
+185
+00:06:56,370 --> 00:06:58,056
+do is plot a five
+绘制一个5*5的矩阵
+
+186
+00:06:58,056 --> 00:07:03,925
+by five matrix, a five by five grid of color.
+一个5*5的彩色格图
+
+187
+00:07:03,925 --> 00:07:05,739
+where the different colors correspond to
+不同的颜色对应
+
+188
+00:07:05,739 --> 00:07:09,011
+the different values in the A matrix.
+A矩阵中的不同值
+
+189
+00:07:09,060 --> 00:07:13,262
+So concretely, I can also do color bar.
+具体地说 我还可以使用函数colorbar
+
+190
+00:07:13,630 --> 00:07:14,903
+Let me use a
+让我用一个
+
+191
+00:07:14,903 --> 00:07:16,715
+more sophisticated command, and image sc
+更复杂的命令 imagesc(A)
+
+192
+00:07:16,715 --> 00:07:19,608
+A color bar
+colorbar
+
+193
+00:07:19,608 --> 00:07:22,454
+color map gray.
+colormap gray
+
+194
+00:07:22,454 --> 00:07:24,757
+This is actually running three commands at a time.
+这实际上是在同一时间运行三个命令
+
+195
+00:07:24,760 --> 00:07:26,286
+I'm running image sc then running
+运行imagesc然后运行
+
+196
+00:07:26,286 --> 00:07:28,943
+color bar, then running color map gray.
+colorbar 然后运行colormap gray
+
+197
+00:07:28,943 --> 00:07:30,142
+And what this does, is it sets
+它生成了
+
+198
+00:07:30,160 --> 00:07:31,355
+a color map, so a
+一个颜色图像
+
+199
+00:07:31,355 --> 00:07:32,749
+gray color map, and on the
+一个灰度分布图 并在
+
+200
+00:07:32,749 --> 00:07:35,333
+right it also puts in this color bar.
+右边也加入一个颜色条
+
+201
+00:07:35,360 --> 00:07:37,525
+And so this color bar
+所以这个颜色条
+
+202
+00:07:37,550 --> 00:07:40,701
+shows what the different shades of color correspond to.
+显示不同深浅的颜色所对应的值
+
+203
+00:07:40,720 --> 00:07:42,704
+Concretely, the upper left
+具体地 左上
+
+204
+00:07:42,704 --> 00:07:44,494
+element of the A matrix
+A矩阵的元素
+
+205
+00:07:44,494 --> 00:07:46,358
+is 17, and so that corresponds
+是17 所以对应
+
+206
+00:07:46,358 --> 00:07:49,297
+to kind of a middle shade of gray.
+的是这样中等的灰度
+
+207
+00:07:49,297 --> 00:07:52,012
+Whereas in contrast the second
+而与此相反的第二个
+
+208
+00:07:52,012 --> 00:07:53,210
+element of A--sort of the
+元素 也就是
+
+209
+00:07:53,280 --> 00:07:55,640
+1 2 element of A--is 24.
+A(1,2)元素
+
+210
+00:07:55,640 --> 00:07:57,716
+Right, so it's A 1 2 is 24.
+代表的值为24
+
+211
+00:07:57,716 --> 00:07:59,683
+So that corresponds to
+它对应于
+
+212
+00:07:59,690 --> 00:08:01,343
+this square out here, which is
+这里的这个方块
+
+213
+00:08:01,360 --> 00:08:03,677
+nearly a shade of white.
+是接近白色的灰度
+
+214
+00:08:03,677 --> 00:08:05,640
+And the small value, say
+较小的值比如
+
+215
+00:08:05,690 --> 00:08:08,657
+A--what is that? A
+A多少呢
+
+216
+00:08:08,657 --> 00:08:12,260
+4 5, you know, is a value
+A(4,5)
+
+217
+00:08:12,300 --> 00:08:14,346
+3 over here that corresponds--
+为3对应着
+
+218
+00:08:14,360 --> 00:08:15,548
+you can see on my color bar
+你可以看到在我的颜色条
+
+219
+00:08:15,548 --> 00:08:16,618
+that it corresponds to a
+它对应于
+
+220
+00:08:16,618 --> 00:08:19,499
+much darker shade in this image.
+一个更暗的灰度
+
+221
+00:08:19,499 --> 00:08:21,141
+So here's another example,
+这里是另一个例子
+
+222
+00:08:21,141 --> 00:08:23,228
+I can plot a larger, you
+我可以绘制一个较大的
+
+223
+00:08:23,230 --> 00:08:24,768
+know, here's a magic 15 that
+比如magic(15)
+
+224
+00:08:24,770 --> 00:08:26,029
+gives you a 15 by 15
+给你一个15* 15
+
+225
+00:08:26,029 --> 00:08:27,675
+magic square and this
+magic方阵
+
+226
+00:08:27,680 --> 00:08:29,504
+gives me a plot of what
+这将会是一幅
+
+227
+00:08:29,504 --> 00:08:33,675
+my 15 by 15 magic squares values looks like.
+15*15的magic方阵值的图
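+
+The matrix-visualization trick, roughly as typed here:
+
+    clf;                                  % clear the current figure
+    A = magic(5);                         % a 5x5 magic square
+    imagesc(A), colorbar, colormap gray;  % grid of grays plus a color bar
+    imagesc(magic(15)), colorbar, colormap gray;  % same idea for a 15x15 magic square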
+
+228
+00:08:33,700 --> 00:08:35,225
+And finally to wrap
+最后
+
+229
+00:08:35,225 --> 00:08:37,075
+up this video, what you've
+总结一下这段视频
+
+230
+00:08:37,075 --> 00:08:38,318
+seen me do here is
+你看到我所做的
+
+231
+00:08:38,318 --> 00:08:41,917
+use comma chaining of function calls.
+是使用逗号连接函数调用
+
+232
+00:08:41,940 --> 00:08:43,195
+Here's how you actually do this.
+这里是你如何真正做到这一点
+
+233
+00:08:43,210 --> 00:08:44,638
+If I type A equals
+如果我键入a=1
+
+234
+00:08:44,690 --> 00:08:46,613
+1, B equals 2, C equals
+b=2 c=3
+
+235
+00:08:46,613 --> 00:08:48,620
+3, and hit Enter, then
+然后按Enter键
+
+236
+00:08:48,620 --> 00:08:50,628
+this is actually carrying out
+其实这是将这
+
+237
+00:08:50,628 --> 00:08:52,039
+three commands at the same time.
+三个命令同时执行
+
+238
+00:08:52,040 --> 00:08:53,490
+Or really carrying out three
+或者是
+
+239
+00:08:53,490 --> 00:08:55,849
+commands, one after another,
+将三个命令一个接一个执行
+
+240
+00:08:55,849 --> 00:08:57,521
+and it prints out all three results.
+它将输出所有这三个结果
+
+241
+00:08:57,521 --> 00:08:58,417
+And this is a lot like
+这很像
+
+242
+00:08:58,417 --> 00:09:00,489
+A equals 1, B equals
+a=1; b=2;
+
+243
+00:09:00,489 --> 00:09:01,755
+2, C equals 3, except
+c=3;
+
+244
+00:09:01,755 --> 00:09:03,532
+that if I use semicolons instead
+如果我用分号来代替逗号
+
+245
+00:09:03,540 --> 00:09:05,854
+of a comma, it doesn't print out anything.
+没有输出出任何东西
+
+246
+00:09:05,854 --> 00:09:07,195
+So, this, you know,
+所以你知道
+
+247
+00:09:07,210 --> 00:09:08,865
+this thing here we call comma
+这里我们称之为
+
+248
+00:09:08,870 --> 00:09:12,185
+chaining of commands, or comma chaining of function calls.
+逗号连接的命令或函数调用
+
+249
+00:09:12,240 --> 00:09:13,755
+And, it's just another
+只是另一种
+
+250
+00:09:13,755 --> 00:09:15,520
+convenient way in Octave to
+Octave中更便捷的方式
+
+251
+00:09:15,520 --> 00:09:17,778
+put multiple commands like image sc
+将多条命令例如imagesc
+
+252
+00:09:17,778 --> 00:09:19,358
+colorbar, colormap
+colorbar colormap
+
+253
+00:09:19,360 --> 00:09:22,919
+to put multi-commands on the same line.
+将这多条命令写在同一行中
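+
+For comparison, comma chaining versus semicolons:
+
+    a = 1, b = 2, c = 3    % three commands run together; each result is printed
+    a = 1; b = 2; c = 3;   % same assignments, but the semicolons suppress the output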
+
+254
+00:09:22,930 --> 00:09:24,104
+So, that's it.
+所以 就是这样
+
+255
+00:09:24,104 --> 00:09:25,281
+You now know how to plot
+现在你知道如何绘制
+
+256
+00:09:25,281 --> 00:09:27,504
+different figures in Octave, and
+Octave中不同的图像
+
+257
+00:09:27,504 --> 00:09:29,270
+in next video the
+在下面的视频中
+
+258
+00:09:29,280 --> 00:09:30,430
+next main piece that I want
+下一个主要内容
+
+259
+00:09:30,460 --> 00:09:31,985
+to tell you about is how to
+我将告诉你怎样在Octave中
+
+260
+00:09:31,985 --> 00:09:33,622
+write control statements like if,
+写控制语句 比如if
+
+261
+00:09:33,630 --> 00:09:35,294
+while, for statements and
+while for语句
+
+262
+00:09:35,294 --> 00:09:39,426
+Octave, as well as how to define and use functions
+并且定义和使用函数
+
diff --git a/srt/5 - 5 - Control Statements_ for, while, if statements (13 min).srt b/srt/5 - 5 - Control Statements_ for, while, if statements (13 min).srt
new file mode 100644
index 00000000..2f80aa19
--- /dev/null
+++ b/srt/5 - 5 - Control Statements_ for, while, if statements (13 min).srt
@@ -0,0 +1,1730 @@
+1
+00:00:00,180 --> 00:00:01,178
+In this video, I'd like to
+在这段视频中 我想
+
+2
+00:00:01,178 --> 00:00:02,587
+tell you how to write
+告诉你怎样
+
+3
+00:00:02,600 --> 00:00:03,842
+control statements for your
+为你的 Octave 程序写控制语句
+
+4
+00:00:03,842 --> 00:00:05,672
+Octave programs, so things
+诸如
+
+5
+00:00:05,700 --> 00:00:07,280
+like "for", "while" and "if" statements
+"for" "while" "if" 这些语句
+
+6
+00:00:07,350 --> 00:00:12,176
+and also how to define and use functions.
+并且如何定义和使用函数
+
+7
+00:00:12,480 --> 00:00:13,980
+Here's my Octave window. Let
+这是我们的 Octave 窗口
+
+8
+00:00:13,980 --> 00:00:16,502
+me first show you how to use a "for" loop.
+我先告诉你如何使用 “for” 循环
+
+9
+00:00:16,502 --> 00:00:17,888
+I'm going to start by setting v
+首先 我要将 v 值设为
+
+10
+00:00:17,888 --> 00:00:18,852
+to be a 10 by
+一个10行1列
+
+11
+00:00:18,870 --> 00:00:20,808
+1 vector 0.
+的零向量
+
+12
+00:00:20,830 --> 00:00:22,209
+Now, here's I write
+现在 我要写一个 “for" 循环
+
+13
+00:00:22,240 --> 00:00:25,071
+a "for" loop for I equals 1 to 10.
+让 i 等于 1 到 10
+
+14
+00:00:25,090 --> 00:00:27,608
+That's for i equals 1 colon 10.
+写出来就是 i = 1:10
+
+15
+00:00:27,608 --> 00:00:29,905
+And let's see, I'm
+让我们来看看
+
+16
+00:00:29,905 --> 00:00:31,466
+going to set V of I
+我要设 v(i) 的值
+
+17
+00:00:31,466 --> 00:00:33,214
+equals two to the
+等于 2 的 i 次方
+
+18
+00:00:33,220 --> 00:00:36,848
+power of I, and finally
+循环最后
+
+19
+00:00:36,848 --> 00:00:37,671
+end.
+记得写上“end”
+
+20
+00:00:37,671 --> 00:00:39,082
+The white space does not matter,
+这里的空格没关系
+
+21
+00:00:39,090 --> 00:00:40,538
+so I am putting the spaces
+所以我就加一些空格
+
+22
+00:00:40,538 --> 00:00:41,960
+just to make it look nicely indented,
+让缩进后的代码看起来结构更清晰
+
+23
+00:00:41,990 --> 00:00:44,385
+but you know spacing doesn't matter.
+但是你要知道这里的空格没有意义
+
+24
+00:00:44,420 --> 00:00:46,163
+But if I do this, then the
+如果按我这样做 那么
+
+25
+00:00:46,163 --> 00:00:48,626
+result is that V gets
+向量 v 的值就是
+
+26
+00:00:48,626 --> 00:00:49,420
+set to, you know, two to
+这样一个集合 2的一次方
+
+27
+00:00:49,500 --> 00:00:51,478
+the power one, two to the power two, and so on.
+2的二次方 依此类推
+
+28
+00:00:51,490 --> 00:00:52,665
+So this is syntax for I
+于是这就是我的 i 等于 1 到 10
+
+29
+00:00:52,665 --> 00:00:55,410
+equals one colon 10 that
+的语句结构
+
+30
+00:00:55,410 --> 00:00:57,429
+makes I loop through the
+让 i 遍历 1 到 10
+
+31
+00:00:57,440 --> 00:00:59,662
+values one through 10.
+的值
+
+32
+00:00:59,662 --> 00:01:00,830
+And by the way, you can also do
+另外 你还可以通过
+
+33
+00:01:00,830 --> 00:01:02,481
+this by setting your
+设置你的 indices (索引) 等于 1
+
+34
+00:01:02,481 --> 00:01:04,795
+indices equals one to
+一直到10
+
+35
+00:01:04,800 --> 00:01:07,260
+10, and so the
+来做到这一点
+
+36
+00:01:07,270 --> 00:01:09,305
+indices in the array from one to 10.
+这时 indices 就是一个从1到10的序列
+
+37
+00:01:09,305 --> 00:01:13,249
+You can also write for I equals indices.
+你也可以写 i = indices
+
+38
+00:01:15,040 --> 00:01:17,805
+And this is actually the same as if I equals one to 10.
+这实际上和我直接把 i 写到 1 到 10 是一样
+
+39
+00:01:17,820 --> 00:01:19,459
+You can do, you know, display
+你可以写 disp(i)
+
+40
+00:01:19,480 --> 00:01:23,498
+I and this would do the same thing.
+也能得到一样的结果
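+
+A sketch of the two equivalent for loops typed here:
+
+    v = zeros(10,1);
+    for i = 1:10,
+      v(i) = 2^i;        % v becomes 2, 4, 8, ..., 1024
+    end;
+
+    indices = 1:10;      % equivalent: loop over a precomputed range
+    for i = indices,
+      disp(i);
+    end;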
+
+41
+00:01:23,498 --> 00:01:24,698
+So, that is a "for" loop,
+所以 这就是一个 “for” 循环
+
+42
+00:01:24,698 --> 00:01:27,201
+if you are familiar with "break"
+如果你对 “break” 和 “continue” 语句比较熟悉
+
+43
+00:01:27,230 --> 00:01:29,375
+and "continue", there's "break" and
+Octave里也有 “break” 和 “continue” 语句
+
+44
+00:01:29,375 --> 00:01:30,809
+"continue" statements, you can
+你也可以在 Octave环境里
+
+45
+00:01:30,809 --> 00:01:32,061
+also use those inside loops
+使用那些循环语句
+
+46
+00:01:32,061 --> 00:01:33,902
+in octave, but first
+但是首先让我告诉你
+
+47
+00:01:33,902 --> 00:01:36,550
+let me show you how a while loop works.
+一个 while 循环是如何工作的
+
+48
+00:01:36,570 --> 00:01:39,088
+So, here's my vector
+这是我的 v 向量
+
+49
+00:01:39,120 --> 00:01:40,912
+V. Let's write the while loop.
+让我们写个 while 循环
+
+50
+00:01:40,920 --> 00:01:44,037
+I equals 1, while I
+i = 1 ;
+
+51
+00:01:44,037 --> 00:01:45,259
+is less than or equal to
+while i <= 5 ;
+
+52
+00:01:45,259 --> 00:01:47,662
+5, let's set
+让我们设置
+
+53
+00:01:47,662 --> 00:01:51,082
+V I equals one hundred
+v(i) 等于 100
+
+54
+00:01:51,530 --> 00:01:54,449
+and increment I by
+然后 i 加 1
+
+55
+00:01:54,449 --> 00:01:56,644
+one, end.
+结束 (end)
+
+56
+00:01:56,700 --> 00:01:58,090
+So this says what?
+所以这是什么意思呢
+
+57
+00:01:58,090 --> 00:01:59,932
+I starts off equal to
+我让 i 取值从 1 开始
+
+58
+00:01:59,970 --> 00:02:01,359
+one and then I'm going
+然后我要
+
+59
+00:02:01,380 --> 00:02:02,629
+to set V I equals one
+让 v(i) 等于 100
+
+60
+00:02:02,629 --> 00:02:04,249
+hundred and increment I by
+再让 i 递增 1
+
+61
+00:02:04,260 --> 00:02:07,666
+one until I is, you know, greater than five.
+直到 i 大于 5停止
+
+62
+00:02:07,690 --> 00:02:09,377
+And as a result of that,
+现在来看一下结果
+
+63
+00:02:09,377 --> 00:02:13,022
+whereas previously V was this powers of two vector.
+原来的向量 v 是2的这些次方
+
+64
+00:02:13,022 --> 00:02:14,573
+I've now taken the first
+我现在已经取出了
+
+65
+00:02:14,580 --> 00:02:17,225
+five elements of my vector
+向量的前五个元素
+
+66
+00:02:17,260 --> 00:02:19,618
+and overwritten them with this value one hundred.
+把他们用100覆盖掉
+
+67
+00:02:19,618 --> 00:02:22,797
+So that's a syntax for a while loop.
+这就是一个while循环的句法结构
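+
+The while loop just typed, as a sketch (v still holding the powers of two from above):
+
+    i = 1;
+    while i <= 5,
+      v(i) = 100;        % overwrite the first five elements with 100
+      i = i + 1;
+    end;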
+
+68
+00:02:23,140 --> 00:02:24,503
+Let's do another example.
+现在我们来分析另外一个例子
+
+69
+00:02:24,503 --> 00:02:26,600
+i equals 1, while
+i = 1; while true,
+
+70
+00:02:26,600 --> 00:02:28,491
+true and here
+这里我将向你展示
+
+71
+00:02:28,500 --> 00:02:31,892
+I wanted to show you how to use a break statement.
+如何使用break语句
+
+72
+00:02:31,892 --> 00:02:34,040
+Let's say V I equals 999
+比方说 v(i) = 999
+
+73
+00:02:34,070 --> 00:02:37,331
+and I equals i+1
+然后让 i = i+1
+
+74
+00:02:38,110 --> 00:02:45,900
+if i equals 6 break and
+当 i 等于6的时候 break (停止循环)
+
+75
+00:02:47,910 --> 00:02:47,910
+end.
+结束 (end)
+
+76
+00:02:48,410 --> 00:02:49,425
+And this is also our first
+当然这也是我们第一次
+
+77
+00:02:49,425 --> 00:02:51,945
+use of an if statement, so
+使用一个 if 语句 所以
+
+78
+00:02:51,945 --> 00:02:53,308
+I hope the logic of this makes sense.
+我希望你们可以理解这个逻辑
+
+79
+00:02:53,308 --> 00:02:57,297
+So i starts at one and, you know, we increment through the loop.
+让 i 等于1 然后开始下面的增量循环
+
+80
+00:02:57,340 --> 00:02:59,900
+While repeatedly set V I equals 1
+while语句重复设置 v(i) 等于1 (此处口误 应为999 译者注)
+
+81
+00:02:59,900 --> 00:03:01,527
+and increment i by 1,
+不断让i增加
+
+82
+00:03:01,527 --> 00:03:02,901
+and then when i
+然后当 i 达到6
+
+83
+00:03:02,920 --> 00:03:04,451
+gets up to 6, do a
+做一个
+
+84
+00:03:04,451 --> 00:03:05,757
+break, which breaks out of
+中止循环的命令
+
+85
+00:03:05,757 --> 00:03:07,284
+the while loop and so, the
+尽管有while循环 语句也就此中止
+
+86
+00:03:07,284 --> 00:03:08,596
+effect of this should be to take
+所以最后的效果是
+
+87
+00:03:08,596 --> 00:03:09,929
+the first five elements of this
+取出向量 v 的前5个元素
+
+88
+00:03:09,929 --> 00:03:11,748
+vector V and set them to 999.
+并且把它们设置为999
+
+89
+00:03:11,748 --> 00:03:14,832
+And yes, indeed, we're taking
+然后运行 的确如此
+
+90
+00:03:14,832 --> 00:03:18,345
+V and overwritten the first five elements with 999.
+我们用999覆盖了 v 的前五个元素
+
+91
+00:03:18,345 --> 00:03:20,172
+So, this is the
+所以 这就是
+
+92
+00:03:20,172 --> 00:03:21,974
+syntax for "if" statements, and
+if 语句和 while 语句的句法结构
+
+93
+00:03:21,974 --> 00:03:25,058
+for "while" statement, and notice the end.
+并且要注意 要有end
+
+94
+00:03:25,070 --> 00:03:27,159
+We have two ends here.
+这里是有两个 end 的
+
+95
+00:03:27,170 --> 00:03:29,719
+This ends here ends the if statement
+这里的 end 结束的是 if 语句
+
+96
+00:03:29,730 --> 00:03:33,228
+and the second end here ends the while statement.
+第二个 end 结束的是 while 语句
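+
+The break example with both end statements in place:
+
+    i = 1;
+    while true,
+      v(i) = 999;        % overwrite with 999
+      i = i + 1;
+      if i == 6,
+        break;           % leave the while loop once i reaches 6
+      end;               % this end closes the if statement
+    end;                 % this end closes the while statement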
+
+97
+00:03:33,250 --> 00:03:35,265
+Now let me show you the more general syntax for
+现在让我告诉你使用 if-else 语句时
+
+98
+00:03:35,265 --> 00:03:37,763
+how to use an if-else statement.
+更一般的句法结构
+
+99
+00:03:37,763 --> 00:03:40,274
+So, let's see, V 1
+举个例子 v(1)
+
+100
+00:03:40,274 --> 00:03:42,776
+is equal to 999, let's
+等于999 假设我们
+
+101
+00:03:42,860 --> 00:03:46,996
+type v(1) equals 2 for this example.
+令 v(1) 等于2
+
+102
+00:03:47,020 --> 00:03:48,758
+So, let me type
+所以 让我输入
+
+103
+00:03:48,758 --> 00:03:55,050
+if V 1 equals 1 display the value as one.
+if v(1) == 1, disp('The value is one');
+
+104
+00:03:56,855 --> 00:03:58,588
+Here's how you write an else
+这里出现了一个else语句
+
+105
+00:03:58,588 --> 00:04:00,040
+statement, or rather here's an
+或者更确切地说 这里是一个
+
+106
+00:04:00,040 --> 00:04:03,853
+else if: V 1 equals
+elseif语句 elseif v(1) == 2,
+
+107
+00:04:03,853 --> 00:04:07,815
+2. This is, if in case that's true in our example, display
+这就是说 如果这种情况下命题为真
+
+108
+00:04:07,815 --> 00:04:12,268
+the value as 2, else
+执行 disp('The value is two');
+
+109
+00:04:13,650 --> 00:04:17,960
+display, the value is not one or two.
+否则(else) 执行 disp('The value is not one or two');
+
+110
+00:04:17,990 --> 00:04:21,699
+Okay, so that's a if-else
+好了 这就是一个if-else语句
+
+111
+00:04:21,700 --> 00:04:23,889
+if-else statement it ends.
+if-else语句 记得最后有end
+
+112
+00:04:23,889 --> 00:04:25,271
+And of course, here we've just
+当然了 我们刚刚设置过
+
+113
+00:04:25,271 --> 00:04:27,589
+set v 1 equals 2, so hopefully, yup,
+v(1)等于2 所以显然
+
+114
+00:04:27,610 --> 00:04:30,729
+displays that the value is 2.
+显示的是 "The value is two"
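+
+The if-elseif-else example in one piece:
+
+    v(1) = 2;
+    if v(1) == 1,
+      disp('The value is one');
+    elseif v(1) == 2,
+      disp('The value is two');             % this is the branch that runs here
+    else
+      disp('The value is not one or two');
+    end;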
+
+115
+00:04:30,780 --> 00:04:32,844
+And finally, I don't
+最后 我觉得现在
+
+116
+00:04:32,880 --> 00:04:34,143
+think I talked about this earlier, but
+提醒一件事
+
+117
+00:04:34,143 --> 00:04:35,622
+if you ever need to exit Octave,
+如果你需要退出 Octave
+
+118
+00:04:35,622 --> 00:04:36,947
+you can type the exit command and
+你可以键入 exit 命令然后
+
+119
+00:04:36,947 --> 00:04:38,373
+you hit enter that will cause Octave
+回车就会退出 Octave
+
+120
+00:04:38,400 --> 00:04:39,981
+to quit or the 'q'--quits
+或者命令 ‘quit’
+
+121
+00:04:39,981 --> 00:04:42,428
+command also works.
+也可以
+
+122
+00:04:42,450 --> 00:04:43,857
+Finally, let's talk about
+最后 让我们来说说
+
+123
+00:04:43,857 --> 00:04:45,292
+functions and how to define
+函数 (functions)
+
+124
+00:04:45,310 --> 00:04:48,592
+them and how to use them.
+如何定义和调用函数
+
+125
+00:04:48,620 --> 00:04:49,680
+Here's my desktop, and I
+这是我的桌面
+
+126
+00:04:49,720 --> 00:04:52,078
+have predefined a file
+我在桌面上存了一个
+
+127
+00:04:52,078 --> 00:04:56,818
+or pre-saved on my desktop a file called "squarethisnumber.m".
+预先定义的文件 名为 “squarethisnumber.m”
+
+128
+00:04:56,830 --> 00:04:59,471
+This is how you define functions in Octave.
+这就是在 Octave 环境下定义的函数
+
+129
+00:04:59,480 --> 00:05:01,681
+You create a file called, you know,
+你需要创建一个文件
+
+130
+00:05:01,681 --> 00:05:03,958
+with your function name and then ending in .m,
+用你的函数名来命名 然后以 .m 的后缀结尾
+
+131
+00:05:03,960 --> 00:05:05,694
+and when Octave finds
+当 Octave 发现这文件
+
+132
+00:05:05,730 --> 00:05:07,643
+this file, it knows that this
+它知道应该在什么位置
+
+133
+00:05:07,680 --> 00:05:12,322
+where it should look for the definition of the function "squarethisnumber.m".
+寻找 squareThisNumber.m 这个函数的定义
+
+134
+00:05:12,340 --> 00:05:14,076
+Let's open up this file.
+让我们打开这个文件
+
+135
+00:05:14,076 --> 00:05:15,717
+Notice that I'm using the
+请注意 我使用的是
+
+136
+00:05:15,717 --> 00:05:19,352
+Microsoft program Wordpad to open up this file.
+微软的写字板程序来打开这个文件
+
+137
+00:05:19,352 --> 00:05:20,250
+I just want to encourage you, if
+我只是想建议你
+
+138
+00:05:20,250 --> 00:05:23,379
+your using Microsoft Windows, to
+如果你也使用微软的 Windows 系统
+
+139
+00:05:23,379 --> 00:05:25,075
+use Wordpad rather than
+那么可以使用写字板程序
+
+140
+00:05:25,110 --> 00:05:27,477
+Notepad to open up these
+而不是记事本 来打开这些文件
+
+141
+00:05:27,490 --> 00:05:28,557
+files, if you have a
+如果你有别的什么
+
+142
+00:05:28,557 --> 00:05:29,938
+different text editor that's fine
+文本编辑器 那也可以
+
+143
+00:05:29,938 --> 00:05:33,325
+too, but notepad sometimes messes up the spacing.
+但记事本有时会把代码的间距弄得很乱
+
+144
+00:05:33,350 --> 00:05:34,775
+If you only have Notepad, that should
+如果你只有记事本程序
+
+145
+00:05:34,800 --> 00:05:36,312
+work too, that could work
+那也能用
+
+146
+00:05:36,312 --> 00:05:37,779
+too, but if you
+但最好是
+
+147
+00:05:37,779 --> 00:05:39,354
+have Wordpad as well, I
+如果你有写字板的话
+
+148
+00:05:39,354 --> 00:05:40,609
+would rather use that or some
+我建议你用写字板
+
+149
+00:05:40,610 --> 00:05:45,053
+other text editor, if you have a different text editor for editing your functions.
+或者其他可以编辑函数的文本编辑器
+
+150
+00:05:45,060 --> 00:05:47,155
+So, here's how you define the function in Octave.
+现在我们来说如何在 Octave 里定义函数
+
+151
+00:05:47,155 --> 00:05:49,816
+Let me just zoom in a little bit.
+我们先来放大一点
+
+152
+00:05:49,816 --> 00:05:52,516
+And this file has just three lines in it.
+这个文件只有三行
+
+153
+00:05:52,516 --> 00:05:54,440
+The first line says function y equals squareThisNumber
+第一行写着 function y = squareThisNumber(x)
+
+154
+00:05:54,440 --> 00:05:56,448
+of X, and this tells
+这就告诉 Octave
+
+155
+00:05:56,448 --> 00:05:57,705
+Octave that I'm gonna return
+我想返回一个 y 值
+
+156
+00:05:57,705 --> 00:06:00,025
+the value Y, I'm gonna
+我想返回一个值
+
+157
+00:06:00,025 --> 00:06:01,315
+return one value and that
+并且返回的这个值
+
+158
+00:06:01,315 --> 00:06:02,375
+the value is going to
+将被存放于
+
+159
+00:06:02,375 --> 00:06:04,443
+be saved in the variable Y
+变量 y 里
+
+160
+00:06:04,443 --> 00:06:06,003
+and moreover, it tells Octave
+另外 它告诉了 Octave
+
+161
+00:06:06,003 --> 00:06:08,068
+that this function has one argument,
+这个函数有一个参数
+
+162
+00:06:08,070 --> 00:06:10,408
+which is the argument X,
+就是参数 x
+
+163
+00:06:10,420 --> 00:06:11,846
+and the way the function
+还有定义的函数体
+
+164
+00:06:11,846 --> 00:06:15,156
+body is defined: y equals x squared.
+也就是 y 等于 x 的平方
+
+165
+00:06:15,180 --> 00:06:16,553
+So, let's try to call
+现在让我们尝试调用这个函数
+
+166
+00:06:16,553 --> 00:06:19,071
+this function "square", this number
+SquareThisNumber(5)
+
+167
+00:06:19,071 --> 00:06:21,854
+5, and this actually
+这实际上
+
+168
+00:06:21,854 --> 00:06:23,115
+isn't going to work, and
+是行不通的
+
+169
+00:06:23,115 --> 00:06:25,693
+Octave says squareThisNumber is undefined.
+Octave 说这个函数未被定义
+
+170
+00:06:25,693 --> 00:06:28,902
+That's because Octave doesn't know where to find this file.
+这是因为 Octave 不知道在哪里找这个文件
+
+171
+00:06:28,902 --> 00:06:30,682
+So as usual, let's use PWD,
+所以像之前一样 我们使用 pwd
+
+172
+00:06:30,690 --> 00:06:32,592
+oh, I'm not in my directory,
+现在不在我的目录下
+
+173
+00:06:32,592 --> 00:06:36,151
+so let's see this c:\users\ang\desktop.
+因此我们把路径设为 "C:\User\ang\desktop"
+
+174
+00:06:36,151 --> 00:06:39,888
+That's where my desktop is.
+这就是我的桌面的路径
+
+175
+00:06:39,888 --> 00:06:41,276
+Oops, a little typo there.
+噢 打错了
+
+176
+00:06:41,276 --> 00:06:42,848
+Users ANG desktop
+应该是 "Users"
+
+177
+00:06:42,848 --> 00:06:44,157
+and if I now type squareThisNumber
+现在如果我
+
+178
+00:06:44,157 --> 00:06:46,728
+of 5, it returns the
+键入SquareThisNumber(5)
+
+179
+00:06:46,728 --> 00:06:48,505
+answer 25.
+返回值是25
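+
+A sketch of squareThisNumber.m and the call, using the desktop path from this demo:
+
+    % file: squareThisNumber.m
+    function y = squareThisNumber(x)
+    y = x^2;
+
+    % then, back at the Octave prompt:
+    cd 'C:\Users\ang\desktop';   % move to where the .m file lives
+    squareThisNumber(5)          % ans = 25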
+
+180
+00:06:48,505 --> 00:06:50,347
+As kind of an advanced feature, this
+还有一种更高级的功能
+
+181
+00:06:50,347 --> 00:06:51,972
+is only for those of you
+这只是对那些知道
+
+182
+00:06:51,972 --> 00:06:54,596
+that know what the term search path means.
+“search path (搜索路径)” 这个术语的人使用的
+
+183
+00:06:54,596 --> 00:06:55,945
+But so if you
+所以如果你
+
+184
+00:06:55,945 --> 00:06:57,497
+want to modify the Octave
+想要修改 Octave
+
+185
+00:06:57,497 --> 00:06:58,863
+search path and you
+的搜索路径
+
+186
+00:06:58,863 --> 00:06:59,866
+could, you just think of
+你可以把下面这部分
+
+187
+00:06:59,866 --> 00:07:01,827
+this next part as advanced
+作为一个进阶知识
+
+188
+00:07:01,827 --> 00:07:03,292
+or optional material.
+或者选学材料
+
+189
+00:07:03,292 --> 00:07:04,214
+Only for those who are either
+仅适用于那些
+
+190
+00:07:04,214 --> 00:07:05,484
+familiar with the concepts of
+熟悉编程语言中
+
+191
+00:07:05,484 --> 00:07:07,642
+search paths in programming languages,
+搜索路径概念的同学
+
+192
+00:07:07,650 --> 00:07:08,962
+but you can use the
+你可以使用
+
+193
+00:07:08,962 --> 00:07:11,875
+addpath command, C colon,
+addpath 命令添加路径
+
+194
+00:07:11,880 --> 00:07:16,241
+slash users/ANG/desktop to
+添加路径 “C:\Users\ang\desktop”
+
+195
+00:07:16,241 --> 00:07:17,972
+add that directory to the
+将该目录添加到
+
+196
+00:07:17,972 --> 00:07:19,744
+Octave search path so that
+Octave 的搜索路径
+
+197
+00:07:19,744 --> 00:07:21,065
+even if you know, go to
+这样即使你跑到
+
+198
+00:07:21,065 --> 00:07:22,611
+some other directory I can
+其他路径底下
+
+199
+00:07:22,611 --> 00:07:24,510
+still, Octave still knows
+Octave依然知道
+
+200
+00:07:24,510 --> 00:07:26,005
+to look in the users ANG
+会在 Users\ang\desktop
+
+201
+00:07:26,005 --> 00:07:29,214
+desktop directory for functions
+目录下寻找函数
+
+202
+00:07:29,214 --> 00:07:30,521
+so that even though I'm in
+这样 即使我现在
+
+203
+00:07:30,521 --> 00:07:31,868
+a different directory now, it still
+在不同的目录下 它仍然
+
+204
+00:07:31,868 --> 00:07:35,297
+knows where to find the square this number function.
+知道在哪里可以找到 “SquareThisNumber” 这个函数
+
+205
+00:07:35,297 --> 00:07:35,935
+Okay?
+OK
+
+206
+00:07:35,935 --> 00:07:37,407
+But if you're not familiar
+但是 如果你不熟悉
+
+207
+00:07:37,407 --> 00:07:39,184
+with the concept of search path, don't worry
+搜索路径的概念
+
+208
+00:07:39,184 --> 00:07:40,068
+about it.
+不用担心
+
+209
+00:07:40,068 --> 00:07:40,889
+Just make sure as you use
+只要确保
+
+210
+00:07:40,889 --> 00:07:42,053
+the CD command to go to
+在执行函数之前 先用 cd 命令
+
+211
+00:07:42,053 --> 00:07:43,926
+the directory of your function before
+设置到你函数所在的目录下
+
+212
+00:07:43,940 --> 00:07:47,441
+you run it and that actually works just fine.
+实际上也是一样的效果
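+
+For those comfortable with search paths, the alternative mentioned here is roughly:
+
+    addpath('C:\Users\ang\desktop');  % Octave will now look there for functions
+    squareThisNumber(5)               % works even after cd'ing to another directory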
+
+213
+00:07:47,441 --> 00:07:49,587
+One concept that Octave has
+Octave 还有一个
+
+214
+00:07:49,600 --> 00:07:51,058
+that many other programming
+其他许多编程语言都没有的概念
+
+215
+00:07:51,058 --> 00:07:52,969
+languages don't is that it
+那就是它可以
+
+216
+00:07:52,969 --> 00:07:54,909
+can also let you define
+允许你定义一个函数
+
+217
+00:07:54,909 --> 00:07:58,873
+functions that return multiple values or multiple arguments.
+使得返回值是多个值或多个参数
+
+218
+00:07:58,873 --> 00:08:00,889
+So here's an example of that.
+这里就是一个例子
+
+219
+00:08:00,889 --> 00:08:02,931
+Define the function called square
+定义一个函数叫
+
+220
+00:08:02,931 --> 00:08:04,964
+and cube this number X
+“SquareAndCubeThisNumber(x)” (x的平方以及x的立方)
+
+221
+00:08:04,964 --> 00:08:06,644
+and what this says is this
+这说的就是
+
+222
+00:08:06,660 --> 00:08:08,547
+function returns 2 values, y1 and y2.
+函数返回值是两个 y1 和 y2
+
+223
+00:08:08,547 --> 00:08:09,955
+When I set down, this
+接下来就是
+
+224
+00:08:09,960 --> 00:08:13,603
+follows: y1 is x squared, y2 is x cubed.
+y1是被平方后的数 y2是被立方后的结果
+
+225
+00:08:13,603 --> 00:08:16,972
+And what this does is this really returns 2 numbers.
+这就是说 函数会真的返回2个值
+
+226
+00:08:16,980 --> 00:08:18,855
+So, some of you depending
+所以 有些同学可能会根据
+
+227
+00:08:18,855 --> 00:08:20,195
+on what programming language you use,
+你使用的编程语言
+
+228
+00:08:20,195 --> 00:08:22,931
+if you're familiar with, you know, C or C++.
+比如你们可能熟悉的C或C++
+
+229
+00:08:22,940 --> 00:08:26,051
+Often, we think of the function as returning just one value.
+通常情况下 认为作为函数返回值只能是一个值
+
+230
+00:08:26,051 --> 00:08:27,847
+But just know that the syntax in Octave
+但 Octave 的语法结构就不一样
+
+231
+00:08:27,847 --> 00:08:31,679
+can return multiple values.
+可以返回多个值
+
+232
+00:08:32,430 --> 00:08:34,087
+Now back in the Octave window. If
+现在回到 Octave 窗口
+
+233
+00:08:34,087 --> 00:08:37,914
+I type, you know, a, b equals
+如果我键入
+
+234
+00:08:37,914 --> 00:08:41,263
+square and cube this
+ [a,b] = SquareAndCubeThisNumber(5)
+
+235
+00:08:41,263 --> 00:08:44,599
+number 5 then
+然后
+
+236
+00:08:44,610 --> 00:08:46,338
+a is now equal to
+a 就等于25
+
+237
+00:08:46,338 --> 00:08:47,778
+25 and b is equal to
+b 就等于
+
+238
+00:08:47,778 --> 00:08:49,729
+the cube of 5 equal to 125.
+5的立方 125
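+
+A sketch of the two-output function and its call:
+
+    % file: squareAndCubeThisNumber.m
+    function [y1, y2] = squareAndCubeThisNumber(x)
+    y1 = x^2;   % the square
+    y2 = x^3;   % the cube
+
+    % then, at the prompt:
+    [a, b] = squareAndCubeThisNumber(5)   % a = 25, b = 125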
+
+239
+00:08:49,729 --> 00:08:51,645
+So, this is often
+所以说如果你需要定义一个函数
+
+240
+00:08:51,670 --> 00:08:53,010
+convenient if you needed to define
+并且返回多个值
+
+241
+00:08:53,010 --> 00:08:56,447
+a function that returns multiple values.
+这一点常常会带来很多方便
+
+242
+00:08:56,447 --> 00:08:57,480
+Finally, I'm going to show
+最后 我来给大家演示一下
+
+243
+00:08:57,480 --> 00:09:01,123
+you just one more sophisticated example of a function.
+一个更复杂一点的函数的例子
+
+244
+00:09:01,130 --> 00:09:02,361
+Let's say I have a data set
+比方说 我有一个数据集
+
+245
+00:09:02,370 --> 00:09:04,400
+that looks like this, with data points at 1, 1, 2, 2, 3, 3.
+像这样 数据点为[1,1], [2,2], [3,3]
+
+246
+00:09:04,430 --> 00:09:07,636
+And what I'd like
+我想做的事是
+
+247
+00:09:07,636 --> 00:09:09,113
+to do is to define an
+定义一个 Octave 函数来计算代价函数 J(θ)
+
+248
+00:09:09,113 --> 00:09:10,798
+octave function to compute the cost
+就是计算
+
+249
+00:09:10,830 --> 00:09:14,341
+function J of theta for different values of theta.
+不同 θ 值所对应的代价函数值 J
+
+250
+00:09:14,360 --> 00:09:16,157
+First let's put the data into octave.
+首先让我们把数据放到 Octave 里
+
+251
+00:09:16,160 --> 00:09:17,694
+So I set my design
+我把我的矩阵设置为
+
+252
+00:09:17,700 --> 00:09:20,998
+matrix to be 1,1,1,2,1,3.
+X = [1 1; 1 2; 1 3];
+
+253
+00:09:21,010 --> 00:09:24,043
+So, this is my design
+这就是我的设计矩阵 X
+
+254
+00:09:24,050 --> 00:09:26,073
+matrix x with x0, the
+第一列表示x0项
+
+255
+00:09:26,073 --> 00:09:27,428
+first column being the x0
+矩阵的第一列
+
+256
+00:09:27,428 --> 00:09:28,746
+term and the second column being
+第二列表示
+
+257
+00:09:28,770 --> 00:09:32,375
+you know, my the x-values of my three training examples.
+我的三个训练样本的 x 值
+
+258
+00:09:32,375 --> 00:09:33,594
+And let me set
+现在我再来
+
+259
+00:09:33,594 --> 00:09:35,488
+y to be 1-2-3 as
+设置 y 值为 [1; 2; 3]
+
+260
+00:09:35,488 --> 00:09:38,793
+follows, which were the y axis values.
+就像这样 是y轴对应值
+
+261
+00:09:38,810 --> 00:09:40,431
+So let's say theta
+现在我们设定 theta
+
+262
+00:09:40,431 --> 00:09:43,714
+is equal to 0 semicolon 1.
+为 [0;1]
+
+263
+00:09:43,730 --> 00:09:45,652
+Here at my desktop, I've
+现在我的桌面上
+
+264
+00:09:45,660 --> 00:09:47,483
+predefined the cost function
+已经有我预定义的代价函数 J
+
+265
+00:09:47,490 --> 00:09:49,008
+j and if I
+如果我打开函数
+
+266
+00:09:49,010 --> 00:09:52,019
+bring up the definition of that function it looks as follows.
+函数的定义应该是下面这样的
+
+267
+00:09:52,019 --> 00:09:53,579
+So function j equals cost function
+所以 函数J 就写成
+
+268
+00:09:53,580 --> 00:09:55,192
+J of X, y,
+J = costFunctionJ(X, y, theta)
+
+269
+00:09:55,192 --> 00:09:57,151
+theta, some comments specifying
+这里有一些注释
+
+270
+00:09:57,151 --> 00:09:59,546
+the inputs and then
+主要用于解释输入变量
+
+271
+00:09:59,560 --> 00:10:01,383
+over the next few steps set m
+接下来几步
+
+272
+00:10:01,383 --> 00:10:02,995
+to be the number of training examples,
+设定 m 为训练样本的数量
+
+273
+00:10:03,020 --> 00:10:05,495
+thus the number of rows in x.
+也就是 X 的行数
+
+274
+00:10:05,510 --> 00:10:07,596
+Compute the predictions, predictions equals
+计算预测值 predictions
+
+275
+00:10:07,596 --> 00:10:10,137
+x times theta and so
+预测值等于 X 乘以 theta
+
+276
+00:10:10,170 --> 00:10:11,670
+this is a comment that's wrapped
+这里是注释行
+
+277
+00:10:11,710 --> 00:10:14,693
+around, so this is probably the preceding comment line.
+是上一个注释行拐过来的部分
+
+278
+00:10:14,720 --> 00:10:16,823
+Compute the squared errors by, you know, taking
+下面就是计算平方误差 公式就是
+
+279
+00:10:16,823 --> 00:10:18,637
+the difference between your predictions and
+预测值减去 y 值
+
+280
+00:10:18,640 --> 00:10:20,265
+the y values and taking the
+然后取出来每一项进行平方
+
+281
+00:10:20,265 --> 00:10:22,126
+element-wise square, and then
+最后就可以
+
+282
+00:10:22,140 --> 00:10:24,376
+finally computing the cost
+计算代价函数 J
+
+283
+00:10:24,376 --> 00:10:26,128
+function J. And Octave knows
+并且 Octave 知道
+
+284
+00:10:26,128 --> 00:10:27,439
+that J is a value I
+J 是一个我想返回的值
+
+285
+00:10:27,439 --> 00:10:31,383
+want to return because J appeared here in the function definition.
+因为 J 出现在了我函数的定义里
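+
+Put together from the lines being read out here, the file looks roughly like this:
+
+    function J = costFunctionJ(X, y, theta)
+    % X is the "design matrix" containing our training examples.
+    % y is the vector of labels.
+
+    m = size(X, 1);                    % number of training examples, i.e. rows of X
+    predictions = X * theta;           % predictions of the hypothesis on all m examples
+    sqrErrors = (predictions - y).^2;  % squared errors
+    J = 1/(2*m) * sum(sqrErrors);      % the cost function J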
+
+286
+00:10:31,420 --> 00:10:34,127
+Feel free by the way to pause
+另外 你可以随时
+
+287
+00:10:34,170 --> 00:10:35,292
+this video if you want
+暂停一下视频
+
+288
+00:10:35,292 --> 00:10:36,712
+to look at this function
+如果你想
+
+289
+00:10:36,712 --> 00:10:38,820
+definition for longer and
+仔细看一下这个函数的定义
+
+290
+00:10:38,820 --> 00:10:44,031
+kind of make sure that you understand the different steps.
+确保你明白了定义中的每一步
+
+291
+00:10:44,031 --> 00:10:45,184
+But when I run it in
+现在当我
+
+292
+00:10:45,184 --> 00:10:46,630
+Octave, I run j equals
+在 Octave 里运行时
+
+293
+00:10:46,630 --> 00:10:51,197
+cost function j x y theta.
+我键入 j = costFunctionJ(x, y, theta)
+
+294
+00:10:51,197 --> 00:10:55,142
+It computes. Oops, made a typo there.
+然后他就开始计算 噢 又打错了
+
+295
+00:10:55,142 --> 00:10:57,018
+It should have been capital X. It
+这里应该是大写 X
+
+296
+00:10:57,018 --> 00:11:00,472
+computes J equals 0 because
+它就计算出 j 等于0
+
+297
+00:11:00,510 --> 00:11:03,367
+if my data set was,
+这是因为 如果我的数据集
+
+298
+00:11:03,367 --> 00:11:06,963
+you know, [1;2;3], [1;2;3], then setting theta 0
+x 为 [1;2;3] y 也为 [1;2;3] 然后设置 θ0 等于0
+
+299
+00:11:06,980 --> 00:11:08,741
+equals 0, theta 1 equals
+θ1 等于1
+
+300
+00:11:08,770 --> 00:11:11,259
+1, this gives me exactly the
+这给了我恰好45度的斜线
+
+301
+00:11:11,259 --> 00:11:15,559
+45-degree line that fits my data set perfectly.
+这条线是可以完美拟合我的数据集的
+
+302
+00:11:15,600 --> 00:11:16,887
+Whereas in contrast if I set
+而相反地 如果我设置
+
+303
+00:11:16,887 --> 00:11:19,828
+theta equals say 0, 0,
+theta 等于[0; 0]
+
+304
+00:11:19,830 --> 00:11:22,524
+then this hypothesis is
+那么这个假设就是
+
+305
+00:11:22,540 --> 00:11:24,050
+predicting zeroes on everything
+0是所有的预测值
+
+306
+00:11:24,050 --> 00:11:25,803
+the same, theta 0 equals 0,
+和刚才一样 设置θ0 = 0
+
+307
+00:11:25,810 --> 00:11:27,139
+theta 1 equals 0 and
+θ1 也等于0
+
+308
+00:11:27,139 --> 00:11:29,345
+I compute the cost function
+然后我计算的代价函数
+
+309
+00:11:29,370 --> 00:11:31,830
+then it's 2.333 and that's
+结果是2.333
+
+310
+00:11:31,830 --> 00:11:35,495
+actually equal to 1 squared,
+实际上 他就等于1的平方
+
+311
+00:11:35,520 --> 00:11:36,745
+which is my squared error on
+ 也就是第一个样本的平方误差
+
+312
+00:11:36,745 --> 00:11:39,789
+the first example, plus 2 squared,
+加上2的平方
+
+313
+00:11:39,800 --> 00:11:42,377
+plus 3 squared and then
+加上3的平方
+
+314
+00:11:42,440 --> 00:11:45,288
+divided by 2m, which is
+然后除以2m
+
+315
+00:11:45,288 --> 00:11:47,091
+2 times number of training examples,
+也就是训练样本数的两倍
+
+316
+00:11:47,091 --> 00:11:50,643
+which is indeed 2.33 and
+这就是2.33
+
+317
+00:11:50,643 --> 00:11:53,289
+so, that sanity checks that
+因此这也反过来验证了
+
+318
+00:11:53,330 --> 00:11:54,909
+this function here is, you
+我们这里的函数
+
+319
+00:11:54,909 --> 00:11:56,302
+know, computing the correct cost
+计算出了正确的代价函数
+
+320
+00:11:56,302 --> 00:11:58,212
+function and these are the couple examples
+这些就是我们
+
+321
+00:11:58,250 --> 00:12:00,222
+we tried out on our
+用简单的训练样本
+
+322
+00:12:00,222 --> 00:12:03,433
+simple training example.
+尝试的几次试验
+
+323
+00:12:03,490 --> 00:12:04,914
+And so that sanity checks
+这也可以作为我们对
+
+324
+00:12:04,960 --> 00:12:08,689
+that the cost function J,
+定义的代价函数 J
+
+325
+00:12:08,720 --> 00:12:10,202
+as defined here, that it
+进行了完整性检查
+
+326
+00:12:10,230 --> 00:12:12,992
+is indeed, you know, seeming to compute
+确实是可以计算出正确的代价函数的
+
+327
+00:12:12,992 --> 00:12:14,908
+the correct cost function, at least
+至少基于这里的 X
+
+328
+00:12:14,920 --> 00:12:17,424
+on our simple training set
+和 y 是成立的
+
+329
+00:12:17,430 --> 00:12:18,835
+that we had here with X
+ 也就是我们
+
+330
+00:12:18,835 --> 00:12:20,823
+and Y being this
+这几个简单的训练集
+
+331
+00:12:20,823 --> 00:12:25,189
+simple training example that we solved.
+至少是成立的
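+
+And the sanity check carried out on this small training set, end to end:
+
+    X = [1 1; 1 2; 1 3];               % design matrix for the points (1,1), (2,2), (3,3)
+    y = [1; 2; 3];
+    j = costFunctionJ(X, y, [0; 1])    % 0, since theta = [0;1] fits the data exactly
+    j = costFunctionJ(X, y, [0; 0])    % 2.3333 = (1^2 + 2^2 + 3^2) / (2*3)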
+
+332
+00:12:25,230 --> 00:12:26,285
+So, now you know how
+好啦 现在你知道
+
+333
+00:12:26,285 --> 00:12:28,171
+to write control statements like for loops,
+如何在 Octave 环境下写出正确的控制语句
+
+334
+00:12:28,171 --> 00:12:29,838
+while loops and if statements
+比如 for 循环、while 循环和 if 语句
+
+335
+00:12:29,838 --> 00:12:33,197
+in octave as well as how to define and use functions.
+以及如何定义和使用函数
+
+336
+00:12:33,197 --> 00:12:34,530
+In the next video, I'm
+在接下来的视频中
+
+337
+00:12:34,530 --> 00:12:36,123
+going to just very quickly
+我会非常快的
+
+338
+00:12:36,123 --> 00:12:38,144
+step you through the logistics
+介绍一下
+
+339
+00:12:38,144 --> 00:12:39,873
+of working on and
+如何在这门课里
+
+340
+00:12:39,873 --> 00:12:41,664
+submitting problem sets for
+完成和提交作业
+
+341
+00:12:41,664 --> 00:12:45,212
+this class and how to use our submission system.
+如何使用我们的提交系统
+
+342
+00:12:45,230 --> 00:12:46,794
+And finally, after that, in
+在此之后
+
+343
+00:12:46,794 --> 00:12:48,856
+the final octave tutorial video,
+在最后的 Octave 教程视频里
+
+344
+00:12:48,856 --> 00:12:51,400
+I wanna tell you about vectorization, which
+我会讲解一下向量化
+
+345
+00:12:51,400 --> 00:12:52,938
+is an idea for how to
+这是一种可以使你的
+
+346
+00:12:52,938 --> 00:12:56,126
+make your Octave programs run much faster.
+Octave 程序运行非常快的思想(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
diff --git a/srt/5 - 6 - Vectorization (14 min).srt b/srt/5 - 6 - Vectorization (14 min).srt
new file mode 100644
index 00000000..25b26305
--- /dev/null
+++ b/srt/5 - 6 - Vectorization (14 min).srt
@@ -0,0 +1,1876 @@
+1
+00:00:00,280 --> 00:00:04,479
+In this video, I'd like to tell you about the idea of vectorization.
+在这段视频中 我将介绍有关向量化的内容
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,480 --> 00:00:06,471
+So, whether you're using Octave
+无论你是用Octave
+
+3
+00:00:06,471 --> 00:00:08,277
+or a similar language like MATLAB
+还是别的语言 比如MATLAB
+
+4
+00:00:08,277 --> 00:00:09,604
+or whether you're using Python
+或者你正在用Python
+
+5
+00:00:09,604 --> 00:00:12,520
+and NumPy or Java CC++.
+NumPy 或 Java C C++
+
+6
+00:00:12,520 --> 00:00:14,850
+All of these languages have either
+所有这些语言都具有
+
+7
+00:00:14,850 --> 00:00:16,708
+built into them or have
+各种线性代数库
+
+8
+00:00:16,720 --> 00:00:19,439
+readily and easily accessible, different
+这些库文件都是内置的
+
+9
+00:00:19,439 --> 00:00:21,806
+numerical linear algebra libraries.
+容易阅读和获取
+
+10
+00:00:21,820 --> 00:00:23,335
+They're usually very well written,
+他们通常写得很好
+
+11
+00:00:23,335 --> 00:00:25,695
+highly optimized, often developed by
+已经经过高度优化
+
+12
+00:00:25,695 --> 00:00:29,181
+people that, you know, have PhDs in numerical computing or
+通常是数值计算方面的博士
+
+13
+00:00:29,181 --> 00:00:32,075
+they really specialize in numerical computing.
+或者专业人士开发的
+
+14
+00:00:32,075 --> 00:00:33,944
+And when you're implementing machine
+而当你实现机器学习算法时
+
+15
+00:00:33,960 --> 00:00:35,904
+learning algorithms, if you're able
+如果你能
+
+16
+00:00:35,930 --> 00:00:37,797
+to take advantage of these
+好好利用这些
+
+17
+00:00:37,810 --> 00:00:39,296
+linear algebra libraries or these
+线性代数库或者说
+
+18
+00:00:39,310 --> 00:00:41,600
+numerical linear algebra libraries and
+数值线性代数库
+
+19
+00:00:41,620 --> 00:00:43,387
+mix the routine calls to them
+并联合调用它们
+
+20
+00:00:43,387 --> 00:00:45,172
+rather than sort of writing code
+而不是自己去做那些
+
+21
+00:00:45,180 --> 00:00:48,029
+yourself to do things that these libraries could be doing.
+函数库可以做的事情
+
+22
+00:00:48,040 --> 00:00:49,612
+If you do that then
+如果是这样的话 那么
+
+23
+00:00:49,612 --> 00:00:51,872
+often you get that "first is more efficient".
+通常你会发现 首先 这样更有效
+
+24
+00:00:51,880 --> 00:00:53,179
+So, just run more quickly and
+也就是说运行速度更快
+
+25
+00:00:53,179 --> 00:00:54,891
+take better advantage of
+并且更好地利用
+
+26
+00:00:54,891 --> 00:00:56,631
+any parallel hardware your computer
+你的计算机里可能有的一些并行硬件系统
+
+27
+00:00:56,631 --> 00:00:58,254
+may have and so on.
+等等
+
+28
+00:00:58,270 --> 00:01:00,533
+And second, it also means
+第二 这也意味着
+
+29
+00:01:00,540 --> 00:01:03,075
+that you end up with less code that you need to write.
+你可以用更少的代码来实现你需要的功能
+
+30
+00:01:03,075 --> 00:01:04,962
+So have a simpler implementation
+因此 实现的方式更简单
+
+31
+00:01:04,962 --> 00:01:08,532
+that is, therefore, maybe also more likely to be bug free.
+代码出现问题的有可能性也就越小
+
+32
+00:01:08,550 --> 00:01:10,534
+And as a concrete example.
+举个具体的例子
+
+33
+00:01:10,570 --> 00:01:12,726
+Rather than writing code
+与其自己写代码
+
+34
+00:01:12,726 --> 00:01:15,061
+yourself to multiply matrices, if
+做矩阵乘法
+
+35
+00:01:15,061 --> 00:01:16,300
+you let Octave do it by
+如果你只在Octave中
+
+36
+00:01:16,300 --> 00:01:18,145
+typing a times b,
+输入 a乘以b
+
+37
+00:01:18,145 --> 00:01:19,833
+that will use a very efficient
+就是一个非常有效的
+
+38
+00:01:19,833 --> 00:01:22,318
+routine to multiply the 2 matrices.
+两个矩阵相乘的程序
+
+39
+00:01:22,340 --> 00:01:23,985
+And there's a bunch of examples like
+有很多例子可以说明
+
+40
+00:01:24,010 --> 00:01:27,220
+these where you use appropriate vectorized implementations.
+如果你用合适的向量化方法来实现
+
+41
+00:01:27,220 --> 00:01:30,062
+You get much simpler code, and much more efficient code.
+你就会有一个简单得多 也有效得多的代码
+
+42
+00:01:30,280 --> 00:01:33,071
+Let's look at some examples.
+让我们来看一些例子
+
+43
+00:01:33,071 --> 00:01:34,937
+Here's a usual hypothesis of linear
+这是一个常见的线性回归假设函数
+
+44
+00:01:34,937 --> 00:01:36,415
+regression and if you
+如果
+
+45
+00:01:36,415 --> 00:01:37,348
+want to compute H of
+你想要计算 h(x)
+
+46
+00:01:37,348 --> 00:01:40,032
+X, notice that there is a sum on the right.
+注意到右边是求和
+
+47
+00:01:40,032 --> 00:01:41,130
+And so one thing you could
+那么你可以
+
+48
+00:01:41,130 --> 00:01:42,775
+do is compute the sum
+自己计算
+
+49
+00:01:42,775 --> 00:01:46,611
+from J equals 0 to J equals N yourself.
+j =0 到 j = n 的和
+
+50
+00:01:46,620 --> 00:01:48,000
+Another way to think of this
+但换另一种方式来想想
+
+51
+00:01:48,000 --> 00:01:49,210
+is to think of h
+是把 h 看作
+
+52
+00:01:49,210 --> 00:01:52,029
+of x as theta transpose x
+θ 转置乘以 x
+
+53
+00:01:52,029 --> 00:01:53,262
+and what you can do is
+那么
+
+54
+00:01:53,262 --> 00:01:55,654
+think of this as you know, computing this
+你就可以写成
+
+55
+00:01:55,660 --> 00:01:57,823
+inner product between 2 vectors
+两个向量的内积
+
+56
+00:01:57,840 --> 00:02:00,135
+where theta is, you know, your
+其中 θ 就是
+
+57
+00:02:00,135 --> 00:02:01,784
+vector say theta 0, theta 1,
+θ0 θ1
+
+58
+00:02:01,800 --> 00:02:04,812
+theta 2 if you have 2 features.
+θ2 如果你有两个特征量
+
+59
+00:02:04,812 --> 00:02:06,410
+If n equals 2 and if
+如果 n 等于2 并且如果
+
+60
+00:02:06,450 --> 00:02:08,133
+you think of x as this
+你把 x 看作
+
+61
+00:02:08,133 --> 00:02:11,810
+vector, x0, x1, x2
+x0 x1 x2
+
+62
+00:02:11,884 --> 00:02:13,952
+and these 2 views can
+这两种思考角度
+
+63
+00:02:13,952 --> 00:02:17,539
+give you 2 different implementations.
+会给你两种不同的实现方式
+
+64
+00:02:17,560 --> 00:02:18,909
+Here's what I mean.
+比如说
+
+65
+00:02:18,909 --> 00:02:21,012
+Here's an unvectorized implementation for
+这是未向量化的代码实现方式
+
+66
+00:02:21,040 --> 00:02:22,454
+how to compute h of
+计算 h(x) 是未向量化的
+
+67
+00:02:22,454 --> 00:02:26,120
+x and by unvectorized I mean, without vectorization.
+我的意思是 没有被向量化
+
+68
+00:02:26,130 --> 00:02:29,479
+We might first initialize, you know, prediction to be 0.0.
+我们可能首先要初始化变量 prediction 的值为0.0
+
+69
+00:02:29,479 --> 00:02:32,383
+This is going to eventually, the
+而这个
+
+70
+00:02:32,383 --> 00:02:34,287
+prediction is going to be
+变量 prediction 的最终结果就是
+
+71
+00:02:34,300 --> 00:02:36,090
+h of x and then
+h(x) 然后
+
+72
+00:02:36,090 --> 00:02:37,258
+I'm going to have a for loop for
+我要用一个 for 循环
+
+73
+00:02:37,270 --> 00:02:38,354
+j equals one through n+1
+j 取值 1 到 n+1
+
+74
+00:02:38,354 --> 00:02:40,792
+prediction gets incremented by
+变量prediction 每次就通过
+
+75
+00:02:40,792 --> 00:02:41,822
+theta j times xj.
+自身加上 θ(j) 乘以 x(j) 更新值
+
+76
+00:02:41,822 --> 00:02:44,737
+So, it's kind of this expression over here.
+这个就是算法的代码实现
+
+77
+00:02:44,737 --> 00:02:47,223
+By the way, I should mention in these
+顺便我要提醒一下
+
+78
+00:02:47,223 --> 00:02:48,894
+vectors right over here, I
+这里的向量
+
+79
+00:02:48,900 --> 00:02:51,102
+had these vectors being 0 index.
+我用的下标是 0
+
+80
+00:02:51,110 --> 00:02:52,600
+So, I had theta 0 theta 1,
+所以我有 θ0 θ1
+
+81
+00:02:52,600 --> 00:02:54,390
+theta 2, but because MATLAB
+θ2 但因为 MATLAB
+
+82
+00:02:54,390 --> 00:02:56,713
+is one index, theta 0
+的下标从1开始 在 MATLAB 中 θ0
+
+83
+00:02:56,713 --> 00:02:58,019
+in MATLAB, we might
+我们可能会
+
+84
+00:02:58,019 --> 00:03:00,204
+end up representing as theta
+用 θ1 来表示
+
+85
+00:03:00,204 --> 00:03:02,042
+1 and this second element
+这第二个元素
+
+86
+00:03:02,042 --> 00:03:04,392
+ends up as theta
+最后就会变成
+
+87
+00:03:04,392 --> 00:03:05,862
+2 and this third element
+θ2 而第三个元素
+
+88
+00:03:05,880 --> 00:03:08,002
+may end up as theta
+最终可能就用
+
+89
+00:03:08,002 --> 00:03:09,952
+3 just because vectors in
+θ3 表示 因为
+
+90
+00:03:09,960 --> 00:03:11,998
+MATLAB are indexed starting
+MATLAB 中的下标从1开始
+
+91
+00:03:11,998 --> 00:03:13,525
+from 1 even though our real
+即使我们实际的
+
+92
+00:03:13,525 --> 00:03:15,436
+theta and x here starting,
+θ 和 x 的下标从0开始
+
+93
+00:03:15,450 --> 00:03:17,002
+indexing from 0, which
+这就是为什么
+
+94
+00:03:17,002 --> 00:03:18,785
+is why here I have a for loop
+这里我的 for 循环
+
+95
+00:03:18,785 --> 00:03:20,498
+j goes from 1 through n+1
+j 取值从 1 直到 n+1
+
+96
+00:03:20,498 --> 00:03:22,225
+rather than j go through
+而不是
+
+97
+00:03:22,225 --> 00:03:26,243
+0 up to n, right? But
+从 0 到 n 清楚了吗?
+
+98
+00:03:26,300 --> 00:03:27,870
+so, this is an
+但这是一个
+
+99
+00:03:27,870 --> 00:03:29,571
+unvectorized implementation in that we
+未向量化的代码实现方式
+
+100
+00:03:29,571 --> 00:03:31,373
+have a for loop that summing up
+我们用一个 for 循环
+
+101
+00:03:31,373 --> 00:03:34,018
+the n elements of the sum.
+对 n 个元素进行加和
+
+102
+00:03:34,050 --> 00:03:35,646
+In contrast, here's how you
+作为比较 接下来是
+
+103
+00:03:35,646 --> 00:03:38,400
+write a vectorized implementation which
+向量化的代码实现
+
+104
+00:03:38,410 --> 00:03:39,959
+is that you would think
+你把
+
+105
+00:03:39,959 --> 00:03:42,618
+of x and theta
+x 和 θ
+
+106
+00:03:42,618 --> 00:03:43,955
+as vectors, and you just set
+看做向量 而你只需要
+
+107
+00:03:43,955 --> 00:03:46,039
+prediction equals theta transpose
+令变量 prediction 等于 θ转置
+
+108
+00:03:46,039 --> 00:03:48,347
+times x. You're just computing like so.
+乘以 x 你就可以这样计算
+
+109
+00:03:48,360 --> 00:03:51,011
+Instead of writing all these
+与其写所有这些
+
+110
+00:03:51,011 --> 00:03:52,966
+lines of code with the for loop,
+for 循环的代码
+
+111
+00:03:52,966 --> 00:03:54,242
+you instead have one line
+你只需要一行代码
+
+112
+00:03:54,242 --> 00:03:56,648
+of code and what this
+这行代码
+
+113
+00:03:56,648 --> 00:03:57,555
+line of code on the right
+右边所做的
+
+114
+00:03:57,555 --> 00:03:59,237
+will do is it use
+就是
+
+115
+00:03:59,237 --> 00:04:01,829
+Octaves highly optimized numerical
+利用 Octave 的高度优化的数值
+
+116
+00:04:01,840 --> 00:04:03,859
+linear algebra routines to compute
+线性代数算法来计算
+
+117
+00:04:03,859 --> 00:04:06,245
+this inner product between the
+两个向量的内积
+
+118
+00:04:06,245 --> 00:04:08,186
+two vectors, theta and X. And not
+θ 以及 x
+
+119
+00:04:08,190 --> 00:04:10,182
+only is the vectorized implementation
+这样向量化的实现不仅仅是更简单
+
+120
+00:04:10,182 --> 00:04:14,664
+simpler, it will also run more efficiently.
+它运行起来也将更加高效
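+
+Side by side, the two implementations just described (a sketch, with theta and x as (n+1)-dimensional column vectors indexed from 1 in Octave):
+
+    % Unvectorized:
+    prediction = 0.0;
+    for j = 1:n+1,
+      prediction = prediction + theta(j) * x(j);
+    end;
+
+    % Vectorized:
+    prediction = theta' * x;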
+
+121
+00:04:15,820 --> 00:04:17,792
+So, that was Octave, but
+这就是 Octave 所做的
+
+122
+00:04:17,792 --> 00:04:19,912
+issue of vectorization applies to
+而向量化的方法
+
+123
+00:04:19,920 --> 00:04:22,020
+other programming languages as well.
+在其他编程语言中同样可以实现
+
+124
+00:04:22,040 --> 00:04:24,947
+Let's look at an example in C++.
+让我们来看一个 C++ 的例子
+
+125
+00:04:24,947 --> 00:04:27,965
+Here's what an unvectorized implementation might look like.
+这就是未向量化的代码实现可能看起来的样子
+
+126
+00:04:27,965 --> 00:04:31,395
+We again initialize prediction, you know, to
+我们再次初始化变量 prediction 为 0.0
+
+127
+00:04:31,395 --> 00:04:32,518
+0.0 and then we now have a full
+然后我们现在有一个完整的
+
+128
+00:04:32,518 --> 00:04:34,508
+loop for j equals 0 up to
+从 j 等于 0 直到 n
+
+129
+00:04:34,508 --> 00:04:36,819
+n. Prediction + equals
+变量 prediction +=
+
+130
+00:04:36,830 --> 00:04:38,546
+theta j times x j where
+theta[j] 乘以 x[j]
+
+131
+00:04:38,560 --> 00:04:42,777
+again, you have this extra for loop that you write yourself.
+再一次 你有这样的自己写的 for 循环
+
+132
+00:04:42,777 --> 00:04:44,843
+In contrast, using a good
+与此相反 使用一个比较好的
+
+133
+00:04:44,850 --> 00:04:46,498
+numerical linear algebra library in
+C++ 数值线性代数库
+
+134
+00:04:46,498 --> 00:04:48,965
+C++, you could use
+你就可以用这个方程
+
+135
+00:04:48,990 --> 00:04:54,440
+write the function like or rather.
+write the function like or rather.
+
+136
+00:04:54,560 --> 00:04:56,533
+In contrast, using a good
+与此相反 使用较好的
+
+137
+00:04:56,533 --> 00:04:58,152
+numerical linear algebra library in
+C++ 数值线性代数库
+
+138
+00:04:58,152 --> 00:05:00,686
+C++, you can instead
+你可以写出这样的代码
+
+139
+00:05:00,686 --> 00:05:02,470
+write code that might look like this.
+write code that might look like this.
+
+140
+00:05:02,470 --> 00:05:03,985
+So, depending on the details
+因此取决于你的
+
+141
+00:05:03,985 --> 00:05:05,595
+of your numerical linear algebra
+数值线性代数库的内容
+
+142
+00:05:05,595 --> 00:05:06,790
+library, you might be
+你可以有一个
+
+143
+00:05:06,830 --> 00:05:08,580
+able to have an object that
+able to have an object that
+
+144
+00:05:08,580 --> 00:05:09,918
+is a C++ object which is
+C++ 对象
+
+145
+00:05:09,918 --> 00:05:11,328
+vector theta and a C++
+theta 和一个 C++
+
+146
+00:05:11,350 --> 00:05:13,436
+object which is a vector X,
+对象 向量 x
+
+147
+00:05:13,436 --> 00:05:15,552
+and you just take theta dot
+你只需要用 theta.transpose ()
+
+148
+00:05:15,552 --> 00:05:18,115
+transpose times x where
+乘以 x
+
+149
+00:05:18,120 --> 00:05:20,092
+this times becomes C++ to
+而这次是让 C++ 来实现运算
+
+150
+00:05:20,092 --> 00:05:22,028
+overload the operator so
+因此
+
+151
+00:05:22,028 --> 00:05:26,156
+that you can just multiply these two vectors in C++.
+你只需要在 C++ 中将两个向量相乘
+
+152
+00:05:26,156 --> 00:05:28,091
+And depending on, you know, the details
+根据
+
+153
+00:05:28,110 --> 00:05:29,515
+of your numerical and linear algebra
+你所使用的数值和线性代数库的使用细节的不同
+
+154
+00:05:29,515 --> 00:05:30,855
+library, you might end
+你最终使的代码表达方式
+
+155
+00:05:30,855 --> 00:05:31,894
+up using a slightly different and
+可能会有些许不同
+
+156
+00:05:31,894 --> 00:05:33,636
+syntax, but by relying
+但是通过
+
+157
+00:05:33,636 --> 00:05:35,758
+on a library to do this in a product.
+一个库来做内积
+
+158
+00:05:35,760 --> 00:05:37,064
+You can get a much simpler piece
+你可以得到一段更简单
+
+159
+00:05:37,064 --> 00:05:40,623
+of code and a much more efficient one.
+更有效的代码
+
+160
+00:05:40,623 --> 00:05:43,582
+Let's now look at a more sophisticated example.
+现在 让我们来看一个更为复杂的例子
+
+161
+00:05:43,582 --> 00:05:45,015
+Just to remind you here's our
+提醒一下
+
+162
+00:05:45,015 --> 00:05:46,792
+update rule for gradient descent
+这是线性回归算法梯度下降的更新规则
+
+163
+00:05:46,792 --> 00:05:48,794
+for linear regression and so,
+所以
+
+164
+00:05:48,794 --> 00:05:50,488
+we update theta j using this
+我们用这条规则对 j 等于 0 1 2 等等的所有值
+
+165
+00:05:50,488 --> 00:05:53,672
+rule for all values of J equals 0, 1, 2, and so on.
+更新 对象 θ j
+
+166
+00:05:53,672 --> 00:05:56,259
+And if I just write
+我只是
+
+167
+00:05:56,260 --> 00:05:58,206
+out these equations for
+用 θ0 θ1 θ2 来写方程
+
+168
+00:05:58,206 --> 00:06:00,048
+theta 0 Theta one, theta two.
+theta 0 Theta one, theta two.
+
+169
+00:06:00,048 --> 00:06:02,173
+Assuming we have two features.
+那就是假设我们有两个特征量
+
+170
+00:06:02,173 --> 00:06:03,469
+So N equals 2.
+所以 n等于2
+
+171
+00:06:03,469 --> 00:06:04,607
+Then these are the updates we
+这些都是我们需要对
+
+172
+00:06:04,610 --> 00:06:07,388
+perform to theta zero, theta one, theta two.
+theta0 theta1 theta2的更新
+
+173
+00:06:07,410 --> 00:06:08,982
+where you might remember my
+你可能还记得
+
+174
+00:06:08,982 --> 00:06:10,825
+saying in an earlier video
+在以前的视频中说过
+
+175
+00:06:10,825 --> 00:06:14,783
+that these should be simultaneous updates.
+这些都应该是同步更新
+
+176
+00:06:14,783 --> 00:06:16,268
+So let's see if
+因此 让我们来看看
+
+177
+00:06:16,268 --> 00:06:17,725
+we can come up with a
+我们是否可以拿出一个
+
+178
+00:06:17,725 --> 00:06:20,723
+vectorized implementation of this.
+向量化的代码实现
+
+179
+00:06:20,740 --> 00:06:22,598
+Here are my same 3 equations written
+这里是和之前相同的三个方程
+
+180
+00:06:22,598 --> 00:06:24,182
+on a slightly smaller font and you
+只不过用更小一些的字体写出来
+
+181
+00:06:24,182 --> 00:06:25,517
+can imagine that 1 wait
+你可以想象
+
+182
+00:06:25,520 --> 00:06:26,716
+to implement this three lines
+实现这三个方程的方式之一
+
+183
+00:06:26,720 --> 00:06:27,798
+of code is to have a
+就是用
+
+184
+00:06:27,798 --> 00:06:28,968
+for loop that says, you
+一个 for 循环
+
+185
+00:06:28,968 --> 00:06:31,682
+know, for j equals 0,
+就是让 j 等于0
+
+186
+00:06:31,682 --> 00:06:33,305
+1 through 2 the update
+1 2
+
+187
+00:06:33,305 --> 00:06:35,603
+theta J or something like that.
+来更新 θ j
+
+188
+00:06:35,603 --> 00:06:36,760
+But instead, let's come up
+但让我们
+
+189
+00:06:36,760 --> 00:06:40,975
+with a vectorized implementation and see if we can have a simpler way.
+用向量化的方式来实现 并看看我们是否能够有一个更简单的方法
+
+190
+00:06:40,975 --> 00:06:42,711
+So, basically compress these three
+基本上用三行代码
+
+191
+00:06:42,757 --> 00:06:44,314
+lines of code or a
+或者一个 for 循环
+
+192
+00:06:44,314 --> 00:06:48,518
+for loop that, you know, effectively does these 3 sets, 1 set at a time.
+一次实现这三个方程
+
+193
+00:06:48,518 --> 00:06:49,688
+Let's see who can these 3
+让我们来看看谁能用这三步
+
+194
+00:06:49,688 --> 00:06:51,402
+steps and compress them into
+并将它们压缩成
+
+195
+00:06:51,402 --> 00:06:53,972
+1 line of vectorized code.
+一行向量化的代码
+
+196
+00:06:53,976 --> 00:06:55,476
+Here's the idea.
+这个想法是
+
+197
+00:06:55,480 --> 00:06:56,462
+What I'm going to do is I'm
+我打算
+
+198
+00:06:56,462 --> 00:06:59,131
+going to think of theta
+把 θ 看做一个向量
+
+199
+00:06:59,131 --> 00:07:00,633
+as a vector and I'm
+然后我用
+
+200
+00:07:00,633 --> 00:07:04,214
+going to update theta as theta
+θ 减去
+
+201
+00:07:04,270 --> 00:07:07,468
+minus alpha times some
+α 乘以 某个别的向量
+
+202
+00:07:07,468 --> 00:07:11,650
+other vector, delta, where
+δ 来更新 θ
+
+203
+00:07:11,650 --> 00:07:13,689
+delta is going to be
+这里的 δ 等于 m 分之 1
+
+204
+00:07:13,700 --> 00:07:15,876
+equal to 1 over
+equal to 1 over
+
+205
+00:07:15,876 --> 00:07:18,408
+m, sum from I equals
+i=1 到 m 加和
+
+206
+00:07:18,450 --> 00:07:22,151
+one through m and then
+one through m and then
+
+207
+00:07:22,180 --> 00:07:25,570
+this term on the
+然后这个表达式
+
+208
+00:07:25,720 --> 00:07:28,118
+right, okay?
+好吗?
+
+209
+00:07:28,118 --> 00:07:31,205
+So, let me explain what's going on here.
+让我解释一下是怎么回事
+
+210
+00:07:31,220 --> 00:07:32,666
+Here, I'm going to treat
+在这里 我要把
+
+211
+00:07:32,666 --> 00:07:35,322
+theta as a vector
+θ 看作一个向量
+
+212
+00:07:35,350 --> 00:07:38,106
+so, there's an N+1 dimensional vector.
+有一个 n+1 维向量
+
+213
+00:07:38,110 --> 00:07:40,291
+I'm saying that theta gets, you know, updated
+我是说 θ 被更新
+
+214
+00:07:40,310 --> 00:07:43,922
+as--that's the vector, our N+1.
+我们的 n+1 维向量
+
+215
+00:07:43,922 --> 00:07:45,319
+Alpha is a real
+α 是一个实数
+
+216
+00:07:45,319 --> 00:07:47,395
+number and delta
+δ
+
+217
+00:07:47,410 --> 00:07:49,941
+here is a vector.
+在这里是一个向量
+
+218
+00:07:49,960 --> 00:07:54,278
+So, this subtraction operation, that's a vector subtraction.
+所以这个减法运算是一个向量减法
+
+219
+00:07:54,278 --> 00:07:55,255
+Okay?
+清楚吗 ?
+
+220
+00:07:55,255 --> 00:07:56,977
+Because alpha times delta
+因为 α 乘以 δ
+
+221
+00:07:56,977 --> 00:07:58,385
+is a vector and so
+是一个向量 所以
+
+222
+00:07:58,385 --> 00:08:00,369
+I'm saying if theta gets, you know, this
+θ 就是 θ 减去 α 乘以 δ 得到的向量
+
+223
+00:08:00,369 --> 00:08:04,217
+vector, alpha times delta subtracted from it.
+vector, alpha times delta subtracted from it.
+
+224
+00:08:04,240 --> 00:08:06,563
+So, what is the vector delta?
+那么什么是向量 δ 呢 ?
+
+225
+00:08:06,563 --> 00:08:10,220
+Well, this vector delta looks like this.
+嗯 向量 δ 是这样子的
+
+226
+00:08:10,256 --> 00:08:12,092
+And what this meant to
+这实际上代表的是
+
+227
+00:08:12,092 --> 00:08:14,595
+be is really meant to be
+be is really meant to be
+
+228
+00:08:14,620 --> 00:08:17,102
+this thing over here.
+这部分内容
+
+229
+00:08:17,140 --> 00:08:19,200
+Concretely, delta will be
+具体地说 δ 将成为
+
+230
+00:08:19,220 --> 00:08:22,165
+a N+1 dimensional vector and
+n +1 维向量
+
+231
+00:08:22,165 --> 00:08:23,978
+the very first element of
+并且向量的第一个元素
+
+232
+00:08:23,978 --> 00:08:27,767
+the vector delta is going to be equal to that.
+就等于这个
+
+233
+00:08:27,770 --> 00:08:29,513
+So, if we have
+所以我们的 δ
+
+234
+00:08:29,513 --> 00:08:31,565
+the delta, you know, if we index it
+如果要写下标的话
+
+235
+00:08:31,565 --> 00:08:34,469
+from 0--this is delta 0, delta 1, delta 2.
+就是从零开始 δ0 δ1 δ2
+
+236
+00:08:34,469 --> 00:08:36,541
+What I want is that
+我想要的是
+
+237
+00:08:36,560 --> 00:08:39,033
+delta 0 is equal
+δ0 等于
+
+238
+00:08:39,040 --> 00:08:41,267
+to, you know, this
+这个
+
+239
+00:08:41,267 --> 00:08:42,359
+first box, also drawn in green, up
+第一行绿色框起来的部分
+
+240
+00:08:42,360 --> 00:08:45,306
+above and indeed, you might
+事实上 你可能会
+
+241
+00:08:45,306 --> 00:08:47,108
+be able to convince yourself that delta
+写出 δ0 是
+
+242
+00:08:47,108 --> 00:08:48,681
+0 is this: 1 over m,
+m 分之 1
+
+243
+00:08:48,681 --> 00:08:50,102
+sum of, you know, h of
+h(x) 减去 y(i)
+
+244
+00:08:50,102 --> 00:08:53,356
+x i minus
+乘以 x(i)0 的和
+
+245
+00:08:53,400 --> 00:08:58,315
+yi times xi0.
+yi times xi0.
+
+246
+00:08:58,315 --> 00:08:59,748
+So, let's just make
+所以让我们
+
+247
+00:08:59,748 --> 00:09:01,064
+sure that we're on the
+在同一页上
+
+248
+00:09:01,064 --> 00:09:03,998
+same page about how delta really is computed.
+计算真正的 δ
+
+249
+00:09:03,998 --> 00:09:05,488
+Delta is 1 over m
+δ 就是 m 分之 1
+
+250
+00:09:05,488 --> 00:09:08,284
+times the sum over here
+乘以这个和
+
+251
+00:09:08,284 --> 00:09:09,871
+and, you know, what is this sum?
+那这个和是什么 ?
+
+252
+00:09:09,871 --> 00:09:11,426
+Well, this term over
+恩 这个符号
+
+253
+00:09:11,426 --> 00:09:17,115
+here, that's a real number.
+是一个实数
+
+254
+00:09:17,150 --> 00:09:21,219
+And the second term over here, xi.
+这里的第二个符号 是 x(i)
+
+255
+00:09:21,219 --> 00:09:23,892
+This term over there is a
+这个符号是一个向量
+
+256
+00:09:23,910 --> 00:09:26,109
+vector, right? Because xi might
+对吧 ? 因为 x(i)
+
+257
+00:09:26,109 --> 00:09:26,982
+be a vector.
+可能是一个向量
+
+258
+00:09:26,990 --> 00:09:29,630
+That would be
+这将是
+
+259
+00:09:29,975 --> 00:09:36,115
+xi0, xi1, xi2 right?
+x(i)0 x(i)1 x(i)2 对吧 ?
+
+260
+00:09:36,130 --> 00:09:38,246
+And what is the summation?
+那这个求和是什么 ?
+
+261
+00:09:38,246 --> 00:09:40,241
+Well, what does summation say
+恩 这个求和就是
+
+262
+00:09:40,250 --> 00:09:43,292
+is that this term
+这里的式子
+
+263
+00:09:43,502 --> 00:09:46,555
+over here.
+就在这里
+
+264
+00:09:47,280 --> 00:09:54,801
+This is equal to h of x1 minus y1 times
+等于 h(x(1)) - y(1) 乘以 x(1)
+
+265
+00:09:54,870 --> 00:09:59,099
+x1 + h of
+加上
+
+266
+00:09:59,115 --> 00:10:02,778
+x2-y2 times x2
+h(x(2)) - y(2) 乘以 x(2)
+
+267
+00:10:02,778 --> 00:10:05,396
++ you know, and so on.
+依此类推
+
+268
+00:10:05,396 --> 00:10:06,404
+Okay?
+对吧 ?
+
+269
+00:10:06,404 --> 00:10:07,420
+Because this is a summation of
+因为这是对 i 的加和
+
+270
+00:10:07,420 --> 00:10:09,013
+the i. So, as i
+所以
+
+271
+00:10:09,013 --> 00:10:11,345
+ranges from 1 through m,
+当 i 从 1 到 m
+
+272
+00:10:11,345 --> 00:10:15,144
+you get these different terms and you're summing up these terms.
+你就会得到这些不同的式子 然后作加和
+
+273
+00:10:15,160 --> 00:10:16,221
+And the meaning of each of these
+每个式子的意思
+
+274
+00:10:16,221 --> 00:10:18,262
+terms is a lot like
+很像
+
+275
+00:10:18,262 --> 00:10:19,807
+- if you remember actually from
+如果你还记得实际上
+
+276
+00:10:19,807 --> 00:10:24,100
+the earlier quiz in this, if you solve this equation.
+在以前的一个小测验 如果你要解这个方程
+
+277
+00:10:24,110 --> 00:10:25,560
+We said that in order to
+我们说过
+
+278
+00:10:25,560 --> 00:10:27,250
+vectorize this code, we
+为了向量化这段代码
+
+279
+00:10:27,250 --> 00:10:30,755
+will instead set u = 2v + 5w. So,
+我们会令 u = 2v +5w 因此
+
+280
+00:10:30,770 --> 00:10:32,391
+we're saying that the vector u
+我们说 向量u
+
+281
+00:10:32,391 --> 00:10:33,706
+is equal to 2 times
+等于2乘以向量v
+
+282
+00:10:33,706 --> 00:10:35,568
+the vector v plus 5 times
+加上 5乘以向量 w
+
+283
+00:10:35,570 --> 00:10:37,198
+the vector w. So, just an
+用这个例子说明
+
+284
+00:10:37,198 --> 00:10:39,023
+example of how to
+如何对不同的向量进行相加
+
+285
+00:10:39,023 --> 00:10:42,453
+add different vectors and this summation is the same thing.
+这里的求和是同样的道理
+
+286
+00:10:42,453 --> 00:10:44,919
+It's a saying that this
+这一部分
+
+287
+00:10:44,950 --> 00:10:49,766
+summation over here is just some real number right?
+只是一个实数
+
+288
+00:10:49,840 --> 00:10:50,996
+That's kind of like the number
+就有点像数字 2
+
+289
+00:10:51,010 --> 00:10:52,698
+2 and some other number
+而这里是别的一些数字
+
+290
+00:10:52,711 --> 00:10:54,085
+times the vector x1.
+来乘以向量x1
+
+291
+00:10:54,085 --> 00:10:56,792
+This is like 2 times v instead
+这就像是 2v
+
+292
+00:10:56,792 --> 00:10:59,177
+with some other number times x1
+只不过用别的数字乘以 x1
+
+293
+00:10:59,177 --> 00:11:01,712
+and then plus, you know, instead of
+然后加上 你知道
+
+294
+00:11:01,712 --> 00:11:03,475
+5xw, we instead have some
+不是5w 而是用
+
+295
+00:11:03,475 --> 00:11:05,212
+other real number plus some
+别的实数乘以
+
+296
+00:11:05,212 --> 00:11:06,850
+other vector and then you
+一个别的向量 然后你
+
+297
+00:11:06,860 --> 00:11:08,909
+add on other vectors, you know,
+加上其他的向量
+
+298
+00:11:08,909 --> 00:11:10,528
+plus ... plus the other
+plus ... plus the other
+
+299
+00:11:10,540 --> 00:11:12,234
+vectors, which is why
+这就是为什么
+
+300
+00:11:12,234 --> 00:11:15,178
+overall, this thing
+总体而言
+
+301
+00:11:15,178 --> 00:11:17,015
+over here, that whole
+在这里 这整个量
+
+302
+00:11:17,015 --> 00:11:19,745
+quantity, that delta is
+δ 就是一个向量
+
+303
+00:11:19,770 --> 00:11:23,685
+just some vector, and concretely, the
+具体而言
+
+304
+00:11:23,685 --> 00:11:26,373
+3 elements of delta correspond
+对应这三个 δ 的元素
+
+305
+00:11:26,373 --> 00:11:28,813
+if n equals 2, the 3 elements
+如果n等于2
+
+306
+00:11:28,820 --> 00:11:31,512
+of delta correspond exactly to
+δ 的三个元素一一对应
+
+307
+00:11:31,512 --> 00:11:33,349
+this thing to the second
+这个
+
+308
+00:11:33,349 --> 00:11:35,075
+thing and this third
+第二个 以及这第三个
+
+309
+00:11:35,075 --> 00:11:36,401
+thing, which is why
+式子 这就是为什么
+
+310
+00:11:36,410 --> 00:11:38,299
+when you update theta, according to
+当您更新 θ 值时 根据
+
+311
+00:11:38,299 --> 00:11:40,979
+theta minus alpha delta,
+θ - αδ 这个式子
+
+312
+00:11:41,010 --> 00:11:42,760
+we end up having exactly the
+我们最终能得到完全符合最上方更新规则的
+
+313
+00:11:42,830 --> 00:11:44,948
+same simultaneous updates as the
+同步更新
+
+314
+00:11:44,960 --> 00:11:47,825
+update rules that we have on top.
+update rules that we have on top.
+
+315
+00:11:47,840 --> 00:11:48,960
+So, I know that there
+我知道
+
+316
+00:11:48,960 --> 00:11:50,466
+was a lot that happened on
+幻灯片上的内容很多
+
+317
+00:11:50,500 --> 00:11:52,608
+the slides, but again, feel
+但是再次重申
+
+318
+00:11:52,650 --> 00:11:54,489
+free to pause the video and
+请随时暂停视频
+
+319
+00:11:54,510 --> 00:11:56,592
+I either encourage you to
+我也鼓励你
+
+320
+00:11:56,592 --> 00:11:58,247
+step through the difference. If
+一步步对比这两者的差异
+
+321
+00:11:58,247 --> 00:11:59,451
+you're unsure of what just happened,
+如果你不清楚刚才的内容
+
+322
+00:11:59,451 --> 00:12:01,719
+I encourage you to step through
+我希望你能一步一步读幻灯片的内容
+
+323
+00:12:01,719 --> 00:12:02,940
+the slide to make sure you
+以确保你理解
+
+324
+00:12:02,940 --> 00:12:04,578
+understand why is it
+为什么这个式子
+
+325
+00:12:04,580 --> 00:12:07,048
+that this update here with
+用 δ 的这个定理
+
+326
+00:12:07,060 --> 00:12:09,612
+this definition of delta, right?
+定义的 好吗 ?
+
+327
+00:12:09,612 --> 00:12:10,943
+Why is it that that equal
+以及它为什么
+
+328
+00:12:10,943 --> 00:12:13,714
+to this update on top and
+和最上面的更新方式是等价的
+
+329
+00:12:13,714 --> 00:12:15,033
+if it's still not clear, the insight is
+为什么是这样子的
+
+330
+00:12:15,033 --> 00:12:18,395
+that, you know, this thing over here.
+你知道 就是这里的式子
+
+331
+00:12:18,400 --> 00:12:20,628
+That's exactly the vector
+这就是向量 x
+
+332
+00:12:20,628 --> 00:12:22,109
+x and so, we're
+而我们只是用了
+
+333
+00:12:22,109 --> 00:12:23,342
+just taking, you know, all
+你知道
+
+334
+00:12:23,342 --> 00:12:25,516
+3 of these computations and compressing
+这三个计算式并且压缩
+
+335
+00:12:25,516 --> 00:12:27,106
+them into one step
+成一个步骤
+
+336
+00:12:27,106 --> 00:12:29,778
+with the this vector delta,
+用这个向量 δ
+
+337
+00:12:29,778 --> 00:12:31,292
+which is why we can come
+这就是为什么我们能够
+
+338
+00:12:31,292 --> 00:12:33,465
+up with a vectorized implementation of
+矢量化地实现
+
+339
+00:12:33,490 --> 00:12:36,942
+this step of linear regression this way.
+线性回归
+
+340
+00:12:36,942 --> 00:12:38,639
+So I hope this
+所以 我希望
+
+341
+00:12:38,660 --> 00:12:40,660
+step makes sense, and do
+步骤是有逻辑的
+
+342
+00:12:40,660 --> 00:12:41,791
+look at the video and make
+请务必看视频 并且保证
+
+343
+00:12:41,791 --> 00:12:44,013
+sure and see if you can understand it.
+你确实能理解它
+
+344
+00:12:44,013 --> 00:12:46,058
+In case you don't understand the
+万一你实在不能理解
+
+345
+00:12:46,058 --> 00:12:48,029
+equivalence of this math if
+它们数学上等价的原因
+
+346
+00:12:48,029 --> 00:12:49,435
+you implement this, this turns
+你就直接实现这个算法
+
+347
+00:12:49,435 --> 00:12:50,944
+out to be the right answer anyway,
+算是能得到正确答案的
+
+348
+00:12:50,944 --> 00:12:52,224
+so even if you didn't
+所以即使你没有
+
+349
+00:12:52,224 --> 00:12:56,403
+quite understand the equivalence, if you just implement it this way,
+完全理解为何是等价的 如果只是实现这种算法
+
+350
+00:12:56,410 --> 00:12:58,992
+you'll be able to get linear regressions to work.
+你仍然实现线性回归算法
+
+351
+00:12:58,992 --> 00:13:00,663
+So, if you're able to
+所以如果你能
+
+352
+00:13:00,663 --> 00:13:02,216
+figure out why these 2 steps
+弄清楚为什么这两个步骤是等价的
+
+353
+00:13:02,216 --> 00:13:04,122
+are equivalent then hopefully that
+那我希望你可以对
+
+354
+00:13:04,122 --> 00:13:06,239
+would give you a better understanding of vectorization
+向量化有一个更好的理解
+
+355
+00:13:06,239 --> 00:13:10,121
+as well, and finally,
+以及 最后
+
+356
+00:13:10,121 --> 00:13:12,355
+if you're implementing linear
+如果你在实现线性回归的时候
+
+357
+00:13:12,370 --> 00:13:14,872
+regression using more than one or two features.
+使用一个或两个以上的特征量
+
+358
+00:13:14,872 --> 00:13:16,548
+So, sometimes we use linear
+有时我们使用
+
+359
+00:13:16,550 --> 00:13:18,078
+regression with tens or hundreds
+几十或几百个特征量
+
+360
+00:13:18,078 --> 00:13:19,968
+of thousands of features, but if
+来计算线性回归
+
+361
+00:13:19,980 --> 00:13:21,853
+you use the vectorized implementation
+当你使用向量化地实现
+
+362
+00:13:21,853 --> 00:13:23,735
+of linear regression, usually that
+线性回归
+
+363
+00:13:23,735 --> 00:13:25,605
+will run much faster than if
+通常运行速度就会比你以前用
+
+364
+00:13:25,605 --> 00:13:26,892
+you had say your old
+你的 for 循环快的多
+
+365
+00:13:26,892 --> 00:13:28,163
+for loop that was you
+也就是自己
+
+366
+00:13:28,163 --> 00:13:31,485
+know, updating theta 0 then theta 1 then theta 2 yourself.
+写代码更新 θ0 θ1 θ2
+
+367
+00:13:31,500 --> 00:13:33,769
+So, using a vectorized implementation, you
+因此使用向量化实现方式
+
+368
+00:13:33,769 --> 00:13:34,688
+should be able to get a
+你应该是能够得到
+
+369
+00:13:34,688 --> 00:13:37,762
+much more efficient implementation of linear regression.
+一个高效得多的线性回归算法
+
+370
+00:13:37,790 --> 00:13:39,347
+And when you vectorize later
+而当你向量化
+
+371
+00:13:39,347 --> 00:13:40,430
+algorithms that we'll see in
+我们将在之后的课程里面学到的算法
+
+372
+00:13:40,430 --> 00:13:41,554
+this class is a good
+这会是一个很好的技巧
+
+373
+00:13:41,554 --> 00:13:43,367
+trick, whether in Octave
+无论是对于 Octave 或者
+
+374
+00:13:43,367 --> 00:13:44,767
+or some other language, like C++,
+一些其他的语言 如C++
+
+375
+00:13:44,767 --> 00:13:48,474
+or Java, for getting your code to run more efficiently.
+Java 来让你的代码运行得更高效 【果壳教育无边界字幕组】翻译:Jaminalia 校对:所罗门捷列夫
+
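+As a quick illustration of the vectorized update described above, here is a minimal Octave sketch. The names X (the m-by-(n+1) design matrix whose first column is all ones), y (the m-by-1 vector of targets), m, and alpha are assumptions made for this sketch; they are not fixed by the lecture.
+
+% One step of gradient descent for linear regression, fully vectorized.
+% theta: (n+1) x 1,  X: m x (n+1),  y: m x 1,  alpha: learning rate.
+h = X * theta;                      % h(x^(i)) for all m training examples at once
+delta = (1 / m) * (X' * (h - y));   % delta_j = (1/m) * sum_i (h(x^(i)) - y^(i)) * x_j^(i)
+theta = theta - alpha * delta;      % simultaneous update of theta_0 through theta_n
+
+Because the three separate updates collapse into one matrix expression, this runs much faster than looping over theta_0, theta_1, theta_2 by hand, especially with many features.
+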
diff --git a/srt/5 - 7 - Working on and Submitting Programming Exercises (4 min).srt b/srt/5 - 7 - Working on and Submitting Programming Exercises (4 min).srt
new file mode 100644
index 00000000..88ea5a0f
--- /dev/null
+++ b/srt/5 - 7 - Working on and Submitting Programming Exercises (4 min).srt
@@ -0,0 +1,357 @@
+1
+00:00:00,000 --> 00:00:04,162
+在这段视频中 我想很快地介绍一下这门课程
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:00,000 --> 00:00:04,162
+In this video, I want to just quickly step you
+through the logistics of how to work on
+
+3
+00:00:04,162 --> 00:00:09,387
+做作业的流程 以及如何使用作业提交系统
+
+4
+00:00:04,162 --> 00:00:09,387
+homeworks in this class and how to use the
+submission system which will let you verify
+
+5
+00:00:09,387 --> 00:00:15,619
+这个提交系统可以即时检验你的机器学习程序答案是否正确
+
+6
+00:00:09,387 --> 00:00:15,619
+right away that you got the right answer for
+your machine learning program exercise.
+
+7
+00:00:15,619 --> 00:00:19,354
+这是我的 Octave 编程窗口 让我们先进入到我的桌面
+
+8
+00:00:15,619 --> 00:00:19,354
+Here's my Octave window and
+let's first go to my desktop.
+
+9
+00:00:19,354 --> 00:00:25,374
+我在我的桌面上保存了我的第一个练习和一些文件
+
+10
+00:00:19,354 --> 00:00:25,374
+I saved the files for my first exercise,
+some of the files on my desktop:
+
+11
+00:00:25,374 --> 00:00:27,994
+在 'ml-class-ex1' 目录中
+
+12
+00:00:25,374 --> 00:00:27,994
+in this directory, 'ml-class-ex1'.
+
+13
+00:00:27,994 --> 00:00:32,921
+我们提供了大量的文件 其中有一些需要由你自己来编辑
+
+14
+00:00:27,994 --> 00:00:32,921
+And we provide a number files
+and ask you to edit some of them.
+
+15
+00:00:32,921 --> 00:00:40,701
+因此第一个文件应该符合编程练习中 pdf 文件的要求
+
+16
+00:00:32,921 --> 00:00:40,701
+So the first file should meet the details in
+the pdf file for this programming exercise.
+
+17
+00:00:40,701 --> 00:00:45,352
+其中一个我们要求你编写的文件是 warmUpExercise.m 这个文件
+
+18
+00:00:40,701 --> 00:00:45,352
+But one of the files we ask you to edit is
+this file called warmUpExercise.m, where the
+
+19
+00:00:45,352 --> 00:00:49,890
+这个文件只是为了确保你熟悉提交系统
+
+20
+00:00:45,352 --> 00:00:49,890
+exercise is really just to make sure that
+you're familiar with the submission system.
+
+21
+00:00:49,890 --> 00:00:53,795
+所有你需要做的就是返回一个5×5的单位矩阵
+
+22
+00:00:49,890 --> 00:00:53,795
+And all you need to do is
+return the 5x5 identity matrix.
+
+23
+00:00:53,795 --> 00:01:00,301
+因此这个练习的答案 我给你们写过 就是 A = eye(5)
+
+24
+00:00:53,795 --> 00:01:00,301
+So the solution to this exercise I just
+showed you is to write A = eye(5).
+
+25
+00:01:00,301 --> 00:01:05,766
+这将修改该函数以产生5×5的单位矩阵
+
+26
+00:01:00,301 --> 00:01:05,766
+So that modifies this function to
+generate the 5x5 identity matrix.
+
+27
+00:01:05,766 --> 00:01:11,149
+现在warmUpExercise() 这个方程就实现了返回5x5的单位矩阵
+
+28
+00:01:05,766 --> 00:01:11,149
+And this function warmUpExercise()
+now returns the 5x5 identity matrix.
+
+29
+00:01:11,149 --> 00:01:13,727
+将它保存一下
+
+30
+00:01:11,149 --> 00:01:13,727
+And I'm just going to save it.
+
+31
+00:01:13,727 --> 00:01:17,465
+所以我已经完成了作业的第一部分 现在回到我的 Octave 窗口
+
+32
+00:01:13,727 --> 00:01:17,465
+So I've done the first part of this homework.
+Going back to my Octave window,
+
+33
+00:01:17,465 --> 00:01:27,185
+现在来到我的目录 C:\Users\ang\Desktop\ml-class-ex1
+
+34
+00:01:17,465 --> 00:01:27,185
+let's now go to my directory,
+'C:\Users\ang\Desktop\ml-class-ex1'.
+
+35
+00:01:27,185 --> 00:01:33,347
+如果我想确保我已经实现了程序 像这样输入'warmUpExercise()'
+
+36
+00:01:27,185 --> 00:01:33,347
+And if I want to make sure that I've implemented
+this, type 'warmUpExercise()' like so.
+
+37
+00:01:33,347 --> 00:01:39,671
+好了它返回了我们用刚才写的代码创建的一个5x5的单位矩阵
+
+38
+00:01:33,347 --> 00:01:39,671
+And yup, it returns the 5x5 identity matrix
+that we just wrote the code to create.
+
+39
+00:01:39,671 --> 00:01:43,870
+我现在可以按如下步骤提交代码 我要在这里目录下键入 submit()
+
+40
+00:01:39,671 --> 00:01:43,870
+And I can now submit the code as follows.
+I'm going to type 'submit()' in this
+
+41
+00:01:43,870 --> 00:01:49,300
+我要提交第一部分 所以我选择输入'1'
+
+42
+00:01:43,870 --> 00:01:49,300
+directory and I'm ready to submit part 1
+so I'm going to enter choice '1'.
+
+43
+00:01:49,300 --> 00:01:54,387
+这时它问我我的电子邮件地址 我们打开课程网站
+
+44
+00:01:49,300 --> 00:01:54,387
+So it asks me for my email address.
+I'm going go to the course website.
+
+45
+00:01:54,387 --> 00:01:59,682
+这是一个内部测试网站 所以你的版本可能看起来有点不同
+
+46
+00:01:54,387 --> 00:01:59,682
+This is an internal testing site, so your version
+of the website may look a little bit different.
+
+47
+00:01:59,682 --> 00:02:07,934
+这是我的电子邮件地址 和我的提交密码 我需要在这里输入
+
+48
+00:01:59,682 --> 00:02:07,934
+But that's my email address and this is my submission
+password, and I'm just going to type them in here.
+
+49
+00:02:07,934 --> 00:02:19,205
+所以我的邮箱是 ang@cs.stanford.edu 我的提交密码就是 9yC75USsGf
+
+50
+00:02:07,934 --> 00:02:19,205
+So I have ang@cs.stanford.edu and
+my submission password is 9yC75USsGf.
+
+51
+00:02:19,205 --> 00:02:23,849
+按下回车键 它连接到服务器 并将其提交
+
+52
+00:02:19,205 --> 00:02:23,849
+I'm going to hit enter; it connects to the server
+and submits it, and right away
+
+53
+00:02:23,849 --> 00:02:28,567
+然后它就会立刻告诉你 恭喜您 已成功完成作业1第1部分
+
+54
+00:02:23,849 --> 00:02:28,567
+it tells you "Congratulations! You have
+successfully completed Homework 1 Part 1".
+
+55
+00:02:28,567 --> 00:02:33,160
+这就确认了你已经做对了第一部分练习
+
+56
+00:02:28,567 --> 00:02:33,160
+And this gives you a verification
+that you got this part right.
+
+57
+00:02:33,160 --> 00:02:36,795
+如果你提交的答案不正确 那么它会给你一条消息 说明
+
+58
+00:02:33,160 --> 00:02:36,795
+And if you don't submit the right answer,
+then it will give you a message indicating
+
+59
+00:02:36,795 --> 00:02:39,501
+你没有完全答对
+
+60
+00:02:36,795 --> 00:02:39,501
+that you haven't quite gotten it right yet.
+
+61
+00:02:39,501 --> 00:02:47,861
+您还可以继续使用此提交密码 也可以生成新密码 都没有关系
+
+62
+00:02:39,501 --> 00:02:47,861
+And you can use this submission password and
+you can generate new passwords; it doesn't matter.
+
+63
+00:02:47,861 --> 00:02:52,556
+但你也可以使用你的网站登录密码 但因为这个密码
+
+64
+00:02:47,861 --> 00:02:52,556
+But you can also use your regular website
+login password, but because this password
+
+65
+00:02:52,556 --> 00:02:59,281
+会在显示器上直接显示 所以我们给你额外的提交密码
+
+66
+00:02:52,556 --> 00:02:59,281
+here is typed in clear text on your monitor,
+we gave you this extra submission password
+
+67
+00:02:59,281 --> 00:03:03,650
+因为你可能不希望输入你登录网站的密码
+
+68
+00:02:59,281 --> 00:03:03,650
+in case you don't want to type in your
+website's normal password onto a window
+
+69
+00:03:03,650 --> 00:03:09,219
+你的密码是否会显示出来 取决于你使用的操作系统
+
+70
+00:03:03,650 --> 00:03:09,219
+that, depending on your operating system,
+may or may not appear as text when you type
+
+71
+00:03:09,219 --> 00:03:14,544
+当你把它输入到 Octave 提交脚本的时候
+
+72
+00:03:09,219 --> 00:03:14,544
+it into the Octave submission script.
+
+73
+00:03:14,544 --> 00:03:18,746
+这就是提交作业的方法
+
+74
+00:03:14,544 --> 00:03:18,746
+So, that's how you submit the
+homeworks after you've done it.
+
+75
+00:03:18,746 --> 00:03:23,696
+祝你好运 当你完成家庭作业的时候 我希望你都能答对
+
+76
+00:03:18,746 --> 00:03:23,696
+Good luck, and, when you get around to
+homeworks, I hope you get all of them right.
+
+77
+00:03:23,696 --> 00:03:28,329
+最后 在下一个也就是最后一个 Octave 的视频教程中 我将介绍
+
+78
+00:03:23,696 --> 00:03:28,329
+And finally, in the next and final Octave
+tutorial video, I want to tell you about
+
+79
+00:03:28,329 --> 00:03:33,337
+向量化(vectoriazation) 这种方式可以使你的 Octave 代码更有效率地运行
+
+80
+00:03:28,329 --> 00:03:33,337
+vectorization, which is a way to get your
+Octave code to run much more efficiently.
+
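+As a minimal sketch of the warm-up exercise described above, the edited warmUpExercise.m would look roughly like this. The function skeleton (its name and the return variable A) is assumed here to match the file provided with the exercise; only the A = eye(5) line comes from the video.
+
+function A = warmUpExercise()
+% WARMUPEXERCISE Return the 5x5 identity matrix for the warm-up exercise.
+A = eye(5);   % the 5x5 identity matrix
+end
+
+Running warmUpExercise() in Octave should then print the 5x5 identity matrix, and submit() can be run from the same directory to check the answer.
+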
diff --git a/srt/6 - 1 - Classification (8 min).srt b/srt/6 - 1 - Classification (8 min).srt
new file mode 100644
index 00000000..adbc7294
--- /dev/null
+++ b/srt/6 - 1 - Classification (8 min).srt
@@ -0,0 +1,1166 @@
+1
+00:00:00,460 --> 00:00:01,410
+In this and the next
+在现在及未来
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,580 --> 00:00:02,730
+few videos, I want to
+一些视频,我想
+
+3
+00:00:02,960 --> 00:00:04,660
+start to talk about classification problems,
+开始谈论的分类问题,
+
+4
+00:00:05,520 --> 00:00:07,000
+where the variable y that
+其中变量y表示
+
+5
+00:00:07,110 --> 00:00:08,160
+you want to predict is discrete
+要预测是谨慎的
+
+6
+00:00:08,570 --> 00:00:10,190
+valued. We'll develop an
+估值。我们将开发一个
+
+7
+00:00:10,420 --> 00:00:11,860
+algorithm called logistic regression,
+算法称为Logistic回归,
+
+8
+00:00:12,410 --> 00:00:13,620
+which is one of the
+这是一个
+
+9
+00:00:13,700 --> 00:00:16,560
+most popular and most widely used learning algorithms today.
+最流行,最广泛使用的今天学习算法。
+
+10
+00:00:19,770 --> 00:00:22,150
+Here are some examples of classification problems.
+下面是分类问题的一些例子。
+
+11
+00:00:23,170 --> 00:00:24,720
+Earlier, we talked about emails,
+此前,我们谈到了电子邮件,
+
+12
+00:00:25,260 --> 00:00:26,700
+spam classification as an
+垃圾分类作为
+
+13
+00:00:27,070 --> 00:00:28,260
+example of a classification problem.
+例的分类问题。
+
+14
+00:00:29,380 --> 00:00:32,160
+Another example would be classifying online transactions.
+另一个例子是分类网上交易。
+
+15
+00:00:33,080 --> 00:00:34,110
+So, if you have a website
+所以,如果你有一个网站
+
+16
+00:00:34,340 --> 00:00:35,530
+that sells stuff and if you
+卖的东西,如果你
+
+17
+00:00:35,750 --> 00:00:36,740
+want to know if a physical
+要知道,如果一个物理
+
+18
+00:00:37,040 --> 00:00:39,140
+transaction is fraudulent or
+交易是欺诈或
+
+19
+00:00:39,260 --> 00:00:40,920
+not, whether someone has, you
+不,某人是否有你
+
+20
+00:00:41,060 --> 00:00:42,260
+know, is using a stolen credit card
+知道,是用偷来的信用卡
+
+21
+00:00:42,580 --> 00:00:43,890
+or has stolen the user's password.
+或窃取用户的密码。
+
+22
+00:00:44,560 --> 00:00:46,830
+That's another classification problem, and
+这是另一种分类问题,
+
+23
+00:00:47,030 --> 00:00:48,220
+earlier we also talked about
+我们前面也谈到了
+
+24
+00:00:48,410 --> 00:00:50,610
+the example of classifying tumors
+瘤进行分类的例子
+
+25
+00:00:51,640 --> 00:00:53,680
+as a cancerous malignant or as benign tumors.
+作为恶性癌或良性肿瘤。
+
+26
+00:00:55,070 --> 00:00:56,010
+In all of these problems,
+在所有的这些问题,
+
+27
+00:00:56,690 --> 00:00:57,610
+the variable that we're trying
+我们正在努力变
+
+28
+00:00:57,850 --> 00:00:58,870
+to predict is a variable
+预测是一个可变
+
+29
+00:00:59,290 --> 00:01:00,110
+Y that we can think
+y表示我们可以认为
+
+30
+00:01:00,420 --> 00:01:01,710
+of as taking on two values,
+作为取两个值,
+
+31
+00:01:02,600 --> 00:01:04,120
+either zero or one, either
+为零或1,或者
+
+32
+00:01:04,340 --> 00:01:05,780
+a spam or not spam, fraudulent
+垃圾邮件或者非垃圾邮件,欺诈性
+
+33
+00:01:06,620 --> 00:01:08,740
+or not fraudulent, malignant or benign.
+或不欺诈,恶性或良性。
+
+34
+00:01:10,490 --> 00:01:11,430
+Another name for the class
+该类的另一个名字
+
+35
+00:01:11,810 --> 00:01:13,160
+that we denote with 0 is
+我们表示具有0
+
+36
+00:01:13,810 --> 00:01:15,660
+the negative class, and another
+负类,和另一
+
+37
+00:01:15,950 --> 00:01:16,920
+name for the class that we
+命名的类,我们
+
+38
+00:01:17,020 --> 00:01:19,350
+denote with 1 is the positive class.
+表示与图1是阳性的类。
+
+39
+00:01:20,170 --> 00:01:21,500
+So 0 may denote the
+所以,0可以表示
+
+40
+00:01:22,070 --> 00:01:23,460
+benign tumor and 1
+良性肿瘤和1
+
+41
+00:01:23,850 --> 00:01:25,940
+positive class may denote a malignant tumor.
+阳性类可表示的恶性肿瘤。
+
+42
+00:01:27,090 --> 00:01:28,410
+The assignment of the 2
+的2的分配
+
+43
+00:01:28,860 --> 00:01:29,940
+classes, you know, spam,
+类,你知道,垃圾邮件,
+
+44
+00:01:30,050 --> 00:01:31,140
+no spam, and so on -
+没有垃圾邮件,等等 -
+
+45
+00:01:31,330 --> 00:01:32,470
+the assignment of the 2
+的2的分配
+
+46
+00:01:32,790 --> 00:01:34,140
+classes to positive and negative,
+类,以积极和消极的,
+
+47
+00:01:34,500 --> 00:01:35,950
+to 0 and 1 is somewhat
+为0和1是有点
+
+48
+00:01:36,250 --> 00:01:37,840
+arbitrary and it doesn't really matter.
+任意的,它其实并不重要。
+
+49
+00:01:38,680 --> 00:01:39,820
+But often there is this
+但往往有这样的
+
+50
+00:01:39,990 --> 00:01:40,970
+intuition that the negative
+直觉,负
+
+51
+00:01:41,460 --> 00:01:43,430
+class is conveying the
+类输送
+
+52
+00:01:43,590 --> 00:01:44,690
+absence of something, like the absence
+缺少的东西,就像没有
+
+53
+00:01:45,000 --> 00:01:47,440
+of a malignant tumor, whereas one,
+的恶性肿瘤,而之一,
+
+54
+00:01:47,860 --> 00:01:49,410
+the positive class, is conveying
+正班,被输送
+
+55
+00:01:49,950 --> 00:01:52,110
+the presence of something that we may be looking for.
+的东西,我们可能会寻找存在。
+
+56
+00:01:52,770 --> 00:01:54,340
+But the definition of which
+但它的定义
+
+57
+00:01:54,560 --> 00:01:55,400
+is negative and which is positive
+是负的,这是正
+
+58
+00:01:55,680 --> 00:01:58,480
+is somewhat arbitrary and it doesn't matter that much.
+是任意的,这一点并不重要得多。
+
+59
+00:02:00,090 --> 00:02:00,980
+For now, we're going to start
+现在,我们要开始
+
+60
+00:02:01,340 --> 00:02:03,030
+with classification problems with just
+与分类问题,只需
+
+61
+00:02:03,290 --> 00:02:04,540
+two classes; zero and one.
+两班;零和1。
+
+62
+00:02:05,480 --> 00:02:07,010
+Later on, we'll talk about multi-class
+稍后,我们将讨论多级
+
+63
+00:02:07,440 --> 00:02:09,320
+problems as well, whether variable
+问题还有,是否变
+
+64
+00:02:09,750 --> 00:02:10,960
+Y may take on say,
+Y可以采取的发言权,
+
+65
+00:02:11,550 --> 00:02:13,120
+for value zero, one, two and three.
+对值零,一,二,三。
+
+66
+00:02:14,220 --> 00:02:16,810
+This is called a multi-class classification problem,
+这就是所谓的多类分类问题,
+
+67
+00:02:17,680 --> 00:02:18,800
+but for the next few
+但在未来数
+
+68
+00:02:18,950 --> 00:02:20,280
+videos, let's start with the
+视频,让我们先从
+
+69
+00:02:20,660 --> 00:02:22,750
+two class or the binary classification problem.
+二级或二元分类问题。
+
+70
+00:02:23,580 --> 00:02:25,650
+and we'll worry about the multi-class setting later.
+我们稍后会担心多类设置。
+
+71
+00:02:26,980 --> 00:02:29,440
+So, how do we develop a classification algorithm?
+那么,我们如何建立一个分类算法?
+
+72
+00:02:30,530 --> 00:02:31,670
+Here's an example of a
+下面是一个例子
+
+73
+00:02:31,750 --> 00:02:32,730
+training set for a classification
+对于分类训练集
+
+74
+00:02:34,350 --> 00:02:35,800
+task for classifying a tumor
+任务的肿瘤分类
+
+75
+00:02:36,240 --> 00:02:37,540
+as malignant or benign and
+为恶性或良性,
+
+76
+00:02:37,820 --> 00:02:39,260
+notice that malignancy takes on
+注意到恶性肿瘤发生在
+
+77
+00:02:39,530 --> 00:02:41,200
+only two values zero or
+只有两个值为零或
+
+78
+00:02:41,380 --> 00:02:43,210
+no, or one or yes.
+没有或一个或一个或肯定。
+
+79
+00:02:44,550 --> 00:02:45,650
+So, one thing we could
+所以,有一件事我们可以
+
+80
+00:02:45,850 --> 00:02:46,970
+do given this training set
+不要给这个训练集
+
+81
+00:02:47,440 --> 00:02:48,700
+is to apply the algorithm
+是应用该算法
+
+82
+00:02:49,120 --> 00:02:52,710
+that we already know, linear regression to this data set
+我们已经知道,线性回归,这组数据
+
+83
+00:02:53,150 --> 00:02:55,310
+and just try to fit the straight line to the data.
+而只是尝试,以适应直线的数据。
+
+84
+00:02:56,290 --> 00:02:57,480
+So, if you take this training
+所以,如果你把这个培训
+
+85
+00:02:57,780 --> 00:02:58,760
+set and fit a straight
+设置和填充直
+
+86
+00:02:58,900 --> 00:03:00,320
+line to it, maybe you get
+行吧,也许你会得到
+
+87
+00:03:00,700 --> 00:03:03,530
+hypothesis that looks like that.
+假设,看起来像这样。
+
+88
+00:03:03,700 --> 00:03:05,920
+Alright, so that's my hypothesis, h of
+好了,所以这是我的假设,H的
+
+89
+00:03:06,020 --> 00:03:07,890
+x equals theta transpose
+x等于西塔转
+
+90
+00:03:08,020 --> 00:03:09,330
+x. If you want
+按x。 如果你想
+
+91
+00:03:09,570 --> 00:03:11,270
+to make predictions, one thing
+作出预测,一件事
+
+92
+00:03:11,500 --> 00:03:12,980
+you could try doing is then
+你可以尝试做的是那么
+
+93
+00:03:13,610 --> 00:03:16,760
+threshold the classifier outputs at 0.5.
+阈值分类器输出为0.5。
+
+94
+00:03:17,110 --> 00:03:19,880
+That is, at the vertical axis value 0.5.
+即在垂直通道值0.5。
+
+95
+00:03:21,760 --> 00:03:23,940
+And if the hypothesis outputs
+并且,如果假设输出
+
+96
+00:03:24,330 --> 00:03:25,490
+a value that's greater than
+一个值,该值是大于
+
+97
+00:03:25,620 --> 00:03:27,510
+equal to 0.5 you predict y equals one.
+等于0.5您预测y等于之一。
+
+98
+00:03:27,860 --> 00:03:29,940
+If it's less than 0.5, you predict y equals zero.
+如果它小于0.5,你预测y等于为零。
+
+99
+00:03:31,070 --> 00:03:32,540
+Let's see what happens when we do that.
+让我们来看看,当我们这样做会发生什么。
+
+100
+00:03:32,740 --> 00:03:33,900
+So, let's take 0.5, and
+所以,让我们取0.5,和
+
+101
+00:03:34,090 --> 00:03:36,670
+so, you know, that's where the threshold is.
+所以,你知道,这就是门槛。
+
+102
+00:03:37,070 --> 00:03:39,260
+And thus, using linear regression this way.
+因此,使用线性回归这种方式。
+
+103
+00:03:39,920 --> 00:03:41,060
+Everything to the right
+一切的权利
+
+104
+00:03:41,330 --> 00:03:42,460
+of this point, we will end
+这一点,我们将结束
+
+105
+00:03:42,640 --> 00:03:43,690
+up predicting as the positive
+向上预测作为正
+
+106
+00:03:44,280 --> 00:03:45,390
+class because of the output
+因为输出级
+
+107
+00:03:45,690 --> 00:03:46,800
+values are greater than 0.5
+值大于0.5
+
+108
+00:03:47,270 --> 00:03:48,690
+on the vertical axis and
+在垂直轴上,并
+
+109
+00:03:49,340 --> 00:03:50,730
+everything to the left
+一切向左侧
+
+110
+00:03:51,000 --> 00:03:52,260
+of that point we will end
+这一点,我们将结束
+
+111
+00:03:52,490 --> 00:03:54,170
+up predicting as a negative value.
+向上预测为负值。
+
+112
+00:03:55,660 --> 00:03:57,570
+In this particular example, it
+在这个特定的例子中,它
+
+113
+00:03:57,720 --> 00:03:59,400
+looks like linear regression is actually
+看起来像线性回归实际上是
+
+114
+00:03:59,790 --> 00:04:01,870
+doing something reasonable even though
+做的事情,即使合理
+
+115
+00:04:02,190 --> 00:04:03,910
+this is a classification task we're
+这是一个分类的任务我们
+
+116
+00:04:04,140 --> 00:04:05,430
+interested in.
+感兴趣。
+
+117
+00:04:05,500 --> 00:04:07,420
+But now let's try changing problem a bit.
+但现在让我们尝试改变的问题了一下。
+
+118
+00:04:08,060 --> 00:04:09,360
+Let me extend out the horizontal
+让我伸出水平
+
+119
+00:04:10,040 --> 00:04:11,460
+axis a bit and let's
+轨道的轴,让我们
+
+120
+00:04:11,650 --> 00:04:12,640
+say we got one more training
+说我们得到了一个更多的培训
+
+121
+00:04:12,990 --> 00:04:15,030
+example way out there on the right.
+例如出路在那里就对了。
+
+122
+00:04:16,520 --> 00:04:17,830
+Notice that that additional training
+请注意,这额外的培训
+
+123
+00:04:18,170 --> 00:04:19,200
+example, this one out
+例如,这一个了
+
+124
+00:04:19,390 --> 00:04:21,710
+here, it doesn't actually change anything, right?
+在这里,它实际上并没有改变什么,对不对?
+
+125
+00:04:22,420 --> 00:04:23,470
+Looking at the training set, it
+看着训练集,它
+
+126
+00:04:23,560 --> 00:04:26,340
+is pretty clear what a good hypothesis is.
+是相当清楚什么是好的假设是。
+
+127
+00:04:26,890 --> 00:04:27,920
+Well, everything to the right of
+好了,一切的权利
+
+128
+00:04:28,000 --> 00:04:29,050
+somewhere around here to the
+某处在这里的
+
+129
+00:04:29,190 --> 00:04:29,970
+right of this we should predict
+权这一点,我们应该预测
+
+130
+00:04:30,300 --> 00:04:31,280
+as positive, and everything to
+为阳性,什么都
+
+131
+00:04:31,480 --> 00:04:32,690
+the left we should probably predict
+左边,我们也许应该预测
+
+132
+00:04:33,060 --> 00:04:34,700
+as negative because from this
+因为从这个负
+
+133
+00:04:34,880 --> 00:04:35,940
+training set it looks like
+培训设置它看起来像
+
+134
+00:04:36,200 --> 00:04:37,880
+all the tumors larger than, you
+所有比你大的肿瘤
+
+135
+00:04:37,970 --> 00:04:39,190
+know, a certain value around here
+知道了,在这里一定值
+
+136
+00:04:39,490 --> 00:04:41,030
+are malignant, and all the
+是恶性的,并且所有的
+
+137
+00:04:41,200 --> 00:04:42,110
+tumors smaller than that are
+肿瘤比那些小
+
+138
+00:04:42,220 --> 00:04:44,660
+not malignant, at least for this training set.
+不是恶性的,至少在这个训练集。
+
+139
+00:04:46,160 --> 00:04:47,280
+But once we've added
+但是我们增加一次
+
+140
+00:04:47,720 --> 00:04:49,060
+that extra example out here,
+额外的例子在这里,
+
+141
+00:04:49,620 --> 00:04:50,660
+if you now run linear regression,
+如果你现在运行的线性回归,
+
+142
+00:04:51,580 --> 00:04:53,590
+you instead get a straight line fit to the data.
+你不是得到一条直线拟合数据。
+
+143
+00:04:54,430 --> 00:04:55,630
+That might maybe look like this, and
+这可能也许是这样的,和
+
+144
+00:04:57,890 --> 00:04:59,860
+if you now threshold this hypothesis
+如果你现在这个门槛假说
+
+145
+00:05:02,480 --> 00:05:03,460
+at 0.5, you end up with
+0.5,你结束了
+
+146
+00:05:04,110 --> 00:05:05,550
+a threshold that's around here
+一个阈值是在这里
+
+147
+00:05:06,320 --> 00:05:07,320
+so that everything to the right
+所以,一切的权利
+
+148
+00:05:07,570 --> 00:05:08,790
+of this point you predict as
+这一点,您预测为
+
+149
+00:05:08,960 --> 00:05:11,510
+positive, and everything to the left of that point you predict as negative.
+积极的,一切到该点的左边你预测为阴性。
+
+150
+00:05:14,580 --> 00:05:15,720
+And this seems a pretty
+而这似乎是一个相当
+
+151
+00:05:16,100 --> 00:05:18,500
+bad thing for linear regression to have done, right?
+坏事线性回归,都做了,对吧?
+
+152
+00:05:18,770 --> 00:05:19,840
+Because, you know, these are
+因为,你知道,这些都是
+
+153
+00:05:19,930 --> 00:05:22,010
+our positive examples, these are our negative examples.
+我们积极的例子,这些都是我们的负面的例子。
+
+154
+00:05:23,050 --> 00:05:24,580
+It's pretty clear, we should
+这是很清楚的,我们应该
+
+155
+00:05:24,800 --> 00:05:26,000
+really be separating the two classes
+真正分开两班
+
+156
+00:05:26,550 --> 00:05:28,180
+somewhere around there, but somehow
+我身边不远,但不知何故
+
+157
+00:05:28,670 --> 00:05:30,030
+by adding one example way
+通过添加一个实例方式
+
+158
+00:05:30,190 --> 00:05:31,280
+out here to the right, this
+这里的权利,这
+
+159
+00:05:31,420 --> 00:05:33,340
+example really isn't giving us any new information.
+例如还真是不给我们任何新的信息。
+
+160
+00:05:33,770 --> 00:05:34,950
+I mean, it should be no
+我的意思是,应该没
+
+161
+00:05:35,170 --> 00:05:36,300
+surprise to the learning algorithm
+惊喜地学习出
+
+162
+00:05:37,030 --> 00:05:39,100
+that the example way out here turns out to be malignant.
+那这里的例子的方式原来是恶性的。
+
+163
+00:05:40,230 --> 00:05:41,210
+But somehow adding that example
+但不知何故,并称例子
+
+164
+00:05:41,740 --> 00:05:43,420
+out there caused linear regression
+在那里引起的线性回归
+
+165
+00:05:44,410 --> 00:05:45,670
+to change in straight line fit
+在直线拟合改变
+
+166
+00:05:45,980 --> 00:05:47,650
+to the data from this
+从这个数据
+
+167
+00:05:48,840 --> 00:05:50,000
+magenta line out here
+洋红色线条在这里
+
+168
+00:05:50,840 --> 00:05:51,940
+to this blue line over here,
+这个蓝线在这里,
+
+169
+00:05:52,850 --> 00:05:54,770
+and caused it to give us a worse hypothesis.
+而造成它给我们一个更坏的假设。
+
+170
+00:05:56,950 --> 00:05:58,440
+So, applying linear regression
+因此,应用线性回归
+
+171
+00:05:59,080 --> 00:06:01,030
+to a classification problem usually
+一个分类问题通常
+
+172
+00:06:01,610 --> 00:06:03,400
+isn't, often isn't a great idea.
+是不是,往往不是一个好主意。
+
+173
+00:06:04,430 --> 00:06:05,750
+In the first instance, in the
+在一审中,
+
+174
+00:06:05,810 --> 00:06:07,090
+first example before I added
+之前,第一个例子我添加
+
+175
+00:06:07,540 --> 00:06:08,780
+this extra training example,
+这个额外的培训为例,
+
+176
+00:06:09,810 --> 00:06:11,430
+previously linear regression was
+此前线性回归是
+
+177
+00:06:11,650 --> 00:06:13,200
+just getting lucky and it
+刚开幸运,它
+
+178
+00:06:13,380 --> 00:06:14,990
+got us a hypothesis that, you
+我们得到了一个假设,你
+
+179
+00:06:15,090 --> 00:06:16,290
+know, worked well for that particular
+知道的,运作良好,对特定
+
+180
+00:06:16,670 --> 00:06:19,470
+example, but usually apply
+例如,但通常适用
+
+181
+00:06:19,980 --> 00:06:20,970
+linear regression to a data set,
+线性回归到一个数据集,
+
+182
+00:06:21,820 --> 00:06:23,040
+you know, you might get lucky but
+你知道,你可能会得到幸运,但
+
+183
+00:06:23,270 --> 00:06:24,130
+often it isn't a good
+通常它不是一个好
+
+184
+00:06:24,260 --> 00:06:25,730
+idea, so I wouldn't use
+的想法,所以我不会用
+
+185
+00:06:25,980 --> 00:06:27,960
+linear regression for classification problems.
+对于分类问题线性回归。
+
+186
+00:06:29,670 --> 00:06:30,820
+Here is one other funny thing
+这里是另外一个有趣的事情
+
+187
+00:06:31,250 --> 00:06:32,650
+about what would happen if
+会发生什么,如果
+
+188
+00:06:32,930 --> 00:06:35,510
+we were to use linear regression for a classification problem.
+我们用线性回归的分类问题。
+
+189
+00:06:36,690 --> 00:06:38,220
+For classification, we know that
+对于分类,我们知道,
+
+190
+00:06:38,450 --> 00:06:39,790
+Y is either zero or one,
+Y是0或1,
+
+191
+00:06:40,580 --> 00:06:41,620
+but if you are using
+但如果您使用的是
+
+192
+00:06:41,890 --> 00:06:43,050
+linear regression, well the hypothesis
+线性回归,以及假设
+
+193
+00:06:44,210 --> 00:06:45,750
+can output values much larger
+可输出值大得多
+
+194
+00:06:46,060 --> 00:06:47,330
+than one or less than
+大于1或小于
+
+195
+00:06:47,500 --> 00:06:48,820
+zero, even if all
+零,即使所有
+
+196
+00:06:49,050 --> 00:06:50,690
+of your training examples have labels
+良好的训练样例有标签
+
+197
+00:06:51,140 --> 00:06:52,410
+Y equals zero or one,
+Y等于0或1,
+
+198
+00:06:53,900 --> 00:06:54,880
+and it seems kind of strange
+这似乎有点怪
+
+199
+00:06:55,520 --> 00:06:56,760
+that even though we
+即使我们
+
+200
+00:06:56,960 --> 00:06:58,160
+know that the label should
+知道标签应
+
+201
+00:06:58,350 --> 00:06:59,320
+be zero one, it seems
+为零1,似乎
+
+202
+00:06:59,420 --> 00:07:00,890
+kind of strange if the
+有点怪,如果
+
+203
+00:07:01,210 --> 00:07:02,580
+algorithm can offer values much
+算法可以提供多少价值
+
+204
+00:07:02,840 --> 00:07:04,900
+larger than one or much smaller than zero.
+比1大或大于零小得多。
+
+205
+00:07:09,540 --> 00:07:10,900
+So what we'll do in the
+所以,我们会做的
+
+206
+00:07:11,000 --> 00:07:12,400
+next few videos is develop
+接下来的几个视频是发展
+
+207
+00:07:12,860 --> 00:07:14,640
+an algorithm called logistic regression
+所谓逻辑回归算法
+
+208
+00:07:15,550 --> 00:07:17,390
+which has the property that the
+它具有属性的
+
+209
+00:07:17,780 --> 00:07:19,290
+output, the predictions of logistic
+输出,物流的预测
+
+210
+00:07:19,670 --> 00:07:21,220
+regression are always between zero
+回归总是零之间
+
+211
+00:07:21,630 --> 00:07:22,750
+and one, and doesn't become
+和1,并不会成为
+
+212
+00:07:23,060 --> 00:07:24,170
+bigger than one or become less
+大于1或变得不
+
+213
+00:07:24,370 --> 00:07:26,370
+than zero and by
+大于零并通过
+
+214
+00:07:26,530 --> 00:07:28,570
+the way, logistic regression is
+顺便说一下,logistic回归是
+
+215
+00:07:29,090 --> 00:07:30,150
+and we will use it as
+我们将使用它作为
+
+216
+00:07:30,350 --> 00:07:32,770
+a classification algorithm in some,
+的分类算法在某些,
+
+217
+00:07:33,330 --> 00:07:35,060
+maybe sometimes confusing that
+也许有时混淆了
+
+218
+00:07:35,780 --> 00:07:37,410
+the term regression appears in
+术语回归出现在
+
+219
+00:07:37,700 --> 00:07:39,360
+its name, even though logistic regression
+他的名字,即使logistic回归
+
+220
+00:07:39,970 --> 00:07:41,280
+is actually a classification algorithm.
+实际上是一种分类算法。
+
+221
+00:07:42,120 --> 00:07:43,040
+But that's just the name it
+但是,这仅仅是它的名字
+
+222
+00:07:43,160 --> 00:07:46,140
+was given for historical reasons so don't be confused by that.
+被赋予历史的原因,所以不要被迷惑了。
+
+223
+00:07:46,680 --> 00:07:48,340
+Logistic Regression is actually a
+Logistic回归实际上是一种
+
+224
+00:07:48,430 --> 00:07:50,250
+classification algorithm that we
+分类算法,我们
+
+225
+00:07:50,380 --> 00:07:52,030
+apply to settings where the
+适用于设置在哪里
+
+226
+00:07:52,160 --> 00:07:54,780
+label Y is discrete valued, taking the values 0 or 1.
+标签 Y 是离散值 也就是 0 或 1
+
+227
+00:07:55,820 --> 00:07:57,440
+So hopefully you now
+所以希望你现在
+
+228
+00:07:57,680 --> 00:07:59,180
+know why if you
+知道是什么原因,如果你
+
+229
+00:07:59,280 --> 00:08:00,950
+have a classification problem, using
+遇到分类问题时 使用
+
+230
+00:08:01,400 --> 00:08:02,660
+linear regression isn't a good idea.
+线性回归不是一个好主意。
+
+231
+00:08:03,210 --> 00:08:04,480
+In the next video we'll
+在接下来的视频中,我们将
+
+232
+00:08:04,700 --> 00:08:05,680
+start working out the details
+开始制定细节
+
+233
+00:08:06,290 --> 00:08:07,640
+of the logistic regression algorithm.
+的logistic回归算法。
+
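+A short Octave sketch of the thresholding idea discussed above: fit linear regression, then call every example with h(x) at or above 0.5 positive. The names X (design matrix with a leading column of ones) and theta (the fitted parameters) are assumptions for this sketch; as the lecture argues, this is usually not a good way to do classification.
+
+% Threshold a linear-regression hypothesis at 0.5.
+h = X * theta;        % real-valued outputs; can be well above 1 or below 0
+pred = (h >= 0.5);    % predict y = 1 where h(x) >= 0.5, otherwise y = 0
+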
diff --git a/srt/6 - 2 - Hypothesis Representation (7 min).srt b/srt/6 - 2 - Hypothesis Representation (7 min).srt
new file mode 100644
index 00000000..43153d12
--- /dev/null
+++ b/srt/6 - 2 - Hypothesis Representation (7 min).srt
@@ -0,0 +1,1055 @@
+1
+00:00:00,210 --> 00:00:02,931
+Let's start talking about logistic regression.
+让我们开始谈谈逻辑回归 (字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,950 --> 00:00:04,315
+In this video, I'd like to
+在这段视频中 我将要
+
+3
+00:00:04,315 --> 00:00:07,210
+show you the hypothesis representation, that
+向你展示表征假设 也就是
+
+4
+00:00:07,210 --> 00:00:08,805
+is, what is the function
+函数是什么
+
+5
+00:00:08,810 --> 00:00:10,266
+we're going to use to represent
+我们要用代表
+
+6
+00:00:10,300 --> 00:00:15,446
+our hypothesis where we have a classification problem.
+我们的假设 我们有一个分类问题
+
+7
+00:00:15,450 --> 00:00:16,969
+Earlier, we said that we
+此前 我们说我们
+
+8
+00:00:16,969 --> 00:00:20,426
+would like our classifier to
+希望我们分类器
+
+9
+00:00:20,426 --> 00:00:21,956
+output values that are between
+的输出值在零和一之间
+
+10
+00:00:21,956 --> 00:00:23,250
+zero and one. So, we
+因此 我们
+
+11
+00:00:23,270 --> 00:00:24,566
+like to come up with a
+喜欢提出一个假设
+
+12
+00:00:24,566 --> 00:00:26,385
+hypothesis that satisfies this
+来满足该性质
+
+13
+00:00:26,385 --> 00:00:30,396
+property, that these predictions are maybe between zero and one.
+性质 这些估算也许在零和一之间
+
+14
+00:00:30,396 --> 00:00:32,764
+When we were using linear regression,
+当我们使用线性回归的时候
+
+15
+00:00:32,764 --> 00:00:34,262
+this was the form of a
+这是一种
+
+16
+00:00:34,262 --> 00:00:35,604
+hypothesis, where H of X
+假设的形式 其中H的X
+
+17
+00:00:35,604 --> 00:00:38,319
+is theta transpose X. For
+是θ转换成X的 关于
+
+18
+00:00:38,330 --> 00:00:39,831
+logistic regression, I'm going
+逻辑回归 我要去
+
+19
+00:00:39,831 --> 00:00:41,075
+to modify this a little
+修改这一点
+
+20
+00:00:41,075 --> 00:00:43,352
+bit, and make the hypothesis
+位 并假设
+
+21
+00:00:43,360 --> 00:00:46,218
+G of theta transpose X,
+G的θ转换成X
+
+22
+00:00:46,218 --> 00:00:47,711
+where I'm going to define
+我将定义
+
+23
+00:00:47,711 --> 00:00:50,693
+the function G as follows:
+函数G如下:
+
+24
+00:00:50,693 --> 00:00:51,926
+G of Z if Z
+G的z如果Z
+
+25
+00:00:51,926 --> 00:00:53,633
+is a real number is equal
+是一个实数等于
+
+26
+00:00:53,640 --> 00:00:55,640
+to one over one plus
+一分之一加
+
+27
+00:00:55,640 --> 00:00:58,480
+E to the negative Z. This
+E的负Z.
+
+28
+00:00:58,490 --> 00:01:01,716
+called the sigmoid function
+称为S形函数
+
+29
+00:01:01,720 --> 00:01:04,843
+or the logistic function.
+或逻辑函数
+
+30
+00:01:04,843 --> 00:01:07,089
+And the term logistic function,
+而长期的逻辑功能,
+
+31
+00:01:07,120 --> 00:01:11,103
+that's what gives rise to the name logistic regression.
+这就是逻辑回归这个名称的由来
+
+32
+00:01:11,103 --> 00:01:12,781
+And, by the way, the terms
+而且 顺便说一下 这个术语
+
+33
+00:01:12,781 --> 00:01:14,551
+sigmoid function and logistic
+S型函数和逻辑
+
+34
+00:01:14,551 --> 00:01:16,996
+function are basically synonyms
+函数基本上是同义词
+
+35
+00:01:16,996 --> 00:01:18,362
+and mean the same thing.
+和意味着同样的事情
+
+36
+00:01:18,362 --> 00:01:19,756
+So the two terms are
+因此 这两个术语是
+
+37
+00:01:19,756 --> 00:01:21,893
+basically interchangeable and either
+基本上是可互换的 要么其他
+
+38
+00:01:21,893 --> 00:01:23,160
+term can be used to
+术语可以用来
+
+39
+00:01:23,160 --> 00:01:24,620
+refer to this function
+参阅此函数
+
+40
+00:01:24,620 --> 00:01:26,283
+G. And if we
+G 如果我们
+
+41
+00:01:26,283 --> 00:01:27,734
+take these two equations, and
+使这两个方程 并
+
+42
+00:01:27,734 --> 00:01:30,089
+put them together, then here's
+把它们放在一起 那么这里
+
+43
+00:01:30,089 --> 00:01:32,354
+just an alternative way of
+正好是一个替代
+
+44
+00:01:32,354 --> 00:01:34,843
+writing out the form of my hypothesis.
+我的假设写出来的方式
+
+45
+00:01:34,843 --> 00:01:36,533
+I'm saying that H of x
+我说H的x
+
+46
+00:01:36,540 --> 00:01:38,933
+is one over one plus
+是一分之一加
+
+47
+00:01:38,933 --> 00:01:41,765
+E to the negative theta transpose
+E的负θ转换
+
+48
+00:01:41,765 --> 00:01:43,106
+X, and all I've done is
+X 我愿付出一切
+
+49
+00:01:43,106 --> 00:01:45,353
+I've taken the variable
+我已经采取变量
+
+50
+00:01:45,353 --> 00:01:46,700
+Z, Z here's a
+Z Z是一个
+
+51
+00:01:46,760 --> 00:01:48,173
+real number and plugged in
+实数和插入
+
+52
+00:01:48,173 --> 00:01:50,201
+theta transpose X, so
+θ转换X,所以
+
+53
+00:01:50,201 --> 00:01:52,560
+I end up with, you know, theta transpose
+我最终 你知道 θ转换
+
+54
+00:01:52,560 --> 00:01:54,933
+X, in place of Z there.
+X 在Z轴上
+
+55
+00:01:54,940 --> 00:01:57,949
+Lastly, let me show you what the sigmoid function looks like.
+最后,来让我向你展示S型函数
+
+56
+00:01:57,949 --> 00:02:00,296
+We're going to plot it on this figure here.
+我们要在这绘制这个图形
+
+57
+00:02:00,296 --> 00:02:02,022
+The sigmoid function, G of
+S型函数 G的
+
+58
+00:02:02,022 --> 00:02:04,652
+Z, also called the logistic function, looks like this.
+Z 也称为逻辑函数 看起来是这样的
+
+59
+00:02:04,652 --> 00:02:07,078
+It starts off near zero and
+它开始接近零
+
+60
+00:02:07,078 --> 00:02:09,366
+then rises until it crosses
+然后上升 直到
+
+61
+00:02:09,366 --> 00:02:13,473
+0.5 at the origin and then it flattens out again like so.
+0.5在原点 那么它再次变平了 像这样
+
+62
+00:02:13,500 --> 00:02:16,051
+So that's what the sigmoid function looks like.
+所以S型函数看起来像
+
+63
+00:02:16,051 --> 00:02:17,898
+And you notice that the
+而且你注意到
+
+64
+00:02:17,898 --> 00:02:19,725
+sigmoid function, well, it
+S型函数 那么 它
+
+65
+00:02:19,740 --> 00:02:21,894
+asymptotes at one, and
+在一个渐近线上
+
+66
+00:02:21,894 --> 00:02:24,256
+asymptotes at zero as
+渐近于零
+
+67
+00:02:24,256 --> 00:02:26,388
+Z against the horizontal axis
+Z 的水平轴
+
+68
+00:02:26,388 --> 00:02:27,659
+is Z. As Z goes to
+为Z Z的
+
+69
+00:02:27,659 --> 00:02:29,304
+minus infinity, G of
+负无穷大 G
+
+70
+00:02:29,304 --> 00:02:31,396
+Z approaches zero and as
+的z接近零和
+
+71
+00:02:31,396 --> 00:02:33,816
+Z approaches infinity, G
+Z 趋于无穷大时 G
+
+72
+00:02:33,816 --> 00:02:35,864
+of Z approaches 1, and
+的z接近1
+
+73
+00:02:35,880 --> 00:02:37,252
+so because G of
+因为G的
+
+74
+00:02:37,252 --> 00:02:39,408
+Z offers values that are
+Z 值在
+
+75
+00:02:39,408 --> 00:02:41,696
+between 0 and 1 we
+0和1之间 我们
+
+76
+00:02:41,730 --> 00:02:44,592
+also have that H of
+也使得H的
+
+77
+00:02:44,610 --> 00:02:47,141
+X must be between 0 and 1.
+X必须为0和1之间
+
+78
+00:02:47,141 --> 00:02:50,029
+Finally, given this hypothesis
+最后 鉴于这一假说
+
+79
+00:02:50,040 --> 00:02:52,123
+representation, what we
+表示 什么是我们
+
+80
+00:02:52,123 --> 00:02:53,740
+need to do, as before,
+需要做的 因为在此之前
+
+81
+00:02:53,740 --> 00:02:58,841
+is fit the parameters theta to our data.
+是适合我们的数据参数θ
+
+82
+00:02:58,841 --> 00:03:00,490
+So given a training set, we
+所以一个训练集 我们
+
+83
+00:03:00,490 --> 00:03:01,743
+need to pick a value for
+需要选择一个值
+
+84
+00:03:01,743 --> 00:03:03,773
+the parameters theta and this
+参数θ和这个
+
+85
+00:03:03,773 --> 00:03:06,981
+hypothesis will then let us make predictions.
+假设会让我们做出预测
+
+86
+00:03:06,981 --> 00:03:08,534
+We'll talk about a learning algorithm
+稍后我们将谈论一个学习算法
+
+87
+00:03:08,534 --> 00:03:11,828
+later for fitting the parameters theta.
+拟合参数θ
+
+88
+00:03:11,828 --> 00:03:13,506
+But first let's talk a
+但是首先让我们讨论
+
+89
+00:03:13,506 --> 00:03:17,379
+bit about the interpretation of this model.
+一下这个模型的解释
+
+90
+00:03:17,640 --> 00:03:19,612
+Here's how I'm going to
+这就是我要
+
+91
+00:03:19,620 --> 00:03:21,660
+interpret the output of
+解释的输出
+
+92
+00:03:21,660 --> 00:03:23,637
+my hypothesis H of
+假说H的
+
+93
+00:03:23,637 --> 00:03:26,387
+X. When my hypothesis
+X 当我的假说
+
+94
+00:03:26,400 --> 00:03:28,238
+outputs some number, I am
+输出一些数字 我
+
+95
+00:03:28,240 --> 00:03:30,126
+going to treat that number as
+要把该数字的
+
+96
+00:03:30,126 --> 00:03:33,400
+the estimated probability that Y
+概率估计Y
+
+97
+00:03:33,400 --> 00:03:35,170
+is equal to one on a
+等于为一个在
+
+98
+00:03:35,170 --> 00:03:38,266
+new input example X. Here is what I mean.
+新的输入例如X 这就是我的意思
+
+99
+00:03:38,266 --> 00:03:40,324
+Here is an example.
+下面就是一个例子
+
+100
+00:03:40,324 --> 00:03:43,932
+Let's say we're using the tumor classification example.
+比方说 我们正在使用的肿瘤分类的例子
+
+101
+00:03:43,932 --> 00:03:45,234
+So we may have a feature
+因此我们可能有一个特点
+
+102
+00:03:45,234 --> 00:03:47,945
+vector X, which is this x01
+向量X 这是这个X01
+
+103
+00:03:47,945 --> 00:03:49,860
+as always and then our
+和平时一样 然后我们
+
+104
+00:03:49,860 --> 00:03:52,836
+one feature is the size of the tumor.
+的一个特征是肿瘤的大小
+
+105
+00:03:52,836 --> 00:03:54,045
+Suppose I have a patient come
+假设我有一个病人来了
+
+106
+00:03:54,045 --> 00:03:55,459
+in and, you know they have some
+你知道他们有一些
+
+107
+00:03:55,459 --> 00:03:57,183
+tumor size and I
+肿瘤大小和我
+
+108
+00:03:57,183 --> 00:03:58,759
+feed their feature vector X
+给他们的特征向量X
+
+109
+00:03:58,759 --> 00:04:00,963
+into my hypothesis and suppose
+到我的假设和假设
+
+110
+00:04:00,970 --> 00:04:03,760
+my hypothesis outputs the number 0.7.
+我的假设输出数量0.7
+
+111
+00:04:03,760 --> 00:04:05,758
+I'm going to interpret
+我将解释
+
+112
+00:04:05,758 --> 00:04:07,298
+my hypothesis as follows.
+我的假设如下
+
+113
+00:04:07,298 --> 00:04:08,790
+I'm going to say that this
+我要说 这个
+
+114
+00:04:08,790 --> 00:04:10,235
+hypothesis is telling me
+假设告诉我
+
+115
+00:04:10,235 --> 00:04:12,143
+that for a patient with
+对于一个患者
+
+116
+00:04:12,143 --> 00:04:14,490
+features X, the probability
+的特征X,这个概率
+
+117
+00:04:14,520 --> 00:04:16,772
+that Y equals one is 0 .7.
+对于Y是0 .7
+
+118
+00:04:16,772 --> 00:04:18,703
+In other words, I'm going
+换句话说 我要去
+
+119
+00:04:18,720 --> 00:04:21,106
+to tell my patient that the
+告诉我的病人这个
+
+120
+00:04:21,106 --> 00:04:23,320
+tumor, sadly, has
+肿瘤 可悲的是 有
+
+121
+00:04:23,320 --> 00:04:27,836
+a 70% chance or a 0.7 chance of being malignant.
+70%的可能性或0.7可能性是恶性
+
+122
+00:04:27,860 --> 00:04:29,420
+To write this out slightly more
+要更加正式的写出来
+
+123
+00:04:29,420 --> 00:04:30,473
+formally or to write this
+或写这在
+
+124
+00:04:30,480 --> 00:04:31,763
+out in math, I'm going to
+数学 我要
+
+125
+00:04:31,763 --> 00:04:34,803
+interpret my hypothesis output
+解释我的假设输出
+
+126
+00:04:34,820 --> 00:04:37,144
+as P of y
+作为P的y
+
+127
+00:04:37,150 --> 00:04:39,913
+equals 1, given X
+等于1 鉴于X
+
+128
+00:04:39,913 --> 00:04:41,813
+parametrized by theta.
+参数化的θ
+
+129
+00:04:41,830 --> 00:04:43,389
+So, for those of you that are
+因此 对于那些你
+
+130
+00:04:43,389 --> 00:04:45,320
+familiar with probability, this equation
+熟悉概率 这个方程
+
+131
+00:04:45,320 --> 00:04:46,766
+might make sense, if you're a little less familiar
+可能是有意义的 如果你有点不太熟悉
+
+132
+00:04:46,766 --> 00:04:48,673
+with probability, you know, here's
+概率 要知道 这里是
+
+133
+00:04:48,673 --> 00:04:51,564
+how I read this expression, this
+我怎么看这个表达式 这
+
+134
+00:04:51,580 --> 00:04:53,215
+is the probability that y is
+是y的概率是
+
+135
+00:04:53,215 --> 00:04:54,988
+equals to one, given x
+等于一 鉴于x
+
+136
+00:04:54,988 --> 00:04:56,493
+instead of given that my patient
+而不是考虑我的病人
+
+137
+00:04:56,493 --> 00:04:58,027
+has, you know, features X.
+你知道 假设X
+
+138
+00:04:58,040 --> 00:04:59,860
+Given my patient has a particular
+鉴于我的病人有一个特别的
+
+139
+00:04:59,860 --> 00:05:01,575
+tumor size represented by my
+肿瘤大小代表我的
+
+140
+00:05:01,575 --> 00:05:03,156
+features X, and this
+假设X 这
+
+141
+00:05:03,156 --> 00:05:06,956
+probability is parametrized by theta.
+参数化的概率θ
+
+142
+00:05:07,130 --> 00:05:09,166
+So I'm basically going to count
+所以 我基本上要指望
+
+143
+00:05:09,166 --> 00:05:11,009
+on my hypothesis to give
+我假设给
+
+144
+00:05:11,009 --> 00:05:13,332
+me estimates of the probability
+我估计的概率
+
+145
+00:05:13,332 --> 00:05:15,349
+that Y is equal to 1.
+Y是等于1
+
+146
+00:05:15,349 --> 00:05:16,523
+Now since this is a
+现在 因为这是一个
+
+147
+00:05:16,523 --> 00:05:18,629
+classification task, we know
+分类的任务 我们知道
+
+148
+00:05:18,640 --> 00:05:21,497
+that Y must be either zero or one, right?
+Y必须是0或1 对不对?
+
+149
+00:05:21,497 --> 00:05:23,373
+Those are the only two values
+这些是仅有的两个值
+
+150
+00:05:23,390 --> 00:05:25,466
+that Y could possibly take on,
+可能是由Y呈现
+
+151
+00:05:25,466 --> 00:05:26,654
+either in the training set or
+无论是在训练集或
+
+152
+00:05:26,654 --> 00:05:28,077
+for new patients that may walk
+对于新患者可能
+
+153
+00:05:28,077 --> 00:05:32,014
+into my office or into the doctor's office in the future.
+走进我的办公室 或在未来进入医生的办公室
+
+154
+00:05:32,014 --> 00:05:33,529
+So given H of X,
+因此 鉴于H的X的
+
+155
+00:05:33,550 --> 00:05:36,153
+we can therefore compute the probability
+因此 我们可以计算概率
+
+156
+00:05:36,153 --> 00:05:39,116
+that Y is equal to zero as well.
+Y是也等于零的
+
+157
+00:05:39,116 --> 00:05:41,209
+Concretely, because Y must
+具体地说 因为Y必须
+
+158
+00:05:41,250 --> 00:05:43,065
+be either zero or one,
+是零或一
+
+159
+00:05:43,070 --> 00:05:45,141
+we know that the probability
+我们知道的概率
+
+160
+00:05:45,141 --> 00:05:46,329
+of Y equals zero, plus the
+Y等于零 加
+
+161
+00:05:46,329 --> 00:05:47,512
+probability of Y equals
+Y的概率等于
+
+162
+00:05:47,550 --> 00:05:50,173
+one, must add up to one.
+一 必须添加一个
+
+163
+00:05:50,173 --> 00:05:51,483
+This first equation looks a
+这第一个方程看起来
+
+164
+00:05:51,483 --> 00:05:52,828
+little bit more complicated but it's
+有点复杂 但
+
+165
+00:05:52,828 --> 00:05:54,603
+basically saying that probability of
+基本上说这个概率
+
+166
+00:05:54,610 --> 00:05:56,287
+Y equals zero for a
+Y等于零对于一个
+
+167
+00:05:56,320 --> 00:05:58,319
+particular patient with features x, and
+特别的病人与特征x
+
+168
+00:05:58,360 --> 00:06:01,002
+you know, given our parameter's theta, plus the
+你知道 鉴于我们的参数θ 加上
+
+169
+00:06:01,010 --> 00:06:02,305
+probability of Y equals one for
+Y的概率等于一用于
+
+170
+00:06:02,305 --> 00:06:04,470
+that same patient which features x and you
+同一个病人的特征x和你的
+
+171
+00:06:04,471 --> 00:06:06,334
+parameters theta must add
+参数θ必须增加
+
+172
+00:06:06,360 --> 00:06:08,260
+up to one, if this equation
+一个 如果该方程
+
+173
+00:06:08,260 --> 00:06:10,171
+looks a little bit complicated feel free
+看起来有点复杂的感觉可以随时
+
+174
+00:06:10,200 --> 00:06:14,049
+to mentally imagine it without that X and theta.
+想象它没有X和θ在脑子里
+
+175
+00:06:14,049 --> 00:06:15,476
+And this is just saying that
+而这仅仅是说
+
+176
+00:06:15,480 --> 00:06:16,993
+the probability of Y equals zero plus
+Y的概率等于零加
+
+177
+00:06:16,993 --> 00:06:19,272
+the probability of Y equals one must be equal to one.
+Y的概率等于一必须是等于一
+
+178
+00:06:19,280 --> 00:06:20,365
+And we know this to be
+我们知道这是
+
+179
+00:06:20,365 --> 00:06:23,120
+true because Y has to be either zero or one.
+真正因为Y是零或一
+
+180
+00:06:23,120 --> 00:06:24,240
+And so the chance of Y
+这样的机会对于Y称为
+
+181
+00:06:24,240 --> 00:06:25,918
+being zero plus the chance
+零加的机会
+
+182
+00:06:25,930 --> 00:06:29,547
+that Y is one, you know, those two must add up to one.
+Y是一 你知道 这两个必须添加到一
+
+183
+00:06:29,547 --> 00:06:31,387
+And so if you just
+所以 如果你只是
+
+184
+00:06:31,440 --> 00:06:33,780
+take this term and move
+把这个词和
+
+185
+00:06:33,780 --> 00:06:35,409
+it to the right-hand side, then
+它移到右边 则
+
+186
+00:06:35,409 --> 00:06:37,327
+you end up with this equation
+你结束了这个等式
+
+187
+00:06:37,327 --> 00:06:38,995
+that says probability Y equals zero
+说概率y等于0
+
+188
+00:06:38,995 --> 00:06:40,502
+is one minus the probability that y equals one,
+等于 1 减去 y 等于 1 的概率
+
+189
+00:06:40,530 --> 00:06:43,548
+and thus if our
+因此 如果我们
+
+190
+00:06:43,560 --> 00:06:46,009
+hypothesis if H of X
+假设 如果H的X
+
+191
+00:06:46,009 --> 00:06:47,775
+gives us that term you can
+为我们提供了这个词 你可以
+
+192
+00:06:47,790 --> 00:06:49,948
+therefore quite simply compute the
+因此 很简单地计算
+
+193
+00:06:49,948 --> 00:06:51,508
+probability, or compute the
+概率 或计算
+
+194
+00:06:51,510 --> 00:06:53,282
+estimated probability that Y
+估计概率Y
+
+195
+00:06:53,282 --> 00:06:55,411
+is equal to zero as well.
+是等于零
+
+196
+00:06:55,411 --> 00:06:56,720
+So you now know what
+所以 你现在知道什么是
+
+197
+00:06:56,720 --> 00:06:59,779
+the hypothesis representation is for
+假设表示为
+
+198
+00:06:59,790 --> 00:07:01,576
+logistic regression and we're seeing
+逻辑回归和我们所看到的
+
+199
+00:07:01,580 --> 00:07:03,534
+what the mathematical formula is
+数学公式是什么
+
+200
+00:07:03,534 --> 00:07:06,701
+defining the hypothesis for logistic regression.
+定义逻辑回归的假设
+
+201
+00:07:06,701 --> 00:07:07,880
+In the next video, I'd like
+在接下来的视频 我想
+
+202
+00:07:07,880 --> 00:07:09,018
+to try to give you
+试着给你
+
+203
+00:07:09,040 --> 00:07:11,091
+better intuition about what the
+更好的知识
+
+204
+00:07:11,091 --> 00:07:12,518
+hypothesis function looks like.
+假设函数看起来像
+
+205
+00:07:12,518 --> 00:07:13,606
+And I want to tell
+我想告诉
+
+206
+00:07:13,620 --> 00:07:15,294
+you something called the decision
+你一个叫做决策
+
+207
+00:07:15,294 --> 00:07:16,700
+boundary and we'll look
+边界的概念 我们还会看
+
+208
+00:07:16,700 --> 00:07:18,846
+at some visualizations together to
+在一些可视化
+
+209
+00:07:18,846 --> 00:07:20,186
+try to get a better sense
+尝试 以获得更好的理解
+
+210
+00:07:20,186 --> 00:07:22,370
+of what this hypothesis function of
+这是假设函数中
+
+211
+00:07:22,370 --> 00:07:24,697
+logistic regression really looks like.
+逻辑回归应该有的样子
+
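[Editor's note: as a quick illustration of the hypothesis described in the subtitles above, here is a minimal Python/NumPy sketch, not part of the course materials; the variable names and example values are illustrative only. It shows the sigmoid hypothesis h_theta(x) = g(theta' x) read as P(y = 1 | x; theta), and that P(y = 0 | x; theta) is simply one minus that value.

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)): the sigmoid (logistic) function
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta' x), read as the estimated P(y = 1 | x; theta)
    return sigmoid(theta @ x)

theta = np.array([-3.0, 1.0, 1.0])   # illustrative parameter values
x = np.array([1.0, 2.0, 2.0])        # x0 = 1 (intercept term), then x1 = 2, x2 = 2
p_y1 = hypothesis(theta, x)          # estimated P(y = 1 | x; theta)
p_y0 = 1.0 - p_y1                    # the two probabilities must add up to one
print(p_y1, p_y0)
]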
diff --git a/srt/6 - 3 - Decision Boundary (15 min).srt b/srt/6 - 3 - Decision Boundary (15 min).srt
new file mode 100644
index 00000000..a3fc7db7
--- /dev/null
+++ b/srt/6 - 3 - Decision Boundary (15 min).srt
@@ -0,0 +1,1986 @@
+1
+00:00:00,133 --> 00:00:02,423
+In the last video, we talked
+在过去的视频中 我们谈到
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,423 --> 00:00:06,653
+about the hypothesis representation for logistic regression.
+逻辑回归中假设函数的表示方法
+
+3
+00:00:06,700 --> 00:00:07,963
+What I'd like to do now is
+现在 我想
+
+4
+00:00:07,963 --> 00:00:09,389
+tell you about something called the
+告诉大家一个叫做
+
+5
+00:00:09,389 --> 00:00:11,370
+decision boundary, and this
+决策边界(decision boundary)的概念
+
+6
+00:00:11,380 --> 00:00:12,894
+will give us a better sense
+这个概念能更好地帮助我们
+
+7
+00:00:12,894 --> 00:00:15,017
+of what the logistic regression
+理解逻辑回归的
+
+8
+00:00:15,030 --> 00:00:17,870
+hypothesis function is computing.
+假设函数在计算什么
+
+9
+00:00:17,870 --> 00:00:20,080
+To recap, this is
+让我们回忆一下
+
+10
+00:00:20,080 --> 00:00:21,264
+what we wrote out last time,
+这是我们上次写下的公式
+
+11
+00:00:21,280 --> 00:00:22,663
+where we said that the
+当时我们说
+
+12
+00:00:22,663 --> 00:00:24,916
+hypothesis is represented as
+假设函数可以表示为
+
+13
+00:00:24,930 --> 00:00:26,119
+H of X equals G of
+h(x)=g(θTx)
+
+14
+00:00:26,119 --> 00:00:28,363
+theta transpose X, where G
+其中函数g
+
+15
+00:00:28,363 --> 00:00:29,871
+is this function called the
+被称为S形函数(sigmoid function)
+
+16
+00:00:29,871 --> 00:00:32,729
+sigmoid function, which looks like this.
+看起来应该是这样的形状
+
+17
+00:00:32,750 --> 00:00:35,131
+So, it slowly increases from zero
+它从零开始慢慢增加至1
+
+18
+00:00:35,131 --> 00:00:38,996
+to one, asymptoting at one.
+逐渐逼近1
+
+19
+00:00:38,996 --> 00:00:40,391
+What I want to do now is
+现在让我们
+
+20
+00:00:40,391 --> 00:00:42,452
+try to understand better when
+更进一步来理解
+
+21
+00:00:42,452 --> 00:00:44,054
+this hypothesis will make
+这个假设函数何时
+
+22
+00:00:44,070 --> 00:00:45,327
+predictions that Y is
+会将y预测为1
+
+23
+00:00:45,327 --> 00:00:47,049
+equal to one versus when it
+什么时候又会将
+
+24
+00:00:47,049 --> 00:00:48,361
+might make predictions that Y
+y预测为0
+
+25
+00:00:48,361 --> 00:00:50,602
+is equal to zero and understand
+让我们更好的理解
+
+26
+00:00:50,630 --> 00:00:52,351
+better what the hypothesis function
+假设函数应该是怎样的
+
+27
+00:00:52,351 --> 00:00:56,622
+looks like, particularly when we have more than one feature.
+特别是当我们的数据有多个特征时
+
+28
+00:00:56,640 --> 00:00:59,064
+Concretely, this hypothesis is
+具体地说 这个假设函数
+
+29
+00:00:59,064 --> 00:01:00,827
+outputting estimates of the
+输出的是
+
+30
+00:01:00,827 --> 00:01:02,057
+probability that Y is
+给定x时
+
+31
+00:01:02,060 --> 00:01:05,493
+equal to one given X.
+y=1的概率
+
+32
+00:01:05,530 --> 00:01:06,807
+So if we wanted to
+因此 如果我们想
+
+33
+00:01:06,807 --> 00:01:08,181
+predict is Y equal to
+预测y=1
+
+34
+00:01:08,181 --> 00:01:09,478
+one or is Y equal
+还是等于0
+
+35
+00:01:09,478 --> 00:01:12,217
+to zero here's something we might do.
+我们可以这样做
+
+36
+00:01:12,240 --> 00:01:14,737
+Whenever the hypothesis outputs that
+只要该假设函数
+
+37
+00:01:14,737 --> 00:01:16,412
+the probability of y being one
+输出y=1的概率
+
+38
+00:01:16,412 --> 00:01:17,570
+is greater than or equal
+大于或等于0.5
+
+39
+00:01:17,570 --> 00:01:19,340
+to 0.5 so this means
+那么这表示
+
+40
+00:01:19,350 --> 00:01:21,068
+that it is more likely to
+y更有可能
+
+41
+00:01:21,068 --> 00:01:22,295
+be y equals one than y
+等于1而不是0
+
+42
+00:01:22,295 --> 00:01:26,509
+equals zero then let's predict Y equals one.
+因此 我们预测y=1
+
+43
+00:01:26,509 --> 00:01:27,942
+And otherwise, if the probability
+在另一种情况下 如果
+
+44
+00:01:27,960 --> 00:01:30,168
+of, the estimated probability of
+预测y=1
+
+45
+00:01:30,180 --> 00:01:31,898
+Y being one is less
+的概率
+
+46
+00:01:31,898 --> 00:01:35,025
+than 0.5, then let's predict Y equals zero.
+小于0.5 那么我们应该预测y=0
+
+47
+00:01:35,025 --> 00:01:36,277
+And I chose a greater
+在这里 我选择大于等于
+
+48
+00:01:36,277 --> 00:01:39,666
+than or equal to here and less than here.
+在这里我选择小于
+
+49
+00:01:39,670 --> 00:01:41,010
+If H of X is equal
+如果h(x)的值
+
+50
+00:01:41,010 --> 00:01:43,063
+to 0.5 exactly, then
+正好等于0.5 那么
+
+51
+00:01:43,063 --> 00:01:44,670
+we could predict positive or
+我们可以预测为1
+
+52
+00:01:44,670 --> 00:01:45,820
+negative, but I put a
+也可以预测为0
+
+53
+00:01:45,820 --> 00:01:47,464
+greater than or equal to here
+但是这里我选择了大于等于
+
+54
+00:01:47,464 --> 00:01:49,220
+so we default maybe to predicting
+因此我们默认
+
+55
+00:01:49,220 --> 00:01:51,459
+a positive if H of
+如果h(x)等于0.5的话
+
+56
+00:01:51,459 --> 00:01:52,883
+X is 0.5 but that's
+预测选择为1
+
+57
+00:01:52,883 --> 00:01:56,675
+a detail that really doesn't matter that much.
+这只是一个细节 不用太在意
+
+58
+00:01:56,680 --> 00:01:58,136
+What I want to do is understand
+下面 我希望大家能够
+
+59
+00:01:58,140 --> 00:01:59,273
+better when it is
+清晰地理解
+
+60
+00:01:59,273 --> 00:02:01,187
+exactly that H of
+什么时候h(x)
+
+61
+00:02:01,187 --> 00:02:02,927
+X will be greater or equal
+将大于或等于
+
+62
+00:02:02,927 --> 00:02:04,666
+to 0.5, so that
+0.5 从而
+
+63
+00:02:04,666 --> 00:02:09,111
+we end up predicting Y is equal to one.
+我们最终预测y=1
+
+64
+00:02:09,530 --> 00:02:11,525
+If we look at this plot
+如果我们看看
+
+65
+00:02:11,540 --> 00:02:14,208
+of the sigmoid function, we'll notice
+S形函数的曲线图 我们会注意到
+
+66
+00:02:14,208 --> 00:02:17,094
+that the sigmoid function, G
+S函数
+
+67
+00:02:17,094 --> 00:02:18,981
+of Z, is greater than
+只要z大于
+
+68
+00:02:18,981 --> 00:02:21,019
+or equal to 0.5
+或等于0时
+
+69
+00:02:21,030 --> 00:02:24,296
+whenever Z is
+g(z)就将大于
+
+70
+00:02:24,300 --> 00:02:25,994
+greater than or equal to zero.
+或等于0.5
+
+71
+00:02:25,994 --> 00:02:28,163
+So it's in this half of
+因此 在曲线图的这半边
+
+72
+00:02:28,163 --> 00:02:29,963
+the figure that, G takes
+g的取值
+
+73
+00:02:29,963 --> 00:02:32,522
+on values that are 0.5 and higher.
+大于或等于0.5
+
+74
+00:02:32,522 --> 00:02:34,482
+This point here, that's the 0.5.
+因为这个交点就是0.5
+
+75
+00:02:34,482 --> 00:02:35,957
+So when Z is
+因此 当z大于0时
+
+76
+00:02:35,970 --> 00:02:38,352
+positive, G of Z,
+g(z) 也就是这个
+
+77
+00:02:38,352 --> 00:02:41,959
+the sigmoid function, is greater than or equal to 0.5.
+S形函数 是大于或等于0.5的
+
+78
+00:02:41,959 --> 00:02:44,226
+Since the hypothesis for
+由于逻辑回归的
+
+79
+00:02:44,226 --> 00:02:46,428
+logistic regression is H of
+假设函数h(x)
+
+80
+00:02:46,428 --> 00:02:48,525
+X equals G of theta
+等于g(θTx)
+
+81
+00:02:48,525 --> 00:02:50,964
+transpose X. This is
+因此
+
+82
+00:02:50,964 --> 00:02:52,163
+therefore going to be greater
+函数值将会
+
+83
+00:02:52,180 --> 00:02:54,338
+than or equal to 0.5
+大于或等于0.5
+
+84
+00:02:54,338 --> 00:02:58,329
+whenever theta transpose
+只要θ转置乘以x
+
+85
+00:02:58,340 --> 00:03:01,642
+X is greater than or equal to zero.
+大于或等于0
+
+86
+00:03:01,642 --> 00:03:03,470
+So what was shown, right,
+因此 我们看到
+
+87
+00:03:03,470 --> 00:03:05,835
+because here theta transpose X
+因为这里θ转置x
+
+88
+00:03:05,835 --> 00:03:08,113
+takes the role of Z.
+取代了z的位置
+
+89
+00:03:08,120 --> 00:03:09,543
+So what we're shown is that
+所以我们看到
+
+90
+00:03:09,543 --> 00:03:11,077
+our hypothesis is going
+我们的假设函数
+
+91
+00:03:11,077 --> 00:03:13,191
+to predict Y equals one
+将会预测y=1
+
+92
+00:03:13,200 --> 00:03:15,420
+whenever theta transpose X
+只要θ转置乘以x
+
+93
+00:03:15,420 --> 00:03:17,924
+is greater than or equal to 0.
+大于或等于0
+
+94
+00:03:17,924 --> 00:03:20,016
+Let's now consider the other
+现在让我们来考虑
+
+95
+00:03:20,016 --> 00:03:22,380
+case of when a hypothesis
+假设函数
+
+96
+00:03:22,380 --> 00:03:25,043
+will predict Y is equal to 0.
+预测y=0的情况
+
+97
+00:03:25,043 --> 00:03:27,210
+Well, by similar argument, H
+类似的
+
+98
+00:03:27,210 --> 00:03:28,987
+of X is going to be
+h(x)将会
+
+99
+00:03:28,987 --> 00:03:30,709
+less than 0.5 whenever G
+小于0.5 只要
+
+100
+00:03:30,730 --> 00:03:32,266
+of Z is less than
+g(z)小于0.5
+
+101
+00:03:32,266 --> 00:03:34,711
+0.5 because the range
+这是因为
+
+102
+00:03:34,720 --> 00:03:36,468
+of values of Z that
+z的定义域上
+
+103
+00:03:36,480 --> 00:03:38,013
+causes G of Z to take on
+导致g(z)取值
+
+104
+00:03:38,020 --> 00:03:42,626
+values less than 0.5, well that's when Z is negative.
+小于0.5的部分 是z小于0的部分
+
+105
+00:03:42,626 --> 00:03:44,916
+So when G of Z is less than 0.5.
+所以当g(z)小于0.5时
+
+106
+00:03:44,916 --> 00:03:46,874
+Our hypothesis will predict
+我们的假设函数将会预测
+
+107
+00:03:46,874 --> 00:03:48,876
+that Y is equal to zero, and
+y=0
+
+108
+00:03:48,876 --> 00:03:50,540
+by similar argument to what
+根据与之前
+
+109
+00:03:50,540 --> 00:03:52,608
+we had earlier, H of
+类似的原因
+
+110
+00:03:52,608 --> 00:03:54,293
+X is equal G of
+h(x)等于
+
+111
+00:03:54,320 --> 00:03:56,932
+theta transpose X. And
+g(θTx)
+
+112
+00:03:56,932 --> 00:03:58,739
+so, we'll predict Y equals
+因此 只要
+
+113
+00:03:58,739 --> 00:04:01,029
+zero whenever this quantity
+θ转置乘以x小于0
+
+114
+00:04:01,029 --> 00:04:04,937
+theta transpose X is less than zero.
+我们就预测y等于0
+
+115
+00:04:04,940 --> 00:04:06,461
+To summarize what we just
+总结一下我们刚才所讲的
+
+116
+00:04:06,470 --> 00:04:08,377
+worked out, we saw that if
+我们看到
+
+117
+00:04:08,377 --> 00:04:09,900
+we decide to predict whether
+如果我们要决定
+
+118
+00:04:09,900 --> 00:04:11,076
+Y is equal to one or
+预测y=1
+
+119
+00:04:11,076 --> 00:04:12,396
+Y is equal to zero,
+还是y=0
+
+120
+00:04:12,400 --> 00:04:14,216
+depending on whether the estimated
+取决于
+
+121
+00:04:14,216 --> 00:04:15,807
+probability is greater than
+y=1的概率
+
+122
+00:04:15,807 --> 00:04:17,845
+or equal 0.5, or whether
+大于或等于0.5
+
+123
+00:04:17,845 --> 00:04:19,602
+it's less than 0.5, then
+还是小于0.5
+
+124
+00:04:19,602 --> 00:04:20,935
+that's the same as saying that
+这其实就等于说
+
+125
+00:04:20,935 --> 00:04:22,920
+will predict Y equals 1
+我们将预测y=1
+
+126
+00:04:22,920 --> 00:04:25,010
+whenever theta transpose X is greater
+只需要θ转置乘以x
+
+127
+00:04:25,010 --> 00:04:26,002
+than or equal to 0,
+大于或等于0
+
+128
+00:04:26,002 --> 00:04:27,815
+and we will predict Y is
+另一方面我们将预测y=0
+
+129
+00:04:27,815 --> 00:04:30,025
+equal to zero whenever theta transpose X
+只需要θ转置乘以x
+
+130
+00:04:30,025 --> 00:04:32,953
+is less than zero.
+小于0
+
+131
+00:04:32,953 --> 00:04:34,192
+Let's use this to better
+通过这些 我们能更好地
+
+132
+00:04:34,192 --> 00:04:36,890
+understand how the hypothesis
+理解如何利用逻辑回归的假设函数
+
+133
+00:04:36,890 --> 00:04:40,029
+of logistic regression makes those predictions.
+来进行预测
+
+134
+00:04:40,040 --> 00:04:41,535
+Now, let's suppose we have
+现在假设我们有
+
+135
+00:04:41,535 --> 00:04:43,113
+a training set like that shown
+一个训练集
+
+136
+00:04:43,113 --> 00:04:45,165
+on the slide, and suppose
+就像幻灯片上的这个
+
+137
+00:04:45,165 --> 00:04:47,278
+our hypothesis is H of
+接下来我们假设我们的假设函数是
+
+138
+00:04:47,278 --> 00:04:48,678
+X equals G of theta
+h(x)等于g()
+
+139
+00:04:48,678 --> 00:04:50,254
+zero, plus theta one X1
+括号里面是θ0加上θ1x1
+
+140
+00:04:50,260 --> 00:04:52,854
+plus theta two X2.
+加上θ2乘以x2
+
+141
+00:04:52,854 --> 00:04:54,516
+We haven't talked yet about how
+目前我们还没有谈到
+
+142
+00:04:54,516 --> 00:04:56,725
+to fit the parameters of this model.
+如何拟合此模型中的参数
+
+143
+00:04:56,725 --> 00:04:59,355
+We'll talk about that in the next video.
+我们将在下一个视频中讨论这个问题
+
+144
+00:04:59,355 --> 00:05:01,770
+But suppose that, through a procedure
+但是假设我们
+
+145
+00:05:01,770 --> 00:05:03,575
+to be specified, we end
+已经拟合好了参数
+
+146
+00:05:03,575 --> 00:05:06,224
+up choosing the following values for the parameters.
+我们最终选择了如下值
+
+147
+00:05:06,224 --> 00:05:07,861
+Let's say we choose theta zero
+比方说 我们选择θ0
+
+148
+00:05:07,861 --> 00:05:09,750
+equals minus three, theta one
+等于-3 θ1
+
+149
+00:05:09,750 --> 00:05:13,553
+equals one, theta two equals one.
+等于1 θ2等于1
+
+150
+00:05:13,553 --> 00:05:15,430
+So this means that my parameter
+因此 这意味着我的
+
+151
+00:05:15,430 --> 00:05:17,263
+vector is going to be
+参数向量将是
+
+152
+00:05:17,263 --> 00:05:22,963
+theta equals minus 3, 1, 1.
+θ等于[-3 1 1]
+
+153
+00:05:24,140 --> 00:05:27,055
+So, we're given this
+这样 我们有了
+
+154
+00:05:27,060 --> 00:05:30,115
+choice of my hypothesis parameters,
+这样的一个参数选择
+
+155
+00:05:30,115 --> 00:05:32,243
+let's try to figure out where
+让我们试着找出
+
+156
+00:05:32,280 --> 00:05:33,778
+a hypothesis will end up
+假设函数何时将
+
+157
+00:05:33,778 --> 00:05:35,493
+predicting y equals 1 and where it
+预测y等于1
+
+158
+00:05:35,493 --> 00:05:39,055
+will end up predicting y equals 0.
+何时又将预测y等于0
+
+159
+00:05:39,060 --> 00:05:40,660
+Using the formulas that we
+使用我们在
+
+160
+00:05:40,660 --> 00:05:42,900
+worked on the previous slide, we know
+在上一张幻灯片上展示的公式 我们知道
+
+161
+00:05:42,900 --> 00:05:44,539
+that Y equals 1 is
+y更有可能是1
+
+162
+00:05:44,539 --> 00:05:45,849
+more likely, that is the
+或者说
+
+163
+00:05:45,849 --> 00:05:47,404
+probability that Y equals
+y等于1的概率
+
+164
+00:05:47,404 --> 00:05:48,943
+1 is greater than 0.5
+大于0.5
+
+165
+00:05:48,950 --> 00:05:51,553
+or greater than or equal to 0.5.
+或者大于等于0.5
+
+166
+00:05:51,570 --> 00:05:55,256
+Whenever theta transpose x
+只要θ转置x
+
+167
+00:05:55,256 --> 00:05:57,211
+is greater than zero.
+大于0
+
+168
+00:05:57,230 --> 00:05:58,729
+And this formula that I
+我刚刚加了下划线的
+
+169
+00:05:58,729 --> 00:06:00,846
+just underlined minus three
+这个公式
+
+170
+00:06:00,850 --> 00:06:03,033
+plus X1 plus X2 is,
+-3加上x1再加上x2
+
+171
+00:06:03,033 --> 00:06:05,216
+of course, theta transpose
+当然就是θ转置x
+
+172
+00:06:05,220 --> 00:06:07,014
+X when theta is equal
+这是当θ等于
+
+173
+00:06:07,014 --> 00:06:09,746
+to this value of the parameters
+我们选择的这个参数值时
+
+174
+00:06:09,760 --> 00:06:12,516
+that we just chose.
+θ转置乘以x的表达
+
+175
+00:06:12,516 --> 00:06:14,640
+So, for any example, for
+因此 举例来说
+
+176
+00:06:14,640 --> 00:06:16,426
+any example with features X1
+对于任何样本
+
+177
+00:06:16,426 --> 00:06:19,300
+and X2 that satisfy this
+只要x1和x2满足
+
+178
+00:06:19,300 --> 00:06:21,187
+equation that minus 3
+这个等式 也就是-3
+
+179
+00:06:21,187 --> 00:06:23,526
+plus X1 plus X2
+加上x1再加x2
+
+180
+00:06:23,530 --> 00:06:24,723
+is greater than or equal to 0.
+大于等于0
+
+181
+00:06:24,723 --> 00:06:27,028
+Our hypothesis will think
+我们的假设函数就会认为
+
+182
+00:06:27,028 --> 00:06:28,066
+that Y equals 1 is
+y等于1
+
+183
+00:06:28,066 --> 00:06:32,463
+more likely, or will predict that Y is equal to one.
+的可能性较大 或者说将预测y=1
+
+184
+00:06:32,463 --> 00:06:34,505
+We can also take minus three
+我们也可以
+
+185
+00:06:34,505 --> 00:06:35,752
+and bring this to the right
+将-3放到不等式右边
+
+186
+00:06:35,760 --> 00:06:37,703
+and rewrite this as X1
+并改写为x1
+
+187
+00:06:37,740 --> 00:06:41,435
+plus X2 is greater than or equal to three.
+加号x2大于等于3
+
+188
+00:06:41,435 --> 00:06:43,584
+And so, equivalently, we found
+这样是等价的 我们发现
+
+189
+00:06:43,590 --> 00:06:45,826
+that this hypothesis will predict
+这一假设函数将预测
+
+190
+00:06:45,826 --> 00:06:47,561
+Y equals one whenever X1
+y=1 只要
+
+191
+00:06:47,561 --> 00:06:51,854
+plus X2 is greater than or equal to three.
+x1+x2大于等于3
+
+192
+00:06:51,870 --> 00:06:54,893
+Let's see what that means on the figure.
+让我们来看看这在图上是什么意思
+
+193
+00:06:54,893 --> 00:06:57,209
+If I write down the equation,
+如果我写下等式
+
+194
+00:06:57,209 --> 00:07:00,217
+X1 plus X2 equals three,
+x1+x2等于3
+
+195
+00:07:00,230 --> 00:07:03,356
+this defines the equation of a straight line.
+这将定义一条直线
+
+196
+00:07:03,360 --> 00:07:05,040
+And if I draw what that straight
+如果我画出这条直线
+
+197
+00:07:05,040 --> 00:07:07,695
+line looks like, it gives
+它将表示为
+
+198
+00:07:07,730 --> 00:07:10,116
+me the following line which passes
+这样一条线 它通过
+
+199
+00:07:10,116 --> 00:07:11,627
+through 3 and 3 on
+通过x1轴上的3
+
+200
+00:07:11,627 --> 00:07:14,946
+the X1 and the X2 axis.
+和x2轴上的3
+
+201
+00:07:15,886 --> 00:07:17,250
+So the part of the input space,
+因此 这部分的输入样本空间
+
+202
+00:07:17,270 --> 00:07:18,827
+the part of the
+这一部分的
+
+203
+00:07:18,827 --> 00:07:21,553
+X1, X2 plane that corresponds
+X1-X2平面
+
+204
+00:07:21,553 --> 00:07:24,948
+to when X1 plus X2 is greater than or equal to three.
+对应x1加x2大于等于3
+
+205
+00:07:24,948 --> 00:07:27,195
+That's going to be this very top plane.
+这将是上面这个半平面
+
+206
+00:07:27,210 --> 00:07:29,442
+That is everything to the
+也就是所有
+
+207
+00:07:29,442 --> 00:07:30,701
+up, and everything to the upper
+上方和所有右侧的部分
+
+208
+00:07:30,701 --> 00:07:34,109
+right portion of this magenta line that I just drew.
+相对我画的这条洋红色线来说
+
+209
+00:07:34,109 --> 00:07:35,584
+And so, the region where our
+所以
+
+210
+00:07:35,610 --> 00:07:37,135
+hypothesis will predict Y
+我们的假设函数预测
+
+211
+00:07:37,135 --> 00:07:38,324
+equals 1 is this
+y等于1的区域
+
+212
+00:07:38,330 --> 00:07:40,023
+region, you know, is
+就是这片区域
+
+213
+00:07:40,023 --> 00:07:41,586
+really this huge region, this
+是这个巨大的区域
+
+214
+00:07:41,620 --> 00:07:44,393
+half-space over to the upper right.
+是右上方的这个半平面
+
+215
+00:07:44,393 --> 00:07:45,483
+And let me just write that down.
+让我把它写下来
+
+216
+00:07:45,483 --> 00:07:47,395
+I'm gonna call this the Y
+我将称它为
+
+217
+00:07:47,395 --> 00:07:50,263
+equals one region, and in
+y=1区域
+
+218
+00:07:50,263 --> 00:07:54,293
+contrast the region where
+与此相对
+
+219
+00:07:54,293 --> 00:07:56,500
+X1 plus X2 is
+x1加x2
+
+220
+00:07:56,510 --> 00:07:58,691
+less than three, that's when
+小于3的区域
+
+221
+00:07:58,691 --> 00:08:00,090
+we will predict that Y,
+也就是我们预测
+
+222
+00:08:00,110 --> 00:08:01,988
+Y is equal to zero, and
+y等于0的区域
+
+223
+00:08:01,988 --> 00:08:04,679
+that corresponds to this region.
+是这一片区域
+
+224
+00:08:04,710 --> 00:08:06,096
+You know, it's really a half-plane, but
+你看到 这也是一个半平面
+
+225
+00:08:06,096 --> 00:08:08,530
+that region on the left is
+左侧的这个半平面
+
+226
+00:08:08,530 --> 00:08:11,736
+the region where our hypothesis predict Y equals 0.
+是我们的假设函数预测y等于0的区域
+
+227
+00:08:11,740 --> 00:08:13,431
+I want to give
+我想给这条线一个名字
+
+228
+00:08:13,431 --> 00:08:16,475
+this line, this magenta line that I drew a name.
+就是我刚刚画的这条洋红色线
+
+229
+00:08:16,475 --> 00:08:19,458
+This line there is called
+这条线被称为
+
+230
+00:08:19,458 --> 00:08:24,648
+the decision boundary.
+决策边界(decision boundary)
+
+231
+00:08:24,648 --> 00:08:27,085
+And concretely, this straight line
+具体地说 这条直线
+
+232
+00:08:27,085 --> 00:08:28,468
+X1 plus X2 equals 3.
+满足x1+x2=3
+
+233
+00:08:28,470 --> 00:08:31,170
+That corresponds to the set of points.
+它对应一系列的点
+
+234
+00:08:31,170 --> 00:08:33,334
+So that corresponds to the region
+它对应
+
+235
+00:08:33,334 --> 00:08:34,606
+where H of X is equal
+h(x)等于
+
+236
+00:08:34,606 --> 00:08:37,000
+to 0.5 exactly and
+0.5的区域
+
+237
+00:08:37,000 --> 00:08:38,731
+the decision boundary, that is
+决策边界 也就是
+
+238
+00:08:38,750 --> 00:08:40,696
+this straight line, that's the
+这条直线
+
+239
+00:08:40,720 --> 00:08:42,772
+line that separates the region
+将整个平面分成了两部分
+
+240
+00:08:42,772 --> 00:08:44,659
+where the hypothesis predicts Y equals
+其中一片区域假设函数预测y等于1
+
+241
+00:08:44,659 --> 00:08:46,433
+one from the region
+而另一片区域
+
+242
+00:08:46,433 --> 00:08:49,773
+where the hypothesis predicts that Y is equal to 0.
+假设函数预测y等于0
+
+243
+00:08:49,773 --> 00:08:51,387
+And just to be clear.
+我想澄清一下
+
+244
+00:08:51,390 --> 00:08:53,353
+The decision boundary is a
+决策边界是
+
+245
+00:08:53,353 --> 00:08:57,458
+property of the hypothesis
+假设函数的一个属性
+
+246
+00:08:57,458 --> 00:09:00,705
+including the parameters theta 0, theta 1, theta 2.
+它包括参数θ0 θ1 θ2
+
+247
+00:09:00,720 --> 00:09:03,216
+And in the figure I drew a training set.
+在这幅图中 我画了一个训练集
+
+248
+00:09:03,240 --> 00:09:06,455
+I drew a data set in order to help the visualization.
+我画了一组数据 让它更加可视化
+
+249
+00:09:06,480 --> 00:09:07,721
+But even if we take
+但是 即使我们
+
+250
+00:09:07,721 --> 00:09:09,276
+away the data set, you know
+去掉这个数据集
+
+251
+00:09:09,280 --> 00:09:11,076
+decision boundary and a
+这条决策边界
+
+252
+00:09:11,076 --> 00:09:12,299
+region where we predict Y
+和我们预测y等于1
+
+253
+00:09:12,300 --> 00:09:14,321
+equals 1 versus Y equals zero.
+与y等于0的区域
+
+254
+00:09:14,321 --> 00:09:15,513
+That's a property of the
+它们都是
+
+255
+00:09:15,513 --> 00:09:16,838
+hypothesis and of the
+假设函数的属性
+
+256
+00:09:16,838 --> 00:09:18,804
+parameters of the hypothesis, and
+决定于其参数
+
+257
+00:09:18,820 --> 00:09:22,163
+not a property of the data set.
+它不是数据集的属性
+
+258
+00:09:22,163 --> 00:09:23,606
+Later on, of course, we'll talk
+当然 我们后面还将讨论
+
+259
+00:09:23,606 --> 00:09:24,683
+about how to fit the
+如何拟合参数
+
+260
+00:09:24,683 --> 00:09:26,736
+parameters and there we'll
+那时 我们将
+
+261
+00:09:26,736 --> 00:09:28,222
+end up using the training set,
+使用训练集
+
+262
+00:09:28,222 --> 00:09:32,547
+or using our data, to determine the value of the parameters.
+使用我们的数据 来确定参数的取值
+
+263
+00:09:32,563 --> 00:09:34,550
+But once we have particular values
+但是 一旦我们有确定的参数取值
+
+264
+00:09:34,550 --> 00:09:37,283
+for the parameters: theta 0, theta 1, theta 2.
+有确定的θ0 θ1 θ2
+
+265
+00:09:37,290 --> 00:09:39,645
+Then that completely defines
+我们就将完全确定
+
+266
+00:09:39,645 --> 00:09:41,721
+the decision boundary and we
+决策边界
+
+267
+00:09:41,721 --> 00:09:43,117
+don't actually need to plot
+这时 我们实际上并不需要
+
+268
+00:09:43,117 --> 00:09:44,886
+a training set in order
+在绘制决策边界的时候
+
+269
+00:09:44,886 --> 00:09:48,180
+to plot the decision boundary.
+绘制训练集
+
+270
+00:09:49,620 --> 00:09:50,626
+Let's now look at a more
+现在 让我们看一个
+
+271
+00:09:50,626 --> 00:09:52,398
+complex example where, as
+更复杂的例子
+
+272
+00:09:52,420 --> 00:09:54,039
+usual, I have crosses to
+和往常一样 我使用十字 (X)
+
+273
+00:09:54,040 --> 00:09:55,932
+denote my positive examples and
+表示我的正样本
+
+274
+00:09:55,932 --> 00:09:58,926
+O's to denote my negative examples.
+圆圈 (O) 的表示我的负样本
+
+275
+00:09:58,926 --> 00:10:00,696
+Given a training set like this,
+给定这样的一个训练集
+
+276
+00:10:00,710 --> 00:10:02,873
+how can I get logistic regression
+我怎样才能使用逻辑回归
+
+277
+00:10:02,900 --> 00:10:05,550
+to fit this sort of data?
+拟合这些数据呢?
+
+278
+00:10:05,550 --> 00:10:07,168
+Earlier, when we were talking about
+早些时候 当我们谈论
+
+279
+00:10:07,168 --> 00:10:09,120
+polynomial regression or when
+多项式回归
+
+280
+00:10:09,120 --> 00:10:10,993
+we were doing linear regression, we talked
+或线性回归时
+
+281
+00:10:10,993 --> 00:10:12,530
+about how we can add extra
+我们谈到可以添加额外的
+
+282
+00:10:12,530 --> 00:10:15,561
+higher order polynomial terms to the features.
+高阶多项式项
+
+283
+00:10:15,561 --> 00:10:18,996
+And we can do the same for logistic regression.
+同样我们也可以对逻辑回归使用相同的方法
+
+284
+00:10:18,996 --> 00:10:22,220
+Concretely, let's say my hypothesis looks like this.
+具体地说 假如我的假设函数是这样的
+
+285
+00:10:22,220 --> 00:10:23,718
+Where I've added two extra
+我已经添加了两个额外的特征
+
+286
+00:10:23,718 --> 00:10:27,691
+features, X1 squared and X2 squared, to my features.
+x1平方和x2平方
+
+287
+00:10:27,691 --> 00:10:29,811
+So that I now have 5 parameters,
+所以 我现在有5个参数
+
+288
+00:10:29,811 --> 00:10:32,676
+theta 0 through theta 4.
+θ0 到 θ4
+
+289
+00:10:32,676 --> 00:10:34,936
+As before, we'll defer to
+之前讲过 我们会
+
+290
+00:10:34,936 --> 00:10:37,398
+the next video our discussion
+在下一个视频中讨论
+
+291
+00:10:37,420 --> 00:10:39,289
+on how to automatically choose
+如何自动选择
+
+292
+00:10:39,289 --> 00:10:42,511
+values for the parameters theta 0 through theta 4.
+参数θ0到θ4的取值
+
+293
+00:10:42,511 --> 00:10:44,326
+But let's say that
+但是 假设我
+
+294
+00:10:44,326 --> 00:10:46,691
+through a procedure to be specified,
+已经使用了这个方法
+
+295
+00:10:46,691 --> 00:10:49,243
+I end up choosing theta 0
+我最终选择θ0等于-1
+
+296
+00:10:49,243 --> 00:10:51,324
+equals minus 1, theta 1
+θ1等于0
+
+297
+00:10:51,324 --> 00:10:52,921
+equals 0, theta 2
+θ2等于0
+
+298
+00:10:52,921 --> 00:10:55,664
+equals 0, theta 3 equals
+θ3等于1
+
+299
+00:10:55,664 --> 00:10:58,039
+1, and theta 4 equals 1.
+θ4等于1
+
+300
+00:10:58,039 --> 00:11:00,223
+What this means
+这意味着
+
+301
+00:11:00,223 --> 00:11:02,160
+is that with this particular choice
+在这个参数选择下
+
+302
+00:11:02,160 --> 00:11:04,566
+of parameters, my parameter
+我的参数向量
+
+303
+00:11:04,566 --> 00:11:09,422
+vector theta looks like minus 1, 0, 0, 1, 1.
+θ将是[-1 0 0 1 1]
+
+304
+00:11:10,550 --> 00:11:12,356
+Following our earlier discussion, this
+根据我们前面的讨论
+
+305
+00:11:12,356 --> 00:11:14,439
+means that my hypothesis will predict
+这意味着我的假设函数将预测
+
+306
+00:11:14,439 --> 00:11:16,407
+that Y is equal to 1
+y=1
+
+307
+00:11:16,407 --> 00:11:18,259
+whenever minus 1 plus X1
+只要-1加x1平方
+
+308
+00:11:18,259 --> 00:11:21,088
+squared plus X2 squared is greater than or equal to 0.
+加x2平方大于等于0
+
+309
+00:11:21,088 --> 00:11:24,184
+This is whenever theta transpose
+也就是θ转置
+
+310
+00:11:24,184 --> 00:11:26,346
+times my theta transpose
+我的θ转置
+
+311
+00:11:26,350 --> 00:11:30,030
+my features is greater than or equal to 0.
+乘以特征变量大于等于0的时候
+
+312
+00:11:30,060 --> 00:11:31,685
+And if I take minus
+如果我将
+
+313
+00:11:31,690 --> 00:11:32,950
+1 and just bring this to
+-1放到不等式右侧
+
+314
+00:11:32,950 --> 00:11:34,810
+the right, I'm saying that
+我可以说
+
+315
+00:11:34,810 --> 00:11:36,642
+my hypothesis will predict that
+我的假设函数将预测
+
+316
+00:11:36,642 --> 00:11:38,100
+Y is equal to 1
+y=1
+
+317
+00:11:38,120 --> 00:11:40,710
+whenever X1 squared plus
+只要x1平方加
+
+318
+00:11:40,710 --> 00:11:43,648
+X2 squared is greater than or equal to 1.
+x2的平方大于等于1
+
+319
+00:11:43,648 --> 00:11:47,990
+So, what does decision boundary look like?
+那么决策边界是什么样子的呢?
+
+320
+00:11:47,990 --> 00:11:49,767
+Well, if you were to plot the
+好吧 如果我们绘制
+
+321
+00:11:49,780 --> 00:11:51,905
+curve for X1 squared plus
+x1平方加
+
+322
+00:11:51,905 --> 00:11:53,665
+X2 squared equals 1.
+x2的平方等于1的曲线
+
+323
+00:11:53,665 --> 00:11:55,531
+Some of you will
+你们有些人已经
+
+324
+00:11:55,531 --> 00:11:58,294
+that is the equation for
+知道这个方程对应
+
+325
+00:11:58,294 --> 00:12:01,296
+a circle of radius
+半径为1
+
+326
+00:12:01,296 --> 00:12:04,163
+1 centered around the origin.
+原点为中心的圆
+
+327
+00:12:04,163 --> 00:12:08,382
+So, that is my decision boundary.
+所以 这就是我们的决策边界
+
+328
+00:12:10,410 --> 00:12:12,190
+And everything outside the
+圆外面的一切
+
+329
+00:12:12,250 --> 00:12:14,207
+circle I'm going to predict
+我将预测
+
+330
+00:12:14,207 --> 00:12:15,404
+as Y equals 1.
+y=1
+
+331
+00:12:15,404 --> 00:12:17,706
+So out here is, you know, my
+所以这里就是
+
+332
+00:12:17,706 --> 00:12:19,337
+Y equals 1 region.
+y等于1的区域
+
+333
+00:12:19,360 --> 00:12:22,693
+I'm going to predict Y equals 1 out here.
+我们在这里预测y=1
+
+334
+00:12:22,693 --> 00:12:24,294
+And inside the circle is where
+而在圆里面
+
+335
+00:12:24,310 --> 00:12:27,786
+I'll predict Y is equal to 0.
+我会预测y=0
+
+336
+00:12:27,790 --> 00:12:30,060
+So, by adding these more
+因此 通过增加这些
+
+337
+00:12:30,060 --> 00:12:33,163
+complex or these polynomial terms to my features as well.
+复杂的多项式特征变量
+
+338
+00:12:33,163 --> 00:12:35,040
+I can get more complex decision
+我可以得到更复杂的决定边界
+
+339
+00:12:35,040 --> 00:12:36,550
+boundaries that don't just
+而不只是
+
+340
+00:12:36,550 --> 00:12:39,560
+try to separate the positive and negative examples with a straight line.
+用直线分开正负样本
+
+341
+00:12:39,560 --> 00:12:41,317
+I can get in this example
+在这个例子中 我可以得到
+
+342
+00:12:41,317 --> 00:12:44,258
+a decision boundary that's a circle.
+一个圆形的决策边界
+
+343
+00:12:44,258 --> 00:12:46,010
+Once again the decision boundary
+再次强调 决策边界
+
+344
+00:12:46,010 --> 00:12:47,888
+is a property not of
+不是训练集的属性
+
+345
+00:12:47,888 --> 00:12:51,636
+the training set, but of the hypothesis and of the parameters.
+而是假设本身及其参数的属性
+
+346
+00:12:51,640 --> 00:12:53,115
+So long as we've
+只要我们
+
+347
+00:12:53,115 --> 00:12:55,389
+given my parameter vector theta,
+给定了参数向量θ
+
+348
+00:12:55,389 --> 00:12:57,185
+that defines the decision
+圆形的决定边界
+
+349
+00:12:57,185 --> 00:12:59,208
+boundary which is the circle.
+就确定了
+
+350
+00:12:59,210 --> 00:13:03,052
+But the training set is not what we use to define decision boundary.
+我们不是用训练集来定义的决策边界
+
+351
+00:13:03,052 --> 00:13:06,563
+The training set may be used to fit the parameters theta.
+我们用训练集来拟合参数θ
+
+352
+00:13:06,563 --> 00:13:08,632
+We'll talk about how to do that later.
+以后我们将谈论如何做到这一点
+
+353
+00:13:08,632 --> 00:13:09,858
+But once you have the
+但是 一旦你有
+
+354
+00:13:09,858 --> 00:13:13,638
+parameters theta, that is what defines the decision boundary.
+参数θ它就确定了决策边界
+
+355
+00:13:13,638 --> 00:13:16,388
+Let me put back the training set
+让我重新显示训练集
+
+356
+00:13:16,400 --> 00:13:18,587
+just for visualization.
+以方便可视化
+
+357
+00:13:18,587 --> 00:13:22,313
+And finally, let's look at a more complex example.
+最后 让我们来看看一个更复杂的例子
+
+358
+00:13:22,320 --> 00:13:23,303
+So can we come up
+我们可以得到
+
+359
+00:13:23,303 --> 00:13:26,538
+with even more complex decision boundaries than this?
+更复杂的决策边界吗?
+
+360
+00:13:26,538 --> 00:13:28,418
+If I have even higher
+如果我有
+
+361
+00:13:28,420 --> 00:13:31,155
+order polynomial terms, so things
+高阶多项式特征变量
+
+362
+00:13:31,155 --> 00:13:34,505
+like X1 squared, X1
+比如x1平方
+
+363
+00:13:34,505 --> 00:13:36,604
+squared X2, X1 squared
+x1平方乘以x2 x1平方乘以x2平方
+
+364
+00:13:36,604 --> 00:13:37,826
+X2 squared, and so on.
+等等
+
+365
+00:13:37,826 --> 00:13:39,001
+If I have much higher order
+如果我有更高阶
+
+366
+00:13:39,001 --> 00:13:41,574
+polynomials, then it's possible
+多项式 那么可以证明
+
+367
+00:13:41,574 --> 00:13:42,856
+to show that you can get
+你将得到
+
+368
+00:13:42,856 --> 00:13:45,268
+even more complex decision boundaries and
+更复杂的决策边界
+
+369
+00:13:45,268 --> 00:13:46,963
+logistic regression can be
+而逻辑回归
+
+370
+00:13:46,963 --> 00:13:48,480
+used to find decision boundaries
+可以用于找到决策边界
+
+371
+00:13:48,500 --> 00:13:50,093
+that may, for example, be
+例如
+
+372
+00:13:50,093 --> 00:13:52,085
+an ellipse like that, or
+这样一个椭圆
+
+373
+00:13:52,085 --> 00:13:53,503
+maybe with a different setting of
+或者参数不同的椭圆
+
+374
+00:13:53,503 --> 00:13:55,453
+the parameters, maybe you
+也许你
+
+375
+00:13:55,453 --> 00:13:57,834
+can get instead a different decision boundary that
+可以得到一个不同的决定边界
+
+376
+00:13:57,840 --> 00:13:59,776
+may even look like, you know, some funny
+像这个样子
+
+377
+00:13:59,776 --> 00:14:04,145
+shape like that.
+一些有趣的形状
+
+378
+00:14:04,145 --> 00:14:06,423
+Or for even more complex examples
+或者更为复杂的例子
+
+379
+00:14:06,423 --> 00:14:08,915
+you can also get decision boundaries
+你也可以得到决策边界
+
+380
+00:14:08,950 --> 00:14:10,381
+that can look like, you know,
+看起来这样
+
+381
+00:14:10,390 --> 00:14:12,045
+more complex shapes like that.
+这样更复杂的形状
+
+382
+00:14:12,045 --> 00:14:13,365
+Where everything in here you
+在这个区域
+
+383
+00:14:13,365 --> 00:14:15,453
+predict Y equals 1, and
+你预测y=1
+
+384
+00:14:15,453 --> 00:14:17,531
+everything outside you predict Y equals 0.
+在这个区域外面你预测y=0
+
+385
+00:14:17,531 --> 00:14:19,556
+So these higher order polynomial
+因此 这些高阶多项式
+
+386
+00:14:19,560 --> 00:14:23,060
+features you can get very complex decision boundaries.
+特征变量 可以让你得到非常复杂的决策边界
+
+387
+00:14:23,070 --> 00:14:24,786
+So with these visualizations, I
+因此 通过这些可视化图形
+
+388
+00:14:24,786 --> 00:14:26,163
+hope that gives you a sense of
+我希望告诉你
+
+389
+00:14:26,163 --> 00:14:28,623
+what's the range of hypothesis
+什么范围的假设函数
+
+390
+00:14:28,623 --> 00:14:30,676
+functions we can represent using
+我们可以使用
+
+391
+00:14:30,676 --> 00:14:34,966
+the representation that we have for logistic regression.
+逻辑回归来表示
+
+392
+00:14:34,966 --> 00:14:37,713
+Now that we know what H of X can represent.
+现在我们知道了h(x)表示什么
+
+393
+00:14:37,713 --> 00:14:39,004
+What I'd like to do next in
+在下一个视频中
+
+394
+00:14:39,004 --> 00:14:40,560
+the following video is talk
+我将介绍
+
+395
+00:14:40,560 --> 00:14:44,096
+about how to automatically choose the parameters theta.
+如何自动选择参数θ
+
+396
+00:14:44,110 --> 00:14:45,570
+So that given a training
+使我们能在给定一个训练集时
+
+397
+00:14:45,570 --> 00:14:49,359
+set we can automatically fit the parameters to our data.
+我们可以根据数据自动拟合参数
+
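[Editor's note: a small Python/NumPy sketch of the two worked decision-boundary examples from this video, not part of the course materials; helper names are illustrative only. With theta = [-3, 1, 1] the prediction rule theta' x >= 0 becomes x1 + x2 >= 3 (a straight-line boundary); adding the squared features with theta = [-1, 0, 0, 1, 1] gives x1^2 + x2^2 >= 1 (the unit circle).

import numpy as np

def predict(theta, features):
    # Predict y = 1 exactly when theta' x >= 0, i.e. h_theta(x) >= 0.5
    return 1 if theta @ features >= 0 else 0

# Straight-line boundary: theta = [-3, 1, 1] means "predict y = 1 iff x1 + x2 >= 3"
theta_line = np.array([-3.0, 1.0, 1.0])
print(predict(theta_line, np.array([1.0, 2.0, 2.0])))  # x1 + x2 = 4 >= 3 -> 1
print(predict(theta_line, np.array([1.0, 1.0, 1.0])))  # x1 + x2 = 2 <  3 -> 0

# Circular boundary: features [1, x1, x2, x1^2, x2^2] with theta = [-1, 0, 0, 1, 1]
# means "predict y = 1 iff x1^2 + x2^2 >= 1" (outside or on the unit circle)
theta_circle = np.array([-1.0, 0.0, 0.0, 1.0, 1.0])
x1, x2 = 1.5, 0.0
print(predict(theta_circle, np.array([1.0, x1, x2, x1**2, x2**2])))  # outside -> 1
]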
diff --git a/srt/6 - 4 - Cost Function (11 min).srt b/srt/6 - 4 - Cost Function (11 min).srt
new file mode 100644
index 00000000..34a5bd31
--- /dev/null
+++ b/srt/6 - 4 - Cost Function (11 min).srt
@@ -0,0 +1,1661 @@
+1
+00:00:00,160 --> 00:00:01,704
+In this video we'll talk about
+在这段视频中 我们要讲
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,704 --> 00:00:04,010
+how to fit the parameters theta
+如何拟合逻辑回归
+
+3
+00:00:04,040 --> 00:00:05,869
+for logistic regression.
+模型的参数θ
+
+4
+00:00:05,880 --> 00:00:06,982
+In particular, I'd like to
+具体来说 我要定义
+
+5
+00:00:07,020 --> 00:00:10,386
+define the optimization objective or the
+用来拟合参数的
+
+6
+00:00:10,400 --> 00:00:14,470
+cost function that we'll use to fit the parameters.
+优化目标或者叫代价函数
+
+7
+00:00:15,390 --> 00:00:17,370
+Here's to supervised learning problem
+这便是监督学习问题中的
+
+8
+00:00:17,370 --> 00:00:19,892
+of fitting a logistic regression model.
+逻辑回归模型的拟合问题
+
+9
+00:00:19,960 --> 00:00:22,210
+We have a training set
+我们有一个训练集
+
+10
+00:00:22,210 --> 00:00:24,964
+of M training examples.
+里面有m个训练样本
+
+11
+00:00:24,964 --> 00:00:26,577
+And as usual each of
+像以前一样
+
+12
+00:00:26,577 --> 00:00:28,130
+our examples is represented by
+我们的每个样本
+
+13
+00:00:28,150 --> 00:00:32,830
+feature vector that's N plus 1 dimensional.
+用n+1维的特征向量表示
+
+14
+00:00:32,830 --> 00:00:35,133
+And as usual we have
+同样和以前一样
+
+15
+00:00:35,180 --> 00:00:36,498
+X 0 equals 1.
+x0 = 1
+
+16
+00:00:36,498 --> 00:00:38,315
+Our first feature, or our 0
+第一个特征变量
+
+17
+00:00:38,315 --> 00:00:39,951
+feature is always equal to 1,
+或者说第0个特征变量 一直是1
+
+18
+00:00:39,970 --> 00:00:41,203
+and because this is a
+而且因为这是一个分类问题
+
+19
+00:00:41,203 --> 00:00:43,335
+classification problem, our training
+我们的训练集
+
+20
+00:00:43,350 --> 00:00:44,999
+set has the property that
+具有这样的特征
+
+21
+00:00:45,010 --> 00:00:48,422
+every label Y, is either 0 or 1.
+所有的y 不是0就是1
+
+22
+00:00:48,422 --> 00:00:50,576
+This is a hypothesis
+这是一个假设函数
+
+23
+00:00:50,576 --> 00:00:52,007
+and the parameters of the
+它的参数
+
+24
+00:00:52,007 --> 00:00:54,460
+hypothesis is this theta over here.
+是这里的这个θ
+
+25
+00:00:54,490 --> 00:00:55,572
+And the question I want
+我要说的问题是
+
+26
+00:00:55,610 --> 00:00:57,339
+to talk about is given this
+对于这个给定的训练集
+
+27
+00:00:57,340 --> 00:00:58,846
+training set how do
+我们如何选择
+
+28
+00:00:58,880 --> 00:01:02,482
+we choose, or how do we fit the parameters theta?
+或者说如何拟合参数θ
+
+29
+00:01:02,510 --> 00:01:04,125
+Back when we were developing the
+以前我们推导线性回归时
+
+30
+00:01:04,125 --> 00:01:08,463
+linear regression model, we use the following cost function.
+使用了这个代价函数
+
+31
+00:01:08,480 --> 00:01:10,868
+I've written this slightly differently, where
+我把这个写成稍微有点儿不同的形式
+
+32
+00:01:10,900 --> 00:01:12,663
+instead of 1/2m, I've
+不写原先的1/2m
+
+33
+00:01:12,670 --> 00:01:16,440
+taken the 1/2 and put it inside the summation instead.
+我把1/2放到求和符号里面了
+
+34
+00:01:16,440 --> 00:01:17,440
+Now, I want to use
+现在我想用
+
+35
+00:01:17,440 --> 00:01:19,132
+an alternative way of writing
+另一种方法
+
+36
+00:01:19,140 --> 00:01:20,663
+out this cost function which is
+来写代价函数
+
+37
+00:01:20,700 --> 00:01:22,009
+that instead of writing out
+去掉这个平方项
+
+38
+00:01:22,030 --> 00:01:23,920
+this squared error term here,
+把这里写成
+
+39
+00:01:23,920 --> 00:01:27,100
+let's write here, cost of
+这样的形式
+
+40
+00:01:28,310 --> 00:01:31,476
+H of X comma
+(具体公式请看屏幕)
+
+41
+00:01:31,500 --> 00:01:33,605
+Y, and I'm going
+(具体公式请看屏幕)
+
+42
+00:01:33,605 --> 00:01:37,176
+to define that term cost
+定义这个代价函数Cost函数
+
+43
+00:01:37,210 --> 00:01:39,727
+of H of X comma Y to be equal to this.
+等于这个
+
+44
+00:01:39,740 --> 00:01:42,641
+It's just equal to one half of the squared error.
+等于这个1/2的平方误差
+
+45
+00:01:42,670 --> 00:01:43,800
+So now, we can see more
+因此现在
+
+46
+00:01:43,800 --> 00:01:46,018
+clearly that the cost
+我们能更清楚的看到
+
+47
+00:01:46,018 --> 00:01:48,145
+function is a sum
+代价函数是这个Cost函数
+
+48
+00:01:48,145 --> 00:01:49,740
+over my training set, or
+在训练集范围上的求和
+
+49
+00:01:49,740 --> 00:01:51,427
+is 1/m times the sum
+或者说是1/m倍的
+
+50
+00:01:51,427 --> 00:01:56,046
+over my training set of this cost term here.
+这个代价项在训练集范围上的求和
+
+51
+00:01:56,050 --> 00:01:58,065
+And to simplify this
+然后稍微简化一下这个式子
+
+52
+00:01:58,065 --> 00:01:59,470
+equation a little bit more, it's gonna
+去掉这些上标
+
+53
+00:01:59,490 --> 00:02:02,587
+be convenient to get rid of those superscripts.
+会显得方便一些
+
+54
+00:02:02,610 --> 00:02:04,408
+So just define cost of
+所以直接定义
+
+55
+00:02:04,408 --> 00:02:05,527
+H of X comma Y to
+代价值(h(X), Y)
+
+56
+00:02:05,527 --> 00:02:06,618
+be equal to 1/2 of
+等于1/2倍的
+
+57
+00:02:06,618 --> 00:02:08,925
+this squared error and the
+这个平方误差
+
+58
+00:02:08,925 --> 00:02:10,336
+interpretation of this cost function
+对这个代价项的理解是这样的
+
+59
+00:02:10,360 --> 00:02:11,876
+is that this is the
+这是我所期望的
+
+60
+00:02:11,890 --> 00:02:13,447
+cost I want my learning
+我的学习算法
+
+61
+00:02:13,460 --> 00:02:15,110
+algorithm to, you know,
+如果想要达到这个值
+
+62
+00:02:15,110 --> 00:02:16,701
+have to pay, if it
+也就是这个假设h(x)
+
+63
+00:02:16,750 --> 00:02:18,737
+outputs that value, if
+所需要付出的代价
+
+64
+00:02:18,737 --> 00:02:19,912
+its prediction is H of
+这个希望的预测值是h(x)
+
+65
+00:02:19,912 --> 00:02:21,258
+X, and the actual
+而实际值则是y
+
+66
+00:02:21,310 --> 00:02:24,035
+label was Y. So just
+干脆
+
+67
+00:02:24,050 --> 00:02:27,836
+cross off those superscripts. All right.
+全部去掉那些上标好了
+
+68
+00:02:27,840 --> 00:02:29,756
+And no surprise for linear
+显然 在线性回归中
+
+69
+00:02:29,756 --> 00:02:31,537
+regression the cost we'd define is that.
+代价值会被定义为这个
+
+70
+00:02:31,537 --> 00:02:32,757
+Well the cost for this
+这个代价值是
+
+71
+00:02:32,757 --> 00:02:34,535
+is, that is 1/2
+1/2乘以
+
+72
+00:02:34,540 --> 00:02:36,232
+times the square difference
+预测值h和
+
+73
+00:02:36,232 --> 00:02:37,663
+between what are predicted and the
+实际值观测的结果y
+
+74
+00:02:37,670 --> 00:02:38,943
+actual value that we observe
+的差的平方
+
+75
+00:02:38,943 --> 00:02:41,103
+for Y. Now, this cost
+这个代价值可以
+
+76
+00:02:41,103 --> 00:02:42,848
+function worked fine for linear
+很好地用在线性回归里
+
+77
+00:02:42,848 --> 00:02:47,418
+regression, but here we're interested in logistic regression.
+但是我们现在要用在逻辑回归里
+
+78
+00:02:47,430 --> 00:02:49,146
+If we could minimize this cost
+如果我们可以最小化
+
+79
+00:02:49,150 --> 00:02:51,992
+function that is plugged into J here.
+代价函数J里面的这个代价值
+
+80
+00:02:52,020 --> 00:02:53,817
+That will work okay.
+它会工作得很好
+
+81
+00:02:53,817 --> 00:02:55,476
+But it turns out that if
+但实际上
+
+82
+00:02:55,480 --> 00:02:57,640
+we use this particular cost function
+如果我们使用这个代价值
+
+83
+00:02:57,640 --> 00:03:01,807
+this would be a non-convex function of the parameters theta.
+它会变成参数θ的非凸函数
+
+84
+00:03:01,820 --> 00:03:03,968
+Here's what I mean by non-convex.
+我说的非凸函数是这个意思
+
+85
+00:03:03,990 --> 00:03:05,313
+We have some cost function J
+对于这样一个代价函数J(θ)
+
+86
+00:03:05,313 --> 00:03:08,118
+of theta, and for logistic
+对于逻辑回归来说
+
+87
+00:03:08,140 --> 00:03:12,113
+regression this function H here
+这里的h函数
+
+88
+00:03:12,113 --> 00:03:13,495
+has a non linearity, right?
+是非线性的 对吧?
+
+89
+00:03:13,500 --> 00:03:14,538
+It says, you know, 1 over
+它是等于 1 除以
+
+90
+00:03:14,538 --> 00:03:16,384
+1 plus E to the negative theta transpose
+1+e的-θ转置乘以X次方
+
+91
+00:03:16,384 --> 00:03:19,591
+X. So it's a pretty complicated nonlinear function.
+所以它是一个很复杂的非线性函数
+
+92
+00:03:19,591 --> 00:03:21,108
+And if you take the sigmoid
+如果对它取Sigmoid函数
+
+93
+00:03:21,130 --> 00:03:22,104
+function and plug it in
+然后把它放到这里
+
+94
+00:03:22,104 --> 00:03:23,239
+here and then take
+然后求它的代价值
+
+95
+00:03:23,300 --> 00:03:25,016
+this cost function and plug
+再把它放到这里
+
+96
+00:03:25,020 --> 00:03:26,746
+it in there, and then plot
+然后再画出
+
+97
+00:03:26,746 --> 00:03:28,200
+what J of theta looks
+J(θ)长什么模样
+
+98
+00:03:28,210 --> 00:03:29,650
+like, you find that
+你会发现
+
+99
+00:03:29,650 --> 00:03:33,493
+J of theta can look like a function just like this.
+J(θ)可能是一个这样的函数
+
+100
+00:03:33,500 --> 00:03:35,958
+You know with many local optima and
+有很多局部最优值
+
+101
+00:03:35,958 --> 00:03:37,321
+the formal term for this
+称呼它的正式术语是
+
+102
+00:03:37,340 --> 00:03:39,488
+is that this a non convex function.
+这是一个非凸函数
+
+103
+00:03:39,500 --> 00:03:40,644
+And you can kind of tell.
+你大概可以发现
+
+104
+00:03:40,644 --> 00:03:41,880
+If you were to run gradient
+如果你把梯度下降法
+
+105
+00:03:41,880 --> 00:03:43,192
+descent on this sort of
+用在一个这样的函数上
+
+106
+00:03:43,192 --> 00:03:45,160
+function, it is not guaranteed
+不能保证它会
+
+107
+00:03:45,170 --> 00:03:47,747
+to converge to the global minimum.
+收敛到全局最小值
+
+108
+00:03:47,747 --> 00:03:48,867
+Whereas in contrast, what
+相应地
+
+109
+00:03:48,870 --> 00:03:50,350
+we would like is to have
+我们希望
+
+110
+00:03:50,350 --> 00:03:52,100
+a cost function J of theta
+我们的代价函数J(θ)
+
+111
+00:03:52,100 --> 00:03:53,599
+that is convex, that is
+是一个凸函数
+
+112
+00:03:53,599 --> 00:03:55,250
+a single bow-shaped function that
+是一个单弓形函数
+
+113
+00:03:55,250 --> 00:03:56,675
+looks like this, so that
+大概是这样
+
+114
+00:03:56,675 --> 00:03:58,543
+if you run gradient descent, we
+所以如果对它使用梯度下降法
+
+115
+00:03:58,543 --> 00:04:01,147
+would be guaranteed that gradient descent, you know,
+我们可以保证梯度下降法
+
+116
+00:04:01,170 --> 00:04:04,917
+would converge to the global minimum.
+会收敛到该函数的全局最小值
+
+117
+00:04:04,917 --> 00:04:07,020
+And the problem of using the
+但使用这个
+
+118
+00:04:07,020 --> 00:04:08,460
+the square cost function is that
+平方代价函数的问题是
+
+119
+00:04:08,520 --> 00:04:10,400
+because of this very
+因为中间的这个
+
+120
+00:04:10,400 --> 00:04:12,371
+non linear sigmoid function that
+非常非线性的
+
+121
+00:04:12,371 --> 00:04:14,107
+appears in the middle here, J of
+sigmoid函数的出现
+
+122
+00:04:14,107 --> 00:04:15,987
+theta ends up being
+导致J(θ)成为
+
+123
+00:04:15,987 --> 00:04:17,962
+a non convex function if you
+一个非凸函数
+
+124
+00:04:17,962 --> 00:04:21,294
+were to define it as the square cost function.
+如果你要用平方函数定义它的话
+
+125
+00:04:21,294 --> 00:04:22,313
+So what we'd would like to do
+所以我们想做的是
+
+126
+00:04:22,320 --> 00:04:23,822
+is to instead come up with
+另外找一个
+
+127
+00:04:23,822 --> 00:04:25,576
+a different cost function that
+不同的代价函数
+
+128
+00:04:25,576 --> 00:04:28,063
+is convex and so
+它是凸函数
+
+129
+00:04:28,063 --> 00:04:29,257
+that we can apply a great
+使得我们可以使用很好的算法
+
+130
+00:04:29,280 --> 00:04:30,919
+algorithm like gradient descent
+如梯度下降法
+
+131
+00:04:30,940 --> 00:04:33,683
+and be guaranteed to find a global minimum.
+而且能保证找到全局最小值
+
+132
+00:04:33,683 --> 00:04:37,295
+Here's a cost function that we're going to use for logistic regression.
+这个代价函数便是我们要用在逻辑回归上的
+
+133
+00:04:37,295 --> 00:04:39,313
+We're going to say the cost
+我们认为
+
+134
+00:04:39,320 --> 00:04:40,710
+or the penalty that the algorithm
+这个算法要付的代价或者惩罚
+
+135
+00:04:40,710 --> 00:04:42,924
+pays if it outputs
+如果输出值是h(x)
+
+136
+00:04:42,924 --> 00:04:44,596
+a value H of X.
+或者换句话说
+
+137
+00:04:44,620 --> 00:04:46,722
+So, this is some number like 0.7
+假如说预测值h(x)
+
+138
+00:04:46,722 --> 00:04:48,670
+where it predicts a value H
+是一个数 比如0.7
+
+139
+00:04:48,670 --> 00:04:50,780
+of X. And the actual
+而实际上
+
+140
+00:04:50,780 --> 00:04:52,032
+label turns out to
+真实的标签值是y
+
+141
+00:04:52,032 --> 00:04:54,087
+be Y. The cost is
+那么代价值将等于
+
+142
+00:04:54,090 --> 00:04:56,061
+going to be minus log
+-log(h(X))
+
+143
+00:04:56,100 --> 00:04:57,861
+H of X if Y is equal 1.
+当y=1时
+
+144
+00:04:57,861 --> 00:04:59,447
+And minus log, 1 minus
+以及-log(1-h(X))
+
+145
+00:04:59,460 --> 00:05:02,010
+H of X if Y is equal to 0.
+当y=0时
+
+146
+00:05:02,020 --> 00:05:04,205
+This looks like a pretty complicated function.
+这看起来是个非常复杂的函数
+
+147
+00:05:04,230 --> 00:05:05,773
+But let's plot function to
+但是让我们画出这个函数
+
+148
+00:05:05,773 --> 00:05:08,147
+gain some intuition about what it's doing.
+可以直观地感受一下它在做什么
+
+149
+00:05:08,160 --> 00:05:11,054
+Let's start up with the case of Y equals 1.
+我们从y=1这个情况开始
+
+150
+00:05:11,070 --> 00:05:12,461
+If Y is equal equal
+如果y等于1
+
+151
+00:05:12,461 --> 00:05:14,958
+to 1, then the cost function
+那么这个代价函数
+
+152
+00:05:14,958 --> 00:05:18,240
+is -log H of X, and
+是-log(h(X))
+
+153
+00:05:18,240 --> 00:05:19,601
+if we plot that, so let's
+如果我们画出它
+
+154
+00:05:19,601 --> 00:05:21,564
+say that the horizontal
+我们将h(X)
+
+155
+00:05:21,580 --> 00:05:22,961
+axis is H of X.
+画在横坐标上
+
+156
+00:05:22,961 --> 00:05:24,722
+So we know that a hypothesis
+我们知道假设函数
+
+157
+00:05:24,730 --> 00:05:26,611
+is going to output a value between
+的输出值
+
+158
+00:05:26,630 --> 00:05:28,465
+0 and 1.
+是在0和1之间的
+
+159
+00:05:28,465 --> 00:05:28,465
+Right?
+对吧?
+
+160
+00:05:28,490 --> 00:05:30,514
+So H of X that varies
+所以h(X)的值
+
+161
+00:05:30,530 --> 00:05:31,940
+between 0 and 1.
+在0和1之间变化
+
+162
+00:05:31,940 --> 00:05:35,469
+If you plot what this cost function looks like.
+如果你画出这个代价函数的样子
+
+163
+00:05:35,470 --> 00:05:37,981
+You find that it looks like this.
+你会发现它看起来是这样的
+
+164
+00:05:37,981 --> 00:05:39,044
+One way to see why the
+理解这个函数为什么是这样的
+
+165
+00:05:39,044 --> 00:05:41,363
+plot like this it is because
+一个方式是
+
+166
+00:05:41,440 --> 00:05:44,988
+if you were to plot log Z
+如果你画出log(z)
+
+167
+00:05:45,000 --> 00:05:47,656
+with Z on the horizontal axis.
+z在横轴上
+
+168
+00:05:47,656 --> 00:05:48,794
+Then that looks like that.
+它看起来会是这样
+
+169
+00:05:48,794 --> 00:05:50,369
+And it approaches minus infinity.
+它趋于负无穷
+
+170
+00:05:50,369 --> 00:05:53,700
+So this is what the log function looks like.
+这是对数函数的样子
+
+171
+00:05:53,700 --> 00:05:55,963
+And so this is 0, this is 1.
+所以这里是0 这里是1
+
+172
+00:05:55,980 --> 00:05:57,560
+Here Z is of
+显然 这里的Z
+
+173
+00:05:57,560 --> 00:05:59,653
+course playing the role of
+就是代表h(x)的角色
+
+174
+00:05:59,653 --> 00:06:02,030
+H of X. And so
+因此
+
+175
+00:06:02,030 --> 00:06:06,329
+minus log Z will look like this.
+-log(Z)看起来这样
+
+176
+00:06:06,330 --> 00:06:08,098
+Right just flipping the sign.
+就是翻转一下符号
+
+177
+00:06:08,100 --> 00:06:09,822
+Minus log Z. And we're
+-log(Z)
+
+178
+00:06:09,822 --> 00:06:11,013
+interested only in the
+我们所感兴趣的是
+
+179
+00:06:11,020 --> 00:06:12,580
+range of when this function
+函数在0到1
+
+180
+00:06:12,610 --> 00:06:14,014
+goes between 0 and 1.
+之间的这个区间
+
+181
+00:06:14,014 --> 00:06:15,924
+So, get rid of that.
+所以 忽略那些
+
+182
+00:06:15,924 --> 00:06:17,962
+And so, we're just left with,
+所以只剩下
+
+183
+00:06:17,980 --> 00:06:21,555
+you know, this part of the curve.
+曲线的这部分
+
+184
+00:06:21,630 --> 00:06:23,200
+And that's what this curve on the left looks like.
+这就是左边这条曲线的样子
+
+185
+00:06:23,200 --> 00:06:25,472
+Now this cost function
+现在这个代价函数
+
+186
+00:06:25,500 --> 00:06:29,666
+has a few interesting and desirable properties.
+有一些有趣而且很好的性质
+
+187
+00:06:29,690 --> 00:06:32,103
+First you notice that if
+首先 你注意到
+
+188
+00:06:32,103 --> 00:06:35,003
+Y is equal to 1 and H of X is equal 1, in
+如果y=1而且h(X)=1
+
+189
+00:06:35,010 --> 00:06:37,367
+other words, if the hypothesis
+也就是说
+
+190
+00:06:37,410 --> 00:06:39,000
+exactly, you know, predicts
+如果假设函数
+
+191
+00:06:39,000 --> 00:06:40,261
+H equals 1, and Y
+刚好预测值是1
+
+192
+00:06:40,261 --> 00:06:42,744
+is exactly equal to what I predicted.
+而且y刚好等于我预测的
+
+193
+00:06:42,744 --> 00:06:44,432
+Then the cost is equal 0.
+那么这个代价值等于0
+
+194
+00:06:44,432 --> 00:06:44,432
+Right?
+对吧?
+
+195
+00:06:44,432 --> 00:06:47,475
+That corresponds to, the curve doesn't actually flatten out.
+这对应于… 这个曲线并不是平的
+
+196
+00:06:47,475 --> 00:06:49,866
+The curve is still going. First, notice
+曲线还在继续走
+
+197
+00:06:49,880 --> 00:06:51,006
+that if H of X
+首先 注意到如果h(x)=1
+
+198
+00:06:51,006 --> 00:06:53,056
+equals 1, if the hypothesis
+如果假设函数
+
+199
+00:06:53,056 --> 00:06:55,113
+predicts that Y is equal to 1.
+预测Y=1
+
+200
+00:06:55,113 --> 00:06:56,342
+And if indeed Y is
+并且如果y确实等于1
+
+201
+00:06:56,342 --> 00:06:58,502
+equal to 1 then the cost is equal to 0.
+那么代价值等于0
+
+202
+00:06:58,530 --> 00:07:00,975
+That corresponds to this point down here.
+这对应于下面这个点
+
+203
+00:07:00,975 --> 00:07:00,975
+Right?
+对吧?
+
+204
+00:07:01,030 --> 00:07:02,332
+If H of X is equal
+如果h(X)=1
+
+205
+00:07:02,332 --> 00:07:04,068
+to 1, and we're only
+这里我们只需要考虑
+
+206
+00:07:04,068 --> 00:07:06,273
+concerned the case that Y equals 1 here.
+y=1的情况
+
+207
+00:07:06,273 --> 00:07:08,366
+But if H of X is equal to 1.
+如果h(x)等于1
+
+208
+00:07:08,366 --> 00:07:11,063
+Then the cost is down here is equal to 0.
+那么代价值等于0
+
+209
+00:07:11,063 --> 00:07:13,082
+And that is what we like it to be.
+这是我们所希望的
+
+210
+00:07:13,082 --> 00:07:13,968
+Because, you know, if we
+因为如果我们
+
+211
+00:07:13,968 --> 00:07:17,673
+correctly predict the output Y then the cost is 0.
+正确预测了输出值y 那么代价值是0
+
+212
+00:07:17,673 --> 00:07:21,466
+But now, notice also
+但是现在 同样注意到
+
+213
+00:07:21,470 --> 00:07:23,456
+that H of X approaches 0.
+h(x)趋于0时
+
+214
+00:07:23,456 --> 00:07:25,037
+So, that's H. As the
+所以 那是h
+
+215
+00:07:25,037 --> 00:07:26,909
+output of the hypothesis approaches 0
+当假设函数的输出趋于0时
+
+216
+00:07:26,909 --> 00:07:30,163
+the cost blows up, and it goes to infinity.
+代价值激增 并且趋于无穷
+
+217
+00:07:30,163 --> 00:07:31,513
+And what this does is
+我们这样描述
+
+218
+00:07:31,513 --> 00:07:34,271
+it captures the intuition that if
+体现出了这样一种直观的感觉
+
+219
+00:07:34,310 --> 00:07:36,890
+a hypothesis, you know, outputs 0.
+那就是如果假设函数输出0
+
+220
+00:07:36,890 --> 00:07:38,574
+That's like saying, our hypothesis is
+相当于说
+
+221
+00:07:38,574 --> 00:07:39,960
+saying, the chance of Y
+我们的假设函数说
+
+222
+00:07:39,960 --> 00:07:41,541
+equals 1 is equal to 0.
+Y=1的概率等于0
+
+223
+00:07:41,541 --> 00:07:42,516
+It's kind of like our going
+这类似于
+
+224
+00:07:42,520 --> 00:07:44,010
+to our medical patient and saying,
+我们对病人说
+
+225
+00:07:44,020 --> 00:07:45,594
+"The probability that you
+你有一个恶性肿瘤的概率
+
+226
+00:07:45,610 --> 00:07:47,337
+have a malignant tumor, the
+也就是说
+
+227
+00:07:47,337 --> 00:07:49,807
+probability that Y equals 1 is zero."
+y=1的概率是0
+
+228
+00:07:49,807 --> 00:07:52,154
+So, it's like absolutely impossible that your
+就是说你的肿瘤
+
+229
+00:07:52,160 --> 00:07:55,130
+tumor is malignant.
+完全不可能是恶性的
+
+230
+00:07:55,150 --> 00:07:56,776
+But if it turns out that
+然而结果是
+
+231
+00:07:56,776 --> 00:08:00,111
+the tumor, the patient's tumor, actually is malignant.
+病人的肿瘤确实是恶性的
+
+232
+00:08:00,111 --> 00:08:01,879
+So if Y is equal to
+所以如果y=1
+
+233
+00:08:01,880 --> 00:08:03,291
+1 even after we told them
+即使我们告诉他们
+
+234
+00:08:03,300 --> 00:08:05,375
+you know, the probability of it happening is 0.
+它发生的概率是0
+
+235
+00:08:05,390 --> 00:08:08,716
+It's absolutely impossible for it to be malignant.
+它完全不可能是恶性的
+
+236
+00:08:08,716 --> 00:08:09,759
+But if we told them
+如果我们告诉他们这个
+
+237
+00:08:09,760 --> 00:08:11,186
+this with that level of certainty,
+和我们的确信程度
+
+238
+00:08:11,240 --> 00:08:13,018
+and we turn out to be wrong,
+并且最后我们是错的
+
+239
+00:08:13,018 --> 00:08:14,688
+then we penalize the learning algorithm
+那么我们用非常非常大的代价值
+
+240
+00:08:14,690 --> 00:08:16,122
+by a very, very large cost,
+惩罚这个学习算法
+
+241
+00:08:16,122 --> 00:08:17,963
+and that's captured by having this
+它是被这样体现出来
+
+242
+00:08:17,963 --> 00:08:20,474
+cost goes infinity if Y
+这个代价值趋于无穷
+
+243
+00:08:20,474 --> 00:08:21,900
+equals 1 and H
+如果y=1
+
+244
+00:08:21,900 --> 00:08:24,334
+of X approaches 0.
+而h(x)趋于0
+
+245
+00:08:24,334 --> 00:08:26,725
+That was the case of
+这是y=1时的情况
+
+246
+00:08:26,725 --> 00:08:28,875
+Y equals 1. Let's look at what
+我们再来看看
+
+247
+00:08:28,875 --> 00:08:32,371
+the cost function looks like for Y0.
+y=0时 代价值函数是什么样
+
+248
+00:08:32,410 --> 00:08:35,710
+If Y is equal to 0, then the cost
+如果y=0
+
+249
+00:08:35,720 --> 00:08:39,121
+looks like this expression over here.
+那么代价值是这个表达式
+
+250
+00:08:39,121 --> 00:08:40,403
+And if you plot
+如果画出函数
+
+251
+00:08:40,403 --> 00:08:42,751
+the function minus log 1
+-log(1-z)
+
+252
+00:08:42,780 --> 00:08:45,839
+minus Z what you
+那么你得到的
+
+253
+00:08:45,839 --> 00:08:49,245
+get is the cost function actually looks like this.
+代价函数实际上是这样
+
+254
+00:08:49,245 --> 00:08:50,256
+So, it goes from 0 to 1.
+它从0到1
+
+255
+00:08:50,270 --> 00:08:53,263
+Something like that.
+差不多这样
+
+256
+00:08:53,280 --> 00:08:54,611
+And so if you plot
+如果你画出
+
+257
+00:08:54,611 --> 00:08:55,872
+the cost function for the case
+y=0情况下的
+
+258
+00:08:55,872 --> 00:08:57,823
+of y equals zero, you find that it looks
+代价函数
+
+259
+00:08:57,823 --> 00:09:00,763
+like this and what
+你会发现大概是这样
+
+260
+00:09:00,763 --> 00:09:02,404
+this curve does is it
+它现在所做的是
+
+261
+00:09:02,404 --> 00:09:04,937
+now blows up,
+在h(X)趋于1时激增
+
+262
+00:09:04,937 --> 00:09:08,273
+and it goes to plus infinity as H of X goes to 1.
+趋于正无穷
+
+263
+00:09:08,290 --> 00:09:09,880
+Because it's saying that
+因为它是说
+
+264
+00:09:09,900 --> 00:09:11,199
+if Y turns out to be
+如果最后发现
+
+265
+00:09:11,200 --> 00:09:12,168
+equal to 0, but we
+y等于0
+
+266
+00:09:12,168 --> 00:09:13,966
+predicted that you know, Y is
+而我们却几乎
+
+267
+00:09:13,966 --> 00:09:15,286
+equal to 1 with almost
+非常肯定地预测
+
+268
+00:09:15,320 --> 00:09:17,281
+certainty with probability 1, then
+y=1的概率是1
+
+269
+00:09:17,281 --> 00:09:21,569
+we end up paying a very large cost.
+那么我们最后就要付出非常大的代价值
+
+270
+00:09:21,569 --> 00:09:23,143
+Let's plot the cost function for
+(以下这一段和前面重复了)让我们画出y=0时的
+
+271
+00:09:23,143 --> 00:09:25,063
+the case of Y equals 0.
+代价函数
+
+272
+00:09:25,063 --> 00:09:29,702
+So if Y equals 0 that's going to be our cost function.
+所以如果y=0 这就是我们的代价值函数
+
+273
+00:09:29,702 --> 00:09:31,914
+If you look at this expression,
+如果你看着这个表达式
+
+274
+00:09:31,914 --> 00:09:33,726
+and if you plot, you know, minus
+然后你画出
+
+275
+00:09:33,726 --> 00:09:36,221
+log 1 minus Z, if
+-log(1-Z)
+
+276
+00:09:36,221 --> 00:09:37,428
+you figure out what that looks like,
+如果你清楚它是什么样的
+
+277
+00:09:37,428 --> 00:09:40,071
+you get a figure that looks like this.
+你会得到这样一个图形
+
+278
+00:09:40,071 --> 00:09:41,669
+Where, which goes from 0
+这样随着
+
+279
+00:09:41,680 --> 00:09:43,610
+to 1 with the Z
+横轴上的z
+
+280
+00:09:43,610 --> 00:09:45,850
+axis on the horizontal axis.
+从0到1
+
+281
+00:09:45,850 --> 00:09:47,221
+So If you take this cost
+如果你画出
+
+282
+00:09:47,221 --> 00:09:48,397
+function and plot it for
+y=0时的
+
+283
+00:09:48,397 --> 00:09:49,614
+the case of Y equals 0,
+代价函数
+
+284
+00:09:49,614 --> 00:09:51,186
+what you get is
+你会发现
+
+285
+00:09:51,186 --> 00:09:55,109
+that the cost function looks like this.
+代价函数是这样的
+
+286
+00:09:55,109 --> 00:09:56,743
+And what this cost function
+它所做的是
+
+287
+00:09:56,743 --> 00:09:58,650
+does is that it blows
+代价函数会在这里激增
+
+288
+00:09:58,660 --> 00:09:59,530
+up or it goes to a
+趋于正无穷
+
+289
+00:09:59,560 --> 00:10:01,448
+positive infinity as
+随着h(X)的增大
+
+290
+00:10:01,448 --> 00:10:03,707
+H of X goes to one
+而趋近于1
+
+291
+00:10:03,710 --> 00:10:05,443
+and this captures the
+这体现了这样一个直观的感觉
+
+292
+00:10:05,443 --> 00:10:07,159
+intuition that if a hypothesis
+如果假设函数预测
+
+293
+00:10:07,180 --> 00:10:08,847
+predicted that, you know, H of
+h(X)=1
+
+294
+00:10:08,850 --> 00:10:10,406
+X is equal to 1 with
+并且非常确定
+
+295
+00:10:10,406 --> 00:10:12,121
+certainty, with like probability 1,
+比如这样的概率是1
+
+296
+00:10:12,121 --> 00:10:14,283
+it's absolutely got to be Y equals 1.
+认为y肯定是1
+
+297
+00:10:14,283 --> 00:10:15,563
+But if Y turned out to
+但是最后发现
+
+298
+00:10:15,563 --> 00:10:17,219
+be equal to 0 then
+y其实等于0
+
+299
+00:10:17,219 --> 00:10:18,206
+it makes sense to make the
+这就必须要让假设函数
+
+300
+00:10:18,206 --> 00:10:21,940
+hypothesis, or make the learning algorithm pay a very large cost.
+或者学习算法付出一个很大的代价
+
+301
+00:10:21,940 --> 00:10:24,609
+And conversely, if H
+反过来
+
+302
+00:10:24,610 --> 00:10:25,942
+of X is equal to
+如果h(x)=0
+
+303
+00:10:25,950 --> 00:10:27,483
+0 and Y equals zero,
+而且y=0
+
+304
+00:10:27,483 --> 00:10:28,983
+then the hypothesis nailed it.
+那么假设函数预测对了
+
+305
+00:10:29,000 --> 00:10:30,626
+The predicted Y is equal
+预测的是y=0
+
+306
+00:10:30,630 --> 00:10:32,371
+to zero and it turns
+并且y就是等于0
+
+307
+00:10:32,371 --> 00:10:34,376
+out Y is equal to zero
+并且Y就是等于0
+
+308
+00:10:34,376 --> 00:10:36,701
+so at this point the cost
+那么代价值函数在这点上
+
+309
+00:10:36,750 --> 00:10:40,139
+function is going to be 0.
+应该等于0
+
+310
+00:10:40,160 --> 00:10:42,163
+In this video, we
+在这个视频中
+
+311
+00:10:42,163 --> 00:10:43,886
+have defined the cost function
+我们定义了
+
+312
+00:10:43,886 --> 00:10:46,428
+for a single training example.
+单训练样本的代价函数
+
+313
+00:10:46,428 --> 00:10:50,251
+The topic of convexity analysis is beyond the scope of this course.
+凸性分析的内容是超出这门课的范围的
+
+314
+00:10:50,270 --> 00:10:51,594
+But it is possible to show
+但是可以证明
+
+315
+00:10:51,620 --> 00:10:53,080
+that with our particular choice
+我们所选的
+
+316
+00:10:53,150 --> 00:10:54,774
+of cost function this would
+代价值函数
+
+317
+00:10:54,774 --> 00:10:57,926
+give us a convex optimization problem
+会给我们一个
+
+318
+00:10:57,960 --> 00:11:00,081
+and our overall cost function
+凸优化问题
+
+319
+00:11:00,081 --> 00:11:01,463
+J of theta will be
+代价函数J(θ)会是一个凸函数
+
+320
+00:11:01,463 --> 00:11:04,368
+convex and local optima free.
+并且没有局部最优值
+
+321
+00:11:04,370 --> 00:11:05,691
+In the next video we're going
+在下一个视频中
+
+322
+00:11:05,691 --> 00:11:07,753
+to take these ideas of the
+我们会把单训练样本的
+
+323
+00:11:07,753 --> 00:11:08,923
+cost function for a single
+代价函数的这些理念
+
+324
+00:11:08,923 --> 00:11:10,839
+training example and develop that
+进一步发展
+
+325
+00:11:10,839 --> 00:11:12,522
+further and define the
+然后给出
+
+326
+00:11:12,522 --> 00:11:13,773
+cost function for the entire
+整个训练集的代价函数的定义
+
+327
+00:11:13,780 --> 00:11:16,104
+training set, and we'll also
+我们还会找到一种
+
+328
+00:11:16,104 --> 00:11:17,404
+figure out a simpler way to
+比我们目前用的
+
+329
+00:11:17,404 --> 00:11:19,699
+write it than we have been using so far.
+更简单的写法
+
+330
+00:11:19,699 --> 00:11:21,016
+And based on that we'll
+基于这些推导出的结果
+
+331
+00:11:21,030 --> 00:11:22,779
+work out gradient descent, and
+我们将应用梯度下降法
+
+332
+00:11:22,779 --> 00:11:25,835
+that will give us our logistic regression algorithm.
+得到我们的逻辑回归算法
+
diff --git a/srt/6 - 5 - Simplified Cost Function and Gradient Descent (10 min).srt b/srt/6 - 5 - Simplified Cost Function and Gradient Descent (10 min).srt
new file mode 100644
index 00000000..bdca411a
--- /dev/null
+++ b/srt/6 - 5 - Simplified Cost Function and Gradient Descent (10 min).srt
@@ -0,0 +1,1426 @@
+1
+00:00:00,310 --> 00:00:02,286
+In this video, we'll figure out
+在这段视频中 我们将会找出
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,300 --> 00:00:03,903
+a slightly simpler way to
+一种稍微简单一点的方法来
+
+3
+00:00:03,910 --> 00:00:06,513
+write the cost function than we have been using so far.
+写代价函数 来替换我们现在用的方法
+
+4
+00:00:06,520 --> 00:00:08,252
+And we'll also figure out
+同时我们还要弄清楚
+
+5
+00:00:08,252 --> 00:00:10,779
+how to apply gradient descent to fit
+如何运用梯度下降法
+
+6
+00:00:10,779 --> 00:00:13,321
+the parameters of logistic regression.
+来拟合出逻辑回归的参数
+
+7
+00:00:13,321 --> 00:00:14,210
+So by the end of this
+因此 听了这节课
+
+8
+00:00:14,210 --> 00:00:15,589
+video you'll know how to
+你就应该知道如何
+
+9
+00:00:15,589 --> 00:00:19,201
+implement a fully working version of logistic regression.
+实现一个完整的逻辑回归算法
+
+10
+00:00:19,201 --> 00:00:24,802
+Here's our cost function for logistic regression.
+这就是逻辑回归的代价函数
+
+11
+00:00:24,802 --> 00:00:27,613
+Our overall cost function
+我们的整体代价函数
+
+12
+00:00:27,613 --> 00:00:29,477
+is 1 over M times sum
+不同的训练样本
+
+13
+00:00:29,477 --> 00:00:31,003
+of the training set of the
+假设函数 h(x) 对实际值 y(i) 进行预测
+
+14
+00:00:31,003 --> 00:00:32,797
+cost of making different
+所得到的不同误差
+
+15
+00:00:32,797 --> 00:00:34,580
+predictions on the different examples
+算出的 Cost 函数值
+
+16
+00:00:34,580 --> 00:00:36,408
+of labels Y I. And
+并且这是我们之前
+
+17
+00:00:36,408 --> 00:00:39,492
+this is a cost for a single example that we worked out earlier.
+算出来的一个单个样本的代价值
+
+18
+00:00:39,492 --> 00:00:40,604
+And I just want to remind
+我只是想提醒你一下
+
+19
+00:00:40,604 --> 00:00:43,514
+you that for classification problems
+对于分类问题
+
+20
+00:00:43,514 --> 00:00:45,778
+in our training set, and in fact
+我们的训练集
+
+21
+00:00:45,778 --> 00:00:47,076
+even for examples not in
+甚至其他不在训练集中的样本
+
+22
+00:00:47,076 --> 00:00:48,915
+our training set, Y is
+y 的值总是等于0或1的
+
+23
+00:00:48,915 --> 00:00:51,056
+always equal to 0 or 1.
+y 的值总是等于0或1的
+
+24
+00:00:51,056 --> 00:00:51,056
+Right?
+对吗?
+
+25
+00:00:51,056 --> 00:00:52,150
+That's sort of part of the
+这就是
+
+26
+00:00:52,150 --> 00:00:55,700
+mathematical definition of Y.
+y 的数学定义决定的
+
+27
+00:00:55,720 --> 00:00:57,430
+Because Y is either 0 or 1.
+由于 y 是0或1
+
+28
+00:00:57,430 --> 00:00:59,453
+We'll be able to
+我们就可以
+
+29
+00:00:59,460 --> 00:01:00,735
+come up with a simpler
+想出一个简单的
+
+30
+00:01:00,760 --> 00:01:03,001
+way to write this cost function.
+方式来写这个代价函数
+
+31
+00:01:03,001 --> 00:01:04,980
+And in particular, rather than writing
+具体来说
+
+32
+00:01:04,980 --> 00:01:06,394
+out this cost function on two
+为了避免把代价函数
+
+33
+00:01:06,410 --> 00:01:07,966
+separate lines with two separate
+写成两行
+
+34
+00:01:07,966 --> 00:01:09,519
+cases for Y equals 1 and Y equals
+避免分成 y=1 或 y=0 两种情况来写
+
+35
+00:01:09,519 --> 00:01:11,123
+0, I am going to show
+我们要用一种方法
+
+36
+00:01:11,130 --> 00:01:12,687
+you a way take these
+来把这两个式子
+
+37
+00:01:12,687 --> 00:01:16,241
+two lines and compress them into one equation.
+合并成一个
+
+38
+00:01:16,241 --> 00:01:17,743
+And this will make it more convenient
+这将使我们更方便地
+
+39
+00:01:17,743 --> 00:01:19,250
+to write out the cost function
+写出代价函数
+
+40
+00:01:19,250 --> 00:01:21,493
+and derive gradient descent.
+并推导出梯度下降
+
+41
+00:01:21,493 --> 00:01:24,492
+Concretely, we can write out the cost function as follows.
+具体而言 我们可以如下写出代价函数
+
+42
+00:01:24,492 --> 00:01:27,304
+We'll say the cost of H
+Cost(h(x), y) 可以写成
+
+43
+00:01:27,304 --> 00:01:29,269
+of X comma Y. I'm going
+以下的形式
+
+44
+00:01:29,269 --> 00:01:31,750
+to write this as minus Y
+-y log(h(x))- (1-y) log(1-h(x))
+
+45
+00:01:31,770 --> 00:01:34,201
+times log H of
+-y log(h(x))- (1-y) log(1-h(x))
+
+46
+00:01:34,201 --> 00:01:37,730
+X minus 1
+-y log(h(x))- (1-y) log(1-h(x))
+
+47
+00:01:38,060 --> 00:01:41,615
+minus Y times log 1
+-y log(h(x))- (1-y) log(1-h(x))
+
+48
+00:01:41,660 --> 00:01:44,655
+minus H of X.
+-y log(h(x))- (1-y) log(1-h(x))
+
+49
+00:01:44,670 --> 00:01:45,824
+And I'll show you in a
+我马上就会给你演示
+
+50
+00:01:45,824 --> 00:01:48,062
+second that this expression, or
+这个表达式或
+
+51
+00:01:48,062 --> 00:01:51,038
+this equation is an equivalent
+等式与我们已经得出的
+
+52
+00:01:51,038 --> 00:01:52,354
+way or more compact way
+代价函数的表达
+
+53
+00:01:52,354 --> 00:01:54,195
+of writing out this definition
+是完全等效的
+
+54
+00:01:54,195 --> 00:01:56,353
+of the cost function that we had up here.
+并且更加紧凑
+
+55
+00:01:56,353 --> 00:02:00,243
+Let's see why that's the case.
+让我们来看看为什么会是这样
+
+56
+00:02:03,730 --> 00:02:06,190
+We know that there are only 2 possible cases.
+我们知道有两种可能情况
+
+57
+00:02:06,190 --> 00:02:07,210
+Y must be 0 or 1.
+y 必须是0或1
+
+58
+00:02:07,230 --> 00:02:10,857
+So let's suppose Y equals 1.
+因此 我们假设 y 等于1
+
+59
+00:02:10,857 --> 00:02:12,480
+If Y is equal
+如果 y 是等于
+
+60
+00:02:12,480 --> 00:02:14,822
+to 1 then this equation
+那么这个等式
+
+61
+00:02:14,822 --> 00:02:17,603
+is saying that the cost
+这个 Cost 值
+
+62
+00:02:18,573 --> 00:02:20,172
+is equal to.
+是等于
+
+63
+00:02:20,172 --> 00:02:23,895
+Well if Y is equal to one, then this thing here is equal to one.
+如果 y 等于1 那么这一项等于1
+
+64
+00:02:23,900 --> 00:02:26,631
+And one minus Y is going to be equal to zero, right?
+1-y 将会等于零 对吧?
+
+65
+00:02:26,631 --> 00:02:27,852
+So if Y is equal
+如果 y 等于1
+
+66
+00:02:27,860 --> 00:02:29,348
+to one, then one minus Y
+那么 1-y
+
+67
+00:02:29,370 --> 00:02:32,336
+is one minus one, which is therefore zero.
+就是1-1 也就是0
+
+68
+00:02:32,336 --> 00:02:34,076
+So the second term gets multiplied
+因此第二项乘以0
+
+69
+00:02:34,076 --> 00:02:36,047
+by zero and goes away,
+就被消去了
+
+70
+00:02:36,047 --> 00:02:37,380
+and we're left with only this
+我们只留下了
+
+71
+00:02:37,420 --> 00:02:38,631
+first term which is Y
+第一项 y倍的 log 项
+
+72
+00:02:38,650 --> 00:02:40,654
+times log, minus Y times
+-y 乘以
+
+73
+00:02:40,654 --> 00:02:42,174
+log H of X. Y is
+log(h(x)) y等于1
+
+74
+00:02:42,174 --> 00:02:43,621
+1 so that's equal to minus
+因此就等于
+
+75
+00:02:43,630 --> 00:02:46,313
+log H of X.
+-log(h(x))
+
+76
+00:02:46,320 --> 00:02:48,300
+And this equation is
+这个等式
+
+77
+00:02:48,300 --> 00:02:50,050
+exactly what we have
+正是我们在这里的
+
+78
+00:02:50,060 --> 00:02:53,276
+up here for if Y is equal to one.
+y=1 的情况
+
+79
+00:02:53,276 --> 00:02:55,566
+The other case is if
+另一种情况是
+
+80
+00:02:55,566 --> 00:02:57,275
+Y is equal to 0.
+如果 y=0
+
+81
+00:02:57,290 --> 00:02:58,718
+And if that is
+如果是这样的话
+
+82
+00:02:58,718 --> 00:03:01,430
+the case then, writing of
+那么写出的
+
+83
+00:03:01,500 --> 00:03:03,584
+the cost function is saying that
+Cost 函数就是这样的
+
+84
+00:03:03,600 --> 00:03:05,500
+if Y is equal to zero,
+如果 y 是等于0
+
+85
+00:03:05,500 --> 00:03:08,381
+then this term here, will be equal to zero.
+那么这一项就为0
+
+86
+00:03:08,381 --> 00:03:10,111
+Whereas 1 minus Y, if
+而1-y 在y=0时
+
+87
+00:03:10,111 --> 00:03:11,270
+Y equals zero, would be
+1-y 就是0
+
+88
+00:03:11,280 --> 00:03:12,528
+equal to 1, because 1 minus
+因为1-y就是
+
+89
+00:03:12,530 --> 00:03:14,556
+Y becomes 1 minus 0,
+1-0 所以
+
+90
+00:03:14,556 --> 00:03:16,650
+which is just equal to 1.
+最后就等于1
+
+91
+00:03:16,650 --> 00:03:18,643
+And so the cost function
+这样 Cost 函数
+
+92
+00:03:18,643 --> 00:03:22,583
+simplifies to just this last term here.
+就简化为只有这最后一项
+
+93
+00:03:22,583 --> 00:03:22,583
+Right?
+对吧?
+
+94
+00:03:22,583 --> 00:03:24,724
+Because the first term
+因为第一项
+
+95
+00:03:24,724 --> 00:03:27,493
+over here gets multiplied by zero, and so it disappears.
+在这里乘以零 所以它被消去了
+
+96
+00:03:27,493 --> 00:03:28,802
+So we're just left with this last
+所以 我们只剩下最后的
+
+97
+00:03:28,802 --> 00:03:30,486
+term, which is minus
+这一项 也就是
+
+98
+00:03:30,510 --> 00:03:32,566
+log, 1 minus H of
+-log(1-h(x))
+
+99
+00:03:32,590 --> 00:03:34,243
+X. And you can
+你可以证明
+
+100
+00:03:34,260 --> 00:03:36,013
+verify that this term here
+这里的这一项
+
+101
+00:03:36,013 --> 00:03:40,434
+is just exactly what we had for when Y is equal to 0.
+就是当y=0时的这一项
+
+102
+00:03:40,450 --> 00:03:42,260
+So this shows that this
+因此这表明
+
+103
+00:03:42,260 --> 00:03:43,628
+definition for the cost is
+这样定义的 Cost 函数
+
+104
+00:03:43,628 --> 00:03:45,423
+just a more compact way of
+只是把这两个式子
+
+105
+00:03:45,423 --> 00:03:47,376
+taking both of these expressions,
+写成一种更紧凑的形式
+
+106
+00:03:47,376 --> 00:03:48,757
+the cases Y equals 1 and
+不需要分 y=1
+
+107
+00:03:48,757 --> 00:03:50,284
+Y equals 0, and writing
+或 y=0 来写
+
+108
+00:03:50,284 --> 00:03:52,014
+them in one, in a
+直接写在一起
+
+109
+00:03:52,030 --> 00:03:54,580
+more convenient form with just one line.
+只用一行来表示
+
+110
+00:03:54,600 --> 00:03:56,449
+We can, therefore, write
+这样我们就可以写出
+
+111
+00:03:56,449 --> 00:03:59,898
+all of our cost function for logistic regression as follows.
+逻辑回归的代价函数如下
+
+112
+00:03:59,898 --> 00:04:00,628
+It is this
+它是这样的
+
+113
+00:04:00,628 --> 00:04:01,746
+1 over m of the sum
+就是 1/m 乘以后面这个 Cost 函数
+
+114
+00:04:01,746 --> 00:04:03,856
+of these cost functions, and plugging
+在这里放入之前
+
+115
+00:04:03,856 --> 00:04:05,123
+in the definition for the
+定义好的 Cost 函数
+
+116
+00:04:05,123 --> 00:04:07,255
+cost that we worked out earlier, we end up with this.
+这个函数就完成了
+
+117
+00:04:07,255 --> 00:04:09,767
+And we just brought the minus sign outside.
+我们把负号放在外面
+
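+For reference, the overall cost function just described is:
+
+J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]
+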
+118
+00:04:09,767 --> 00:04:12,214
+And why do we choose this particular cost function?
+我们为什么要把代价函数写成这种形式
+
+119
+00:04:12,230 --> 00:04:16,250
+When it looks like there could be other cost functions that we could have chosen.
+似乎我们也可以选择别的方法来写代价函数
+
+120
+00:04:16,250 --> 00:04:17,427
+Although I won't have time to
+在这节课中我没有时间
+
+121
+00:04:17,430 --> 00:04:19,171
+go into great detail of this
+来介绍有关这个问题的细节
+
+122
+00:04:19,171 --> 00:04:21,345
+in this course, this cost function
+但我可以告诉你
+
+123
+00:04:21,345 --> 00:04:23,566
+can be derived from statistics using
+这个式子是从统计学中的
+
+124
+00:04:23,566 --> 00:04:25,416
+the principle of maximum likelihood
+极大似然法得来的
+
+125
+00:04:25,440 --> 00:04:26,816
+estimation, which is an
+估计 统计学的思路是
+
+126
+00:04:26,820 --> 00:04:28,754
+idea in statistics for how
+如何为不同的模型
+
+127
+00:04:28,770 --> 00:04:33,014
+to efficiently find parameters for different models.
+有效地找出不同的参数
+
+128
+00:04:33,014 --> 00:04:35,843
+And it also has a nice property that it is convex.
+同时它还有一个很好的性质 它是凸的
+
+129
+00:04:35,860 --> 00:04:37,666
+So this is the cost function
+因此 这就是基本上
+
+130
+00:04:37,666 --> 00:04:40,003
+that, you know, essentially everyone uses
+大部分人使用的
+
+131
+00:04:40,040 --> 00:04:42,736
+when fitting logistic regression models.
+逻辑回归代价函数
+
+132
+00:04:42,740 --> 00:04:44,264
+If you don't understand the terms
+如果我们不理解这些项
+
+133
+00:04:44,264 --> 00:04:45,731
+I just said and you don't
+如果你不知道
+
+134
+00:04:45,731 --> 00:04:47,280
+know what the principle of maximum
+什么是极大似然估计
+
+135
+00:04:47,280 --> 00:04:49,706
+likelihood estimation is, don't worry about.
+不用担心
+
+136
+00:04:49,706 --> 00:04:51,240
+There's just a deeper
+这里只是一个更深入
+
+137
+00:04:51,250 --> 00:04:53,780
+rationale and justification behind this
+更合理的证明而已
+
+138
+00:04:53,790 --> 00:04:55,617
+particular cost function than I
+在这节课中
+
+139
+00:04:55,630 --> 00:04:58,203
+have time to go into in this class.
+我没有时间去仔细讲解
+
+140
+00:04:58,203 --> 00:05:00,683
+Given this cost function, in
+根据这个代价函数
+
+141
+00:05:00,683 --> 00:05:02,601
+order to fit the parameters,
+为了拟合出参数
+
+142
+00:05:02,601 --> 00:05:04,541
+what we're going to do then is
+我们怎么办呢?
+
+143
+00:05:04,541 --> 00:05:07,896
+try to find the parameters theta that minimizes J of theta.
+我们要试图找尽量让 J(θ) 取得最小值的参数 θ
+
+144
+00:05:07,910 --> 00:05:10,716
+So if we, you know, try to minimize this.
+所以我们想要尽量减小这一项
+
+145
+00:05:10,716 --> 00:05:15,006
+This would give us some set of parameters theta.
+这将我们将得到某个参数 θ
+
+146
+00:05:15,006 --> 00:05:17,157
+Finally, if we're given a new
+最后 如果我们给出一个新的样本
+
+147
+00:05:17,157 --> 00:05:18,549
+example with some set
+假如某个特征 x
+
+148
+00:05:18,549 --> 00:05:20,164
+of features X. We can
+假如某个特征 x
+
+149
+00:05:20,164 --> 00:05:21,640
+then take the thetas that we
+我们可以用拟合训练样本的参数 θ
+
+150
+00:05:21,640 --> 00:05:23,980
+fit our training set and output
+来输出对假设的预测
+
+151
+00:05:23,980 --> 00:05:25,793
+our prediction as this, and
+来输出对假设的预测
+
+152
+00:05:25,800 --> 00:05:27,336
+just to remind you the output
+另外提醒你一下
+
+153
+00:05:27,336 --> 00:05:28,842
+of my hypothesis, I am
+我们假设的输出
+
+154
+00:05:28,850 --> 00:05:30,253
+going to interpret as the
+实际上就是这个概率值
+
+155
+00:05:30,253 --> 00:05:33,001
+probability that Y is equal to 1.
+p(y=1|x;θ)
+
+156
+00:05:33,001 --> 00:05:34,656
+And this is given the
+就是关于 x 以 θ 为参数
+
+157
+00:05:34,670 --> 00:05:36,900
+input X, parameterized by theta.
+y=1 的概率
+
+158
+00:05:36,900 --> 00:05:38,070
+But think of this
+你就把这个想成
+
+159
+00:05:38,070 --> 00:05:40,613
+as just my hypothesis is
+我们的假设就是
+
+160
+00:05:40,613 --> 00:05:43,873
+estimating the probability that Y is equal to 1.
+估计 y=1 的概率
+
+161
+00:05:43,880 --> 00:05:45,579
+So all that remains to
+所以 接下来要做的事情
+
+162
+00:05:45,590 --> 00:05:47,143
+be done is figure out
+就是弄清楚
+
+163
+00:05:47,150 --> 00:05:49,520
+how to actually minimize J
+如何最大限度地
+
+164
+00:05:49,520 --> 00:05:51,005
+of theta as a function
+最小化代价函数 J(θ)
+
+165
+00:05:51,010 --> 00:05:52,519
+of theta so we can actually
+作为一个关于 θ 的函数
+
+166
+00:05:52,519 --> 00:05:55,625
+fit the parameters to our training set.
+这样我们才能为训练集拟合出参数 θ
+
+167
+00:05:56,390 --> 00:05:57,819
+The way we're going to minimize the
+最小化代价函数的方法
+
+168
+00:05:57,819 --> 00:06:00,599
+cost function is using gradient descent.
+是使用梯度下降法(gradient descent)
+
+169
+00:06:00,600 --> 00:06:02,225
+Here's our cost function.
+这是我们的代价函数
+
+170
+00:06:02,250 --> 00:06:05,307
+And if we want to minimize it as a function of theta.
+如果我们要最小化这个关于 θ 的函数值
+
+171
+00:06:05,340 --> 00:06:08,070
+Here's our usual template for gradient descent.
+这就是我们通常用的梯度下降法的模板
+
+172
+00:06:08,070 --> 00:06:09,880
+Where we repeatedly update each
+我们要反复更新每个参数
+
+173
+00:06:09,880 --> 00:06:12,398
+parameter by taking updating
+用这个式子来更新
+
+174
+00:06:12,398 --> 00:06:14,099
+it as itself minus a
+就是用它自己减去
+
+175
+00:06:14,099 --> 00:06:17,684
+learning rate alpha times this derivative term.
+学习率 α 乘以后面的微分项
+
+176
+00:06:17,684 --> 00:06:19,219
+If you know some calculus feel
+如果你知道一些微积分的知识
+
+177
+00:06:19,219 --> 00:06:20,739
+free to take this term and
+你可以自己动手
+
+178
+00:06:20,739 --> 00:06:22,788
+try to compute a derivative yourself
+算一算这个微分项
+
+179
+00:06:22,788 --> 00:06:24,592
+and see if you can simplify
+看看你算出来的
+
+180
+00:06:24,592 --> 00:06:26,664
+it to the same answer that I get.
+跟我得到的是不是一样
+
+181
+00:06:26,664 --> 00:06:30,538
+But even if you don't know calculus don't worry about it.
+即使你不知道微积分 也不用担心
+
+182
+00:06:30,538 --> 00:06:32,355
+If you actually compute this,
+如果你计算一下的话
+
+183
+00:06:32,370 --> 00:06:34,811
+what you get is this equation.
+你会得到的是这个式子
+
+184
+00:06:34,811 --> 00:06:37,634
+And just write it out here.
+我把它写在这里
+
+185
+00:06:37,634 --> 00:06:39,047
+The sum from I equals 1
+将后面这个式子
+
+186
+00:06:39,047 --> 00:06:41,386
+through M of the,
+在 i=1 到 m 上求和
+
+187
+00:06:41,386 --> 00:06:43,722
+essentially the error, times
+其实就是预测误差
+
+188
+00:06:43,722 --> 00:06:46,378
+X I J. So if
+乘以 x(i)j
+
+189
+00:06:46,390 --> 00:06:48,504
+you take this partial derivative
+所以你把这个偏导数项
+
+190
+00:06:48,504 --> 00:06:49,716
+term and plug it back
+放回到原来式子这里
+
+191
+00:06:49,716 --> 00:06:51,210
+in here, we can then
+我们就可以将
+
+192
+00:06:51,230 --> 00:06:55,203
+write out our gradient descent algorithm as follows.
+梯度下降算法写作如下形式
+
+193
+00:06:55,203 --> 00:06:56,393
+And all I've done is I
+我做的就是把
+
+194
+00:06:56,393 --> 00:06:57,633
+took the derivative term from
+前一张幻灯片中的那一行
+
+195
+00:06:57,633 --> 00:07:00,163
+the previous line and plugged it in there.
+放到这里了
+
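+For reference, the resulting update rule (applied simultaneously for every j = 0, 1, ..., n, with the 1/m factor coming from the definition of J(theta)) is:
+
+\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
+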
+196
+00:07:00,170 --> 00:07:01,454
+So if you have N
+所以 如果你有 n 个特征
+
+197
+00:07:01,454 --> 00:07:03,856
+features, you would have, you know, a
+也就是说
+
+198
+00:07:03,856 --> 00:07:06,865
+parameter vector theta, with parameters
+参数向量θ 包括
+
+199
+00:07:06,865 --> 00:07:08,417
+theta zero, theta one, theta
+θ0 θ1 θ2
+
+200
+00:07:08,417 --> 00:07:10,031
+two, down to theta
+一直到 θn
+
+201
+00:07:10,031 --> 00:07:11,324
+N and you will
+那么你就需要
+
+202
+00:07:11,340 --> 00:07:13,930
+use this update to simultaneously
+用这个式子
+
+203
+00:07:13,930 --> 00:07:15,920
+update all of your values of theta.
+来同时更新所有 θ 的值
+
+204
+00:07:15,950 --> 00:07:17,378
+Now if you take this
+现在 如果你把这个
+
+205
+00:07:17,378 --> 00:07:19,498
+update rule and compare it
+更新规则和我们之前
+
+206
+00:07:19,498 --> 00:07:21,175
+to what we were doing
+用在线性回归上的
+
+207
+00:07:21,180 --> 00:07:23,364
+for linear regression, you might
+进行比较的话
+
+208
+00:07:23,370 --> 00:07:25,679
+be surprised to realize that,
+你会惊讶地发现
+
+209
+00:07:25,710 --> 00:07:28,958
+well, this equation was exactly
+这个式子正是
+
+210
+00:07:28,970 --> 00:07:30,529
+what we had for linear regression.
+我们用来做线性回归梯度下降的
+
+211
+00:07:30,550 --> 00:07:31,678
+In fact, if you look
+事实上 如果你看一下
+
+212
+00:07:31,678 --> 00:07:33,234
+at the earlier videos and look
+前面的视频
+
+213
+00:07:33,240 --> 00:07:35,123
+at the update rule, the
+再仔细想想这个更新规则
+
+214
+00:07:35,123 --> 00:07:36,543
+gradient descent rule for linear
+线性梯度下降规则
+
+215
+00:07:36,550 --> 00:07:38,418
+regression, it looked exactly
+实际上跟我蓝色框里
+
+216
+00:07:38,418 --> 00:07:41,268
+like what I drew here inside the blue box.
+写出来的式子是完全一样的
+
+217
+00:07:41,268 --> 00:07:43,280
+So are linear regression and
+那么 线性回归和
+
+218
+00:07:43,280 --> 00:07:45,875
+logistic regression different algorithms or not?
+逻辑回归是同一个算法吗?
+
+219
+00:07:45,900 --> 00:07:47,415
+Well, this is resolved by
+要回答这个问题
+
+220
+00:07:47,415 --> 00:07:49,468
+observing that for logistic
+我们要观察逻辑回归
+
+221
+00:07:49,500 --> 00:07:51,376
+regression, what has changed
+看看发生了哪些变化
+
+222
+00:07:51,380 --> 00:07:54,723
+is that the definition for this hypothesis has changed.
+实际上 假设的定义发生了变化
+
+223
+00:07:54,723 --> 00:07:56,788
+So whereas for linear regression
+所以对于线性回归
+
+224
+00:07:56,800 --> 00:07:58,586
+we had H of X equals
+假设函数是 h(x) 为
+
+225
+00:07:58,620 --> 00:08:01,093
+theta transpose X, now the
+θ 转置乘以 x
+
+226
+00:08:01,093 --> 00:08:02,633
+definition of H of
+而现在逻辑函数假设的定义
+
+227
+00:08:02,633 --> 00:08:04,060
+X has changed and is
+已经发生了变化
+
+228
+00:08:04,060 --> 00:08:05,460
+instead now 1 over 1
+现在已经变成了
+
+229
+00:08:05,460 --> 00:08:07,897
+plus e to the negative theta transpose X.
+这样的形式
+
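+For reference, the two hypothesis definitions being contrasted are:
+
+\text{linear regression: } h_\theta(x) = \theta^T x
+\qquad\qquad
+\text{logistic regression: } h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}
+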
+230
+00:08:07,910 --> 00:08:09,326
+So even though the update
+因此 即使更新参数的
+
+231
+00:08:09,340 --> 00:08:12,213
+rule looks cosmetically identical, because
+规则看起来基本相同
+
+232
+00:08:12,230 --> 00:08:13,872
+the definition of the hypothesis
+但由于假设的定义
+
+233
+00:08:13,872 --> 00:08:15,826
+has changed, this is actually
+发生了变化 所以逻辑函数的梯度下降
+
+234
+00:08:15,826 --> 00:08:19,445
+not the same thing as gradient descent for linear regression.
+跟线性回归的梯度下降实际上是两个完全不同的东西
+
+235
+00:08:19,445 --> 00:08:21,063
+In an earlier video, when
+在先前的视频中
+
+236
+00:08:21,090 --> 00:08:22,889
+we were talking about gradient descent
+当我们在谈论线性回归的
+
+237
+00:08:22,900 --> 00:08:24,514
+for linear regression, we had
+梯度下降法时
+
+238
+00:08:24,514 --> 00:08:26,128
+talked about how to monitor
+我们谈到了如何监控
+
+239
+00:08:26,160 --> 00:08:29,630
+gradient descent to make sure that it is converging.
+梯度下降法以确保其收敛
+
+240
+00:08:29,630 --> 00:08:31,463
+I usually apply that same
+我通常也把同样的方法
+
+241
+00:08:31,463 --> 00:08:33,354
+method to logistic regression too
+用在逻辑回归中
+
+242
+00:08:33,354 --> 00:08:37,193
+to monitor gradient descent to make sure it's converging correctly.
+来监测梯度下降 以确保它正常收敛
+
+243
+00:08:37,220 --> 00:08:38,612
+And hopefully you can figure
+希望你自己能想清楚
+
+244
+00:08:38,612 --> 00:08:40,306
+out how to apply that technique
+如何把同样的方法
+
+245
+00:08:40,306 --> 00:08:43,984
+to logistic regression yourself.
+应用到逻辑函数的梯度下降中
+
+246
+00:08:43,984 --> 00:08:46,603
+When implementing logistic regression with
+当使用梯度下降法
+
+247
+00:08:46,610 --> 00:08:48,229
+gradient descent, we have
+来实现逻辑回归时
+
+248
+00:08:48,229 --> 00:08:50,404
+all of these different parameter
+我们有这些不同的参数 θ
+
+249
+00:08:50,404 --> 00:08:52,093
+values, you know, theta
+就是 θ0 到 θn
+
+250
+00:08:52,130 --> 00:08:55,816
+0 down to theta N that we need to update using this expression.
+我们需要用这个表达式来更新这些参数
+
+251
+00:08:55,816 --> 00:08:58,770
+And one thing we could do is have a for loop.
+我们还可以使用 for 循环来实现
+
+252
+00:08:58,770 --> 00:09:00,926
+So for I equals 0 to
+所以 for i=1 to n
+
+253
+00:09:00,926 --> 00:09:03,658
+N, or i equals 1 to N plus 1.
+或者 for i=1 to n+1
+
+254
+00:09:03,658 --> 00:09:07,217
+So update each of these parameter values in turn.
+用一个 for 循环来更新这些参数值
+
+255
+00:09:07,217 --> 00:09:08,653
+But of course, rather than using
+当然 不用 for 循环也是可以的
+
+256
+00:09:08,653 --> 00:09:10,588
+a for loop, ideally we would
+理想情况下
+
+257
+00:09:10,600 --> 00:09:13,163
+also use a vectorized implementation.
+我们更提倡使用向量化的实现
+
+258
+00:09:13,170 --> 00:09:15,072
+And so that a vectorized
+因此 向量化的实现
+
+259
+00:09:15,072 --> 00:09:16,899
+implementation can update, you
+可以把所有这些 n 个
+
+260
+00:09:16,899 --> 00:09:18,310
+know, all of these N plus
+参数同时更新
+
+261
+00:09:18,310 --> 00:09:21,110
+1 parameters all in one fell swoop.
+一举搞定
+
+262
+00:09:21,110 --> 00:09:22,233
+And to check your own
+为了检查你自己的理解
+
+263
+00:09:22,233 --> 00:09:23,675
+understanding, you might see
+是否到位
+
+264
+00:09:23,690 --> 00:09:25,223
+if you can figure out how
+你可以自己想想
+
+265
+00:09:25,223 --> 00:09:27,763
+to do the vectorized implementation
+应该怎么样实现这个
+
+266
+00:09:27,763 --> 00:09:31,020
+of this algorithm as well.
+向量化的实现方法
+
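+One possible vectorized implementation of a single gradient descent step is sketched below. This is an illustration, not code shown in the video; it assumes X is the m-by-(n+1) design matrix whose first column is all ones, y is the m-by-1 vector of labels, theta is (n+1)-by-1, and alpha is the learning rate.
+
+m = length(y);                                 % number of training examples
+h = 1 ./ (1 + exp(-X * theta));                % sigmoid hypothesis for all m examples at once
+theta = theta - (alpha / m) * (X' * (h - y));  % simultaneous update of all n+1 parameters
+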
+267
+00:09:31,030 --> 00:09:32,331
+So now you know how
+好的 现在你知道如何
+
+268
+00:09:32,350 --> 00:09:35,079
+to implement gradient descent for logistic regression.
+实现逻辑回归的梯度下降
+
+269
+00:09:35,079 --> 00:09:36,706
+There was one last idea
+最后还有一个
+
+270
+00:09:36,706 --> 00:09:40,753
+that we had talked about earlier, which was feature scaling.
+我们之前在谈线性回归时讲到的特征缩放
+
+271
+00:09:40,753 --> 00:09:42,946
+We saw how feature scaling can
+我们看到了特征缩放是如何
+
+272
+00:09:42,946 --> 00:09:46,502
+help gradient descents converge faster for linear regression.
+提高梯度下降的收敛速度的
+
+273
+00:09:46,502 --> 00:09:48,827
+The idea of feature scaling also
+这个特征缩放的方法
+
+274
+00:09:48,850 --> 00:09:51,712
+applies to gradient descent for logistic regression.
+也适用于逻辑回归
+
+275
+00:09:51,730 --> 00:09:54,874
+And if you have features that are on very different scales.
+如果你的特征范围差距很大的话
+
+276
+00:09:54,890 --> 00:09:56,857
+Then applying feature scaling can also
+那么应用特征缩放的方法
+
+277
+00:09:56,857 --> 00:09:58,941
+make it, gradient descent, run
+同样也可以让逻辑回归中
+
+278
+00:09:58,941 --> 00:10:01,550
+faster for logistic regression.
+梯度下降收敛更快
+
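+One common way to do the feature scaling mentioned here is mean normalization, sketched below as an illustration (assuming X holds the raw feature columns, without the column of ones):
+
+mu = mean(X);                   % per-feature means, 1 x n
+sigma = std(X);                 % per-feature standard deviations, 1 x n
+X_scaled = (X - mu) ./ sigma;   % each feature now has roughly zero mean and unit scale
+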
+279
+00:10:01,550 --> 00:10:02,699
+So, that's it.
+就是这样
+
+280
+00:10:02,699 --> 00:10:04,552
+You now know how to implement
+现在你知道如何实现
+
+281
+00:10:04,552 --> 00:10:06,549
+logistic regression, and this
+逻辑回归
+
+282
+00:10:06,549 --> 00:10:08,918
+is a very powerful and
+这是一种非常强大
+
+283
+00:10:08,918 --> 00:10:10,441
+probably even most widely used
+甚至可能世界上使用最广泛的
+
+284
+00:10:10,441 --> 00:10:11,982
+classification algorithm in the world.
+一种分类算法
+
+285
+00:10:11,982 --> 00:10:14,130
+And you now know how to get it to work for yourself.
+而现在你已经知道如何去实现它了
+
diff --git a/srt/6 - 6 - Advanced Optimization (14 min).srt b/srt/6 - 6 - Advanced Optimization (14 min).srt
new file mode 100644
index 00000000..6995f089
--- /dev/null
+++ b/srt/6 - 6 - Advanced Optimization (14 min).srt
@@ -0,0 +1,1951 @@
+1
+00:00:00,300 --> 00:00:01,680
+In the last video, we talked
+在上一个视频中 我们讨论了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,990 --> 00:00:03,920
+about gradient descent for minimizing
+用梯度下降的方法最小化
+
+3
+00:00:04,440 --> 00:00:06,700
+the cost function J of theta for logistic regression.
+逻辑回归中代价函数 J(θ)
+
+4
+00:00:07,800 --> 00:00:08,930
+In this video, I'd like to
+在本次视频中 我会
+
+5
+00:00:09,020 --> 00:00:10,250
+tell you about some advanced
+教你们一些
+
+6
+00:00:10,850 --> 00:00:12,340
+optimization algorithms and some
+高级优化算法和一些
+
+7
+00:00:12,670 --> 00:00:14,060
+advanced optimization concepts.
+高级的优化概念
+
+8
+00:00:15,180 --> 00:00:16,480
+Using some of these ideas, we'll
+利用这些方法 我们就能够
+
+9
+00:00:16,630 --> 00:00:17,930
+be able to get logistic regression
+使通过梯度下降
+
+10
+00:00:19,010 --> 00:00:20,220
+to run much more quickly than
+进行逻辑回归的速度
+
+11
+00:00:20,350 --> 00:00:21,970
+it's possible with gradient descent.
+大大提高
+
+12
+00:00:22,880 --> 00:00:24,190
+And this will also let
+而这也将使
+
+13
+00:00:24,320 --> 00:00:26,060
+the algorithms scale much better
+算法更加适合解决
+
+14
+00:00:26,670 --> 00:00:28,030
+to very large machine learning problems,
+大型的机器学习问题
+
+15
+00:00:28,660 --> 00:00:30,950
+such as if we had a very large number of features.
+比如 我们有数目庞大的特征量
+
+16
+00:00:31,850 --> 00:00:33,360
+Here's an alternative view of
+现在我们换个角度
+
+17
+00:00:33,750 --> 00:00:34,910
+what gradient descent is doing.
+来看什么是梯度下降
+
+18
+00:00:35,590 --> 00:00:38,030
+We have some cost function J and we want to minimize it.
+我们有个代价函数 J 而我们想要使其最小化
+
+19
+00:00:38,950 --> 00:00:39,980
+So what we need to
+那么我们需要做的是
+
+20
+00:00:40,340 --> 00:00:41,080
+is, we need to write
+我们需要
+
+21
+00:00:41,330 --> 00:00:42,640
+code that can take
+编写代码
+
+22
+00:00:42,850 --> 00:00:44,980
+as input the parameters theta and
+当输入参数 θ 时
+
+23
+00:00:45,200 --> 00:00:46,470
+that can compute two things: J
+它们会计算出两样东西
+
+24
+00:00:46,700 --> 00:00:48,190
+of theta and these partial
+J(θ) 以及
+
+25
+00:00:48,620 --> 00:00:50,280
+derivative terms for, you
+J等于 0 1直到 n 时的
+
+26
+00:00:50,530 --> 00:00:51,820
+know, J equals 0, 1
+偏导数项
+
+27
+00:00:51,890 --> 00:00:53,700
+up to N. Given code that
+假设我们已经完成了
+
+28
+00:00:53,830 --> 00:00:54,980
+can do these two things, what
+可以实现这两件事的代码
+
+29
+00:00:55,160 --> 00:00:56,710
+gradient descent does is it
+那么梯度下降所做的就是
+
+30
+00:00:56,790 --> 00:00:58,620
+repeatedly performs the following update.
+反复执行这些更新
+
+31
+00:00:59,100 --> 00:00:59,100
+Right?
+对吧?
+
+32
+00:00:59,280 --> 00:01:00,500
+So given the code that
+所以给出我们
+
+33
+00:01:00,670 --> 00:01:01,750
+we wrote to compute these partial
+用于计算这些的偏导数的代码
+
+34
+00:01:02,090 --> 00:01:03,800
+derivatives, gradient descent plugs
+梯度下降法就把它插入
+
+35
+00:01:04,480 --> 00:01:07,330
+in here and uses that to update our parameters theta.
+到这里 从而来更新参数 θ
+
+36
+00:01:08,650 --> 00:01:09,590
+So another way of thinking
+因此另一种考虑
+
+37
+00:01:09,910 --> 00:01:11,070
+about gradient descent is that
+梯度下降的思路是
+
+38
+00:01:11,350 --> 00:01:12,670
+we need to supply code to
+我们需要写出代码
+
+39
+00:01:12,810 --> 00:01:14,050
+compute J of theta and
+来计算 J(θ) 和
+
+40
+00:01:14,230 --> 00:01:15,700
+these derivatives, and then
+这些偏导数 然后
+
+41
+00:01:15,900 --> 00:01:16,930
+these get plugged into gradient
+把这些插入到梯度下降中
+
+42
+00:01:17,370 --> 00:01:20,110
+descents, which can then try to minimize the function for us.
+然后它就可以为我们最小化这个函数
+
+43
+00:01:20,970 --> 00:01:21,970
+For gradient descent, I guess
+对于梯度下降来说 我认为
+
+44
+00:01:22,480 --> 00:01:23,790
+technically you don't actually need code
+从技术上讲 你实际并不需要编写代码
+
+45
+00:01:24,170 --> 00:01:26,520
+to compute the cost function J of theta.
+来计算代价函数 J(θ)
+
+46
+00:01:26,940 --> 00:01:28,980
+You only need code to compute the derivative terms.
+你只需要编写代码来计算导数项
+
+47
+00:01:29,740 --> 00:01:30,480
+But if you think of your
+但是 如果你希望
+
+48
+00:01:30,590 --> 00:01:32,300
+code as also monitoring convergence
+代码还要能够监控
+
+49
+00:01:33,000 --> 00:01:34,060
+of some such,
+这些 J(θ) 的收敛性
+
+50
+00:01:34,190 --> 00:01:35,440
+we'll just think of
+那么我们就
+
+51
+00:01:35,530 --> 00:01:37,380
+ourselves as providing code to
+需要自己编写代码
+
+52
+00:01:37,510 --> 00:01:38,530
+compute both the cost
+来计算
+
+53
+00:01:38,890 --> 00:01:40,250
+function and the derivative terms.
+代价函数和偏导数项
+
+54
+00:01:42,700 --> 00:01:44,130
+So, having written code to
+所以 在写完能够
+
+55
+00:01:44,280 --> 00:01:45,860
+compute these two things, one
+计算这两者的代码之后
+
+56
+00:01:46,090 --> 00:01:47,820
+algorithm we can use is gradient descent.
+我们就可以使用梯度下降
+
+57
+00:01:48,910 --> 00:01:51,590
+But gradient descent isn't the only algorithm we can use.
+但梯度下降并不是我们可以使用的唯一算法
+
+58
+00:01:52,280 --> 00:01:53,690
+And there are other algorithms,
+还有其他一些算法
+
+59
+00:01:54,330 --> 00:01:55,930
+more advanced, more sophisticated ones,
+更高级 更复杂
+
+60
+00:01:56,720 --> 00:01:57,880
+that, if we only provide
+如果我们能用
+
+61
+00:01:58,400 --> 00:01:59,520
+them a way to compute
+这些方法来计算
+
+62
+00:01:59,960 --> 00:02:01,550
+these two things, then these
+这两个项的话 那么这些算法
+
+63
+00:02:01,760 --> 00:02:03,040
+are different approaches to optimize
+就是为我们优化
+
+64
+00:02:03,490 --> 00:02:04,790
+the cost function for us.
+代价函数的不同方法
+
+65
+00:02:05,110 --> 00:02:07,910
+So conjugate gradient BFGS and
+共轭梯度法 BFGS (变尺度法) 和
+
+66
+00:02:08,110 --> 00:02:09,240
+L-BFGS are examples of more
+L-BFGS (限制变尺度法) 就是其中
+
+67
+00:02:09,460 --> 00:02:11,490
+sophisticated optimization algorithms that
+一些更高级的优化算法
+
+68
+00:02:11,640 --> 00:02:12,610
+need a way to compute J
+它们需要有一种方法来计算 J(θ)
+
+69
+00:02:12,810 --> 00:02:13,670
+of theta, and need a way
+以及需要一种方法
+
+70
+00:02:13,750 --> 00:02:15,430
+to compute the derivatives, and can
+计算导数项
+
+71
+00:02:15,670 --> 00:02:16,940
+then use more sophisticated
+然后使用比梯度下降更复杂
+
+72
+00:02:17,620 --> 00:02:19,880
+strategies than gradient descent to minimize the cost function.
+的算法来最小化代价函数
+
+73
+00:02:21,260 --> 00:02:22,560
+The details of exactly what
+这三种算法的具体细节
+
+74
+00:02:22,780 --> 00:02:25,920
+these three algorithms is well beyond the scope of this course.
+超出了本门课程的范畴
+
+75
+00:02:26,490 --> 00:02:28,200
+And in fact you often
+实际上你最后通常会
+
+76
+00:02:28,650 --> 00:02:30,570
+end up spending, you know, many days,
+花费很多天
+
+77
+00:02:31,060 --> 00:02:32,670
+or a small number of weeks studying these algorithms.
+或几周时间研究这些算法
+
+78
+00:02:33,240 --> 00:02:35,840
+If you take a class and advance the numerical computing.
+你可以专门学一门课来提高数值计算能力
+
+79
+00:02:36,920 --> 00:02:38,200
+But let me just tell you about some of their properties.
+不过让我来告诉你他们的一些特性
+
+80
+00:02:40,080 --> 00:02:42,150
+These three algorithms have a number of advantages.
+这三种算法有许多优点
+
+81
+00:02:42,900 --> 00:02:44,070
+One is that, with any
+一个是
+
+82
+00:02:44,290 --> 00:02:45,850
+of this algorithms you usually do
+使用这其中任何一个算法 你通常
+
+83
+00:02:46,000 --> 00:02:48,970
+not need to manually pick the learning rate alpha.
+不需要手动选择学习率 α
+
+84
+00:02:50,670 --> 00:02:51,450
+So one way to think
+所以对于
+
+85
+00:02:51,650 --> 00:02:53,630
+of these algorithms is that given
+这些算法的一种思路是 给出
+
+86
+00:02:54,230 --> 00:02:56,900
+is the way to compute the derivative and a cost function.
+计算导数项和代价函数的方法
+
+87
+00:02:57,320 --> 00:02:59,740
+You can think of these algorithms as having a clever inner loop.
+你可以认为算法有一个智能的内部循环
+
+88
+00:03:00,060 --> 00:03:00,680
+And, in fact, they have a clever
+而且 事实上 他们确实有一个智能的
+
+89
+00:03:01,810 --> 00:03:03,780
+inner loop called a line
+内部循环
+
+90
+00:03:04,200 --> 00:03:05,840
+search algorithm that automatically
+称为线性搜索(line search)算法 它可以自动
+
+91
+00:03:06,520 --> 00:03:08,010
+tries out different values for
+尝试不同的
+
+92
+00:03:08,080 --> 00:03:09,360
+the learning rate alpha and automatically
+学习速率 α 并自动
+
+93
+00:03:10,010 --> 00:03:11,090
+picks a good learning rate alpha
+选择一个好的学习速率 α
+
+94
+00:03:12,030 --> 00:03:12,900
+so that it can even pick
+因此它甚至可以
+
+95
+00:03:13,130 --> 00:03:14,570
+a different learning rate for every iteration.
+为每次迭代选择不同的学习速率
+
+96
+00:03:15,490 --> 00:03:18,230
+And so then you don't need to choose it yourself.
+那么你就不需要自己选择
+
+97
+00:03:21,430 --> 00:03:22,770
+These algorithms actually do
+这些算法实际上在做
+
+98
+00:03:22,910 --> 00:03:24,260
+more sophisticated things than just
+更复杂的事情 而不仅仅是
+
+99
+00:03:24,470 --> 00:03:25,640
+pick a good learning rate, and
+选择一个好的学习率
+
+100
+00:03:25,800 --> 00:03:27,300
+so they often end up
+所以它们往往最终
+
+101
+00:03:27,490 --> 00:03:30,320
+converging much faster than gradient descent.
+收敛得远远快于梯度下降
+
+102
+00:03:32,470 --> 00:03:33,740
+These algorithms actually do more
+这些算法实际上在做
+
+103
+00:03:33,980 --> 00:03:35,160
+sophisticated things than just
+更复杂的事情 不仅仅是
+
+104
+00:03:35,360 --> 00:03:36,740
+pick a good learning rate, and
+选择一个好的学习速率
+
+105
+00:03:36,880 --> 00:03:38,770
+so they often end up converging much
+所以它们往往最终
+
+106
+00:03:39,020 --> 00:03:40,840
+faster than gradient descent, but
+比梯度下降收敛得快多了 不过
+
+107
+00:03:41,040 --> 00:03:42,230
+detailed discussion of exactly
+关于它们到底做什么的详细讨论
+
+108
+00:03:42,710 --> 00:03:44,420
+what they do is beyond the scope of this course.
+已经超过了本门课程的范围
+
+109
+00:03:45,580 --> 00:03:47,060
+In fact, I actually used
+实际上 我过去
+
+110
+00:03:47,570 --> 00:03:49,020
+to have used these algorithms for
+使用这些算法
+
+111
+00:03:49,170 --> 00:03:50,170
+a long time, like maybe over
+已经很长一段时间了 也许超过
+
+112
+00:03:50,470 --> 00:03:53,070
+a decade, quite frequently, and it
+十年了 使用得相当频繁
+
+113
+00:03:53,290 --> 00:03:54,410
+was only, you know, a
+而直到几年前
+
+114
+00:03:54,510 --> 00:03:55,460
+few years ago that I actually
+我才真正
+
+115
+00:03:56,150 --> 00:03:57,200
+figured out for myself the details
+搞清楚
+
+116
+00:03:57,780 --> 00:04:00,220
+of what conjugate gradient, BFGS and L-BFGS do.
+共轭梯度法 BFGS 和 L-BFGS的细节
+
+117
+00:04:00,980 --> 00:04:02,740
+So it is actually entirely possible
+因此 实际上完全有可能
+
+118
+00:04:03,560 --> 00:04:05,380
+to use these algorithms successfully and
+成功使用这些算法
+
+119
+00:04:05,480 --> 00:04:06,530
+apply to lots of different learning
+并应用于许多不同的学习
+
+120
+00:04:06,780 --> 00:04:08,490
+problems without actually understanding
+问题 而不需要真正理解
+
+121
+00:04:09,460 --> 00:04:11,140
+the inner loop of what these algorithms do.
+这些算法的内环间在做什么
+
+122
+00:04:12,270 --> 00:04:13,630
+If these algorithms have a disadvantage,
+如果说这些算法有缺点的话
+
+123
+00:04:14,200 --> 00:04:15,350
+I'd say that the main
+那么我想说主要
+
+124
+00:04:15,610 --> 00:04:16,970
+disadvantage is that they're
+缺点是它们比
+
+125
+00:04:17,110 --> 00:04:19,390
+quite a lot more complex than gradient descent.
+梯度下降法复杂多了
+
+126
+00:04:20,180 --> 00:04:21,700
+And in particular, you probably should
+特别是你最好
+
+127
+00:04:21,970 --> 00:04:23,290
+not implement these algorithms
+不要使用 L-BGFS BFGS这些算法
+
+128
+00:04:23,850 --> 00:04:26,060
+- conjugate gradient, BFGS, L-BFGS -
+共轭梯度 L-BGFS BFGS
+
+129
+00:04:26,360 --> 00:04:29,520
+yourself unless you're an expert in numerical computing.
+除非你是数值计算方面的专家
+
+130
+00:04:30,720 --> 00:04:32,320
+Instead, just as I
+实际上
+
+131
+00:04:32,420 --> 00:04:33,640
+wouldn't recommend that you write
+我不会建议你们编写
+
+132
+00:04:33,850 --> 00:04:35,240
+your own code to compute square
+自己的代码来计算
+
+133
+00:04:35,590 --> 00:04:36,660
+roots of numbers or to
+数据的平方根或者
+
+134
+00:04:36,770 --> 00:04:39,010
+compute inverses of matrices, for
+计算逆矩阵
+
+135
+00:04:39,140 --> 00:04:40,600
+these algorithms also what I
+因为对于这些算法我
+
+136
+00:04:40,710 --> 00:04:42,530
+would recommend you do is just use a software library.
+还是会建议你直接使用一个软件库
+
+137
+00:04:43,030 --> 00:04:43,770
+So, you know, to take a square
+所以 要求一个平方根
+
+138
+00:04:44,120 --> 00:04:44,940
+root what all of us
+我们所能做的
+
+139
+00:04:45,150 --> 00:04:46,440
+do is use some function
+就是调用一些
+
+140
+00:04:47,080 --> 00:04:48,310
+that someone else has
+别人已经
+
+141
+00:04:48,530 --> 00:04:50,200
+written to compute the square roots of our numbers.
+写好用来计算数字平方根的函数
+
+142
+00:04:51,330 --> 00:04:53,530
+And fortunately, Octave and
+幸运的是 有 Octave 和
+
+143
+00:04:53,760 --> 00:04:55,070
+the closely related language MATLAB
+与它密切相关的 MATLAB 语言
+
+144
+00:04:55,430 --> 00:04:57,110
+- we'll be using that -
+我们将会用到它们
+
+145
+00:04:57,140 --> 00:04:58,370
+Octave has a very good. Has a pretty
+Octave 有一个非常
+
+146
+00:04:58,530 --> 00:05:02,410
+reasonable library implementing some of these advanced optimization algorithms.
+理想的库用于实现这些先进的优化算法
+
+147
+00:05:03,380 --> 00:05:04,350
+And so if you just use
+所以 如果你直接调用
+
+148
+00:05:04,600 --> 00:05:06,800
+the built-in library, you know, you get pretty good results.
+它自带的库 你就能得到不错的结果
+
+149
+00:05:08,010 --> 00:05:08,880
+I should say that there is
+我必须指出
+
+150
+00:05:09,370 --> 00:05:10,880
+a difference between good
+这些算法
+
+151
+00:05:11,230 --> 00:05:12,740
+and bad implementations of these algorithms.
+实现得好或不好是有区别的
+
+152
+00:05:13,690 --> 00:05:15,010
+And so, if you're using a
+因此 如果你正在你的
+
+153
+00:05:15,120 --> 00:05:16,270
+different language for your machine
+机器学习程序中使用一种不同的语言
+
+154
+00:05:16,470 --> 00:05:17,560
+learning application, if you're using
+比如如果你正在使用
+
+155
+00:05:18,190 --> 00:05:20,090
+C, C++, Java, and
+C C + + Java
+
+156
+00:05:20,250 --> 00:05:24,060
+so on, you
+等等 你
+
+157
+00:05:24,210 --> 00:05:24,710
+might want to try out a couple
+可能会想尝试一些
+
+158
+00:05:24,730 --> 00:05:25,660
+of different libraries to make sure that you find a
+不同的库 以确保你找到一个
+
+159
+00:05:25,740 --> 00:05:27,790
+good library for implementing these algorithms.
+能很好实现这些算法的库
+
+160
+00:05:28,250 --> 00:05:29,410
+Because there is a difference in
+因为
+
+161
+00:05:29,480 --> 00:05:30,740
+performance between a good implementation
+在 L-BFGS 或者等高线梯度的
+
+162
+00:05:31,680 --> 00:05:33,150
+of, you know, conjugate gradient or
+实现上
+
+163
+00:05:33,530 --> 00:05:35,150
+L-BFGS versus a less good
+表现得好与不太好
+
+164
+00:05:35,350 --> 00:05:37,680
+implementation of conjugate gradient or L-BFGS.
+是有差别的
+
+165
+00:05:43,060 --> 00:05:44,310
+So now let's explain how
+因此现在让我们来说明
+
+166
+00:05:44,580 --> 00:05:47,080
+to use these algorithms, I'm going to do so with an example.
+如何使用这些算法 我打算举一个例子
+
+167
+00:05:48,970 --> 00:05:50,220
+Let's say that you have a
+比方说 你有一个
+
+168
+00:05:50,370 --> 00:05:51,620
+problem with two parameters
+含两个参数的问题
+
+169
+00:05:53,380 --> 00:05:55,580
+equals theta zero and theta one.
+这两个参数是 θ0 和 θ1
+
+170
+00:05:56,410 --> 00:05:57,450
+And let's say your cost function
+那么你的成本函数
+
+171
+00:05:57,970 --> 00:05:59,210
+is J of theta equals theta
+J(θ)等于 θ1
+
+172
+00:05:59,430 --> 00:06:01,540
+one minus five squared, plus theta two minus five squared.
+减去5的平方 再加上 θ2 减5的平方
+
+173
+00:06:02,630 --> 00:06:04,080
+So with this cost function.
+因此 通过这个代价函数
+
+174
+00:06:04,590 --> 00:06:06,960
+You know the value for theta 1 and theta 2.
+你可以得到 θ1 和 θ2 的值
+
+175
+00:06:07,080 --> 00:06:09,590
+If you want to minimize J of theta as a function of theta.
+如果你将 J(θ) 最小化的话
+
+176
+00:06:09,940 --> 00:06:10,910
+The value that minimizes it is
+那么它的最小值
+
+177
+00:06:11,030 --> 00:06:12,040
+going to be theta 1
+将是 θ1
+
+178
+00:06:12,420 --> 00:06:14,220
+equals 5, theta 2 equals five.
+等于5 θ2 等于5
+
+179
+00:06:15,230 --> 00:06:16,620
+Now, again, I know some of
+我知道你们当中
+
+180
+00:06:16,950 --> 00:06:18,320
+you know more calculus than others,
+有些人比别人微积分更好
+
+181
+00:06:19,010 --> 00:06:20,770
+but the derivatives of the
+但是你应该知道代价函数 J 的导数
+
+182
+00:06:20,850 --> 00:06:23,420
+cost function J turn out to be these two expressions.
+推出来就是这两个表达式
+
+183
+00:06:24,270 --> 00:06:25,060
+I've done the calculus.
+我已经写在这儿了
+
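+For reference, the example cost function and the two derivatives just mentioned are:
+
+J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2, \qquad
+\frac{\partial}{\partial \theta_1} J(\theta) = 2(\theta_1 - 5), \qquad
+\frac{\partial}{\partial \theta_2} J(\theta) = 2(\theta_2 - 5)
+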
+184
+00:06:26,260 --> 00:06:27,250
+So if you want to apply
+那么你就可以应用
+
+185
+00:06:27,480 --> 00:06:29,220
+one of the advanced optimization algorithms
+高级优化算法里的一个
+
+186
+00:06:29,810 --> 00:06:31,380
+to minimize cost function J.
+来最小化代价函数 J
+
+187
+00:06:31,660 --> 00:06:32,630
+So, you know, if we
+所以 如果我们
+
+188
+00:06:32,880 --> 00:06:34,680
+didn't know the minimum was at
+不知道最小值
+
+189
+00:06:34,780 --> 00:06:36,140
+5, 5, but if you want to have
+是5 5 但你想要
+
+190
+00:06:36,240 --> 00:06:37,550
+a cost function find the minimum
+代价函数找到这个最小值
+
+191
+00:06:37,970 --> 00:06:39,840
+numerically using something like
+是用比如
+
+192
+00:06:40,040 --> 00:06:41,560
+gradient descent but preferably more
+梯度下降这些算法 但最好是用
+
+193
+00:06:41,730 --> 00:06:43,430
+advanced than gradient descent, what
+比它更高级的算法
+
+194
+00:06:43,550 --> 00:06:45,010
+you would do is implement an octave
+你要做的就是运行一个
+
+195
+00:06:45,570 --> 00:06:46,690
+function like this, so we
+像这样的 Octave 函数 那么我们
+
+196
+00:06:46,860 --> 00:06:48,190
+implement a cost function,
+运行一个函数
+
+197
+00:06:49,210 --> 00:06:51,180
+a costFunction(theta) function like that,
+比如 costFunction
+
+198
+00:06:52,180 --> 00:06:53,250
+and what this does is that
+这个函数的作用就是
+
+199
+00:06:53,380 --> 00:06:55,660
+it returns two arguments, the
+它会返回两个值
+
+200
+00:06:55,760 --> 00:06:57,780
+first J-val, is how
+第一个是 jVal 它是
+
+201
+00:06:58,910 --> 00:07:00,020
+we would compute the cost function
+我们计算的代价函数 J
+
+202
+00:07:00,680 --> 00:07:01,780
+J. And so this says J-val
+所以说 jVal
+
+203
+00:07:02,080 --> 00:07:03,210
+equals, you know, theta
+等于 theta(1)
+
+204
+00:07:03,440 --> 00:07:04,630
+one minus five squared plus theta
+减5的平方加
+
+205
+00:07:05,330 --> 00:07:06,230
+two minus five squared.
+theta(2) 减5的平方
+
+206
+00:07:06,540 --> 00:07:09,140
+So it's just computing this cost function over here.
+这样就计算出这个代价函数
+
+207
+00:07:10,540 --> 00:07:12,040
+And the second argument that
+函数返回的第二个值是
+
+208
+00:07:12,260 --> 00:07:14,190
+this function returns is gradient.
+梯度值
+
+209
+00:07:14,840 --> 00:07:16,030
+So gradient is going to
+梯度值应该是
+
+210
+00:07:16,160 --> 00:07:17,320
+be a two by one vector,
+一个2×1的向量
+
+211
+00:07:18,870 --> 00:07:20,050
+and the two elements of the
+梯度向量的两个元素
+
+212
+00:07:20,120 --> 00:07:22,100
+gradient vector correspond to
+对应
+
+213
+00:07:22,800 --> 00:07:24,670
+the two partial derivative terms over here.
+这里的两个偏导数项
+
+214
+00:07:27,150 --> 00:07:28,570
+Having implemented this cost function,
+运行这个 costFunction 函数后
+
+215
+00:07:29,580 --> 00:07:30,390
+you would, you can then
+你就可以
+
+216
+00:07:31,510 --> 00:07:33,010
+call the advanced optimization
+调用高级的优化函数
+
+217
+00:07:34,270 --> 00:07:35,720
+function called the fminunc
+这个函数叫 fminunc
+
+218
+00:07:35,950 --> 00:07:36,900
+- it stands for function
+它表示
+
+219
+00:07:37,610 --> 00:07:39,360
+minimization unconstrained in Octave
+Octave 里无约束最小化函数
+
+220
+00:07:40,300 --> 00:07:41,520
+-and the way you call this is as follows.
+调用它的方式如下
+
+221
+00:07:41,790 --> 00:07:42,350
+You set a few options.
+你要设置几个 options
+
+222
+00:07:43,230 --> 00:07:43,580
+This is an options variable,
+这个 options 变量
+
+223
+00:07:44,330 --> 00:07:46,680
+as a data structure that stores the options you want.
+作为一个数据结构可以存储你想要的 options
+
+224
+00:07:47,320 --> 00:07:48,960
+So 'GradObj', 'on' -
+所以 GradObj 和 On
+
+225
+00:07:49,160 --> 00:07:52,100
+this sets the gradient objective parameter to on.
+这里设置梯度目标参数为打开(on)
+
+226
+00:07:52,270 --> 00:07:55,180
+It just means you are indeed going to provide a gradient to this algorithm.
+这意味着你现在确实要给这个算法提供一个梯度
+
+227
+00:07:56,150 --> 00:07:57,550
+I'm going to set the maximum number
+然后设置最大
+
+228
+00:07:57,840 --> 00:07:59,280
+of iterations to, let's say, one hundred.
+迭代次数 比方说 100
+
+229
+00:07:59,580 --> 00:08:02,230
+We're going give it an initial guess for theta.
+我们给出一个 θ 的猜测初始值
+
+230
+00:08:02,720 --> 00:08:03,680
+There's a 2 by 1 vector.
+它是一个2×1的向量
+
+231
+00:08:04,440 --> 00:08:06,860
+And then this command calls fminunc.
+那么这个命令就调用 fminunc
+
+232
+00:08:07,530 --> 00:08:10,290
+This at symbol presents a
+这个@符号表示
+
+233
+00:08:10,420 --> 00:08:11,810
+pointer to the cost function
+指向我们刚刚定义的
+
+234
+00:08:13,010 --> 00:08:14,320
+that we just defined up there.
+costFunction 函数的指针
+
+235
+00:08:15,060 --> 00:08:16,020
+And if you call this,
+如果你调用它
+
+236
+00:08:16,270 --> 00:08:18,290
+this will compute, you know, will use
+它就会
+
+237
+00:08:18,620 --> 00:08:20,490
+one of the more advanced optimization algorithms.
+使用众多高级优化算法中的一个
+
+238
+00:08:21,110 --> 00:08:23,350
+And if you want to think it as just like gradient descent.
+当然你也可以把它当成梯度下降
+
+239
+00:08:23,690 --> 00:08:25,170
+But automatically choosing the learning
+只不过它能自动选择
+
+240
+00:08:25,500 --> 00:08:27,290
+rate alpha for you, so you don't have to do so yourself.
+学习速率α 你不需要自己来做
+
+241
+00:08:28,210 --> 00:08:29,880
+But it will then attempt to
+然后它会尝试
+
+242
+00:08:30,160 --> 00:08:32,000
+use the sort of advanced optimization algorithms.
+使用这些高级的优化算法
+
+243
+00:08:32,640 --> 00:08:33,770
+Like gradient descent on steroids.
+就像加强版的梯度下降法
+
+244
+00:08:34,400 --> 00:08:36,490
+To try to find the optimal value of theta for you.
+为你找到最佳的 θ 值
+
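+Putting the pieces just described together, the Octave code looks roughly like this (a sketch assembled from the description above; the costFunction would normally be saved in its own file, costFunction.m):
+
+function [jVal, gradient] = costFunction(theta)
+  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % the cost J(theta)
+  gradient = zeros(2, 1);                       % 2x1 vector of partial derivatives
+  gradient(1) = 2 * (theta(1) - 5);             % derivative with respect to theta(1)
+  gradient(2) = 2 * (theta(2) - 5);             % derivative with respect to theta(2)
+end
+
+options = optimset('GradObj', 'on', 'MaxIter', 100);
+initialTheta = zeros(2, 1);
+[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
+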
+245
+00:08:37,180 --> 00:08:39,040
+Let me actually show you what this looks like in Octave.
+让我告诉你它在 Octave 里什么样
+
+246
+00:08:40,690 --> 00:08:42,460
+So I've written this cost function
+所以我写了这个关于theta的
+
+247
+00:08:42,900 --> 00:08:46,440
+of theta function exactly as we had it on the previous line.
+的 costFunction 函数 跟前面幻灯片中一样
+
+248
+00:08:46,650 --> 00:08:49,070
+It computes J-val which is the cost function.
+它计算出代价函数 jval
+
+249
+00:08:49,920 --> 00:08:51,810
+And it computes the gradient with
+以及梯度 gradient
+
+250
+00:08:52,040 --> 00:08:53,050
+the two elements being the partial
+gradient 有两个元素
+
+251
+00:08:53,450 --> 00:08:54,430
+derivatives of the cost function
+是代价函数对于
+
+252
+00:08:55,220 --> 00:08:56,200
+with respect to, you know,
+theta(1) 和 theta(2) 这两个参数的
+
+253
+00:08:56,360 --> 00:08:57,910
+the two parameters, theta one and theta two.
+偏导数
+
+254
+00:08:59,040 --> 00:09:00,360
+Now let's switch to my Octave window.
+现在 让我们切换到Octave窗口
+
+255
+00:09:00,710 --> 00:09:02,900
+I'm gonna type in those commands I had just now.
+我把刚刚的命令敲进去
+
+256
+00:09:03,470 --> 00:09:05,850
+So, options equals optimset. This is
+options = optimset 这是
+
+257
+00:09:06,630 --> 00:09:08,510
+the notation for setting my
+在我的优化算法的 options上
+
+258
+00:09:09,670 --> 00:09:11,190
+parameters on my options,
+设置参数
+
+259
+00:09:11,710 --> 00:09:13,850
+for my optimization algorithm: 'GradObj', 'on', 'MaxIter', 100,
+的记号
+
+260
+00:09:14,130 --> 00:09:17,600
+so that says 100
+这样就是100
+
+261
+00:09:18,310 --> 00:09:19,610
+iterations, and I am
+次迭代
+
+262
+00:09:19,730 --> 00:09:22,090
+going to provide the gradient to my algorithm.
+我现在要给我的算法提供梯度值
+
+263
+00:09:23,490 --> 00:09:27,190
+Let's say initial theta equals zero's two by one.
+设置 theta 的初始值是一个2×1的零向量
+
+264
+00:09:27,980 --> 00:09:29,280
+So that's my initial guess for theta.
+这是我猜测的 theta 初始值
+
+265
+00:09:30,500 --> 00:09:31,390
+And now I have optTheta,
+现在我就可以
+
+266
+00:09:32,620 --> 00:09:35,100
+functionVal, exitFlag
+写出三个返回值
+
+267
+00:09:37,610 --> 00:09:39,430
+equals fminunc, with
+[optTheta, functionVal, exitFlag] 等于
+
+268
+00:09:40,570 --> 00:09:41,600
+A pointer to the cost function.
+指向代价函数的指针 @costFunction
+
+269
+00:09:43,010 --> 00:09:44,700
+and provide my initial guess.
+我猜测的初始值 initialTheta
+
+270
+00:09:46,090 --> 00:09:49,060
+And the options like so.
+还有options
+
+271
+00:09:49,820 --> 00:09:52,760
+And if I hit enter this will run the optimization algorithm.
+如果我敲回车 这个就会运行优化算法
+
+272
+00:09:53,940 --> 00:09:54,810
+And it returns pretty quickly.
+它很快返回值
+
+273
+00:09:55,790 --> 00:09:57,040
+This funny formatting that's because
+这个格式很有意思
+
+274
+00:09:57,430 --> 00:09:58,430
+my line, you know, my
+因为我的代码
+
+275
+00:09:59,700 --> 00:10:00,290
+code wrapped around.
+是被缠住了
+
+276
+00:10:00,680 --> 00:10:02,540
+So, this funny thing
+所以这个有点意思
+
+277
+00:10:02,760 --> 00:10:04,890
+is just because my command line had wrapped around.
+完全是因为我的命令行被绕住了
+
+278
+00:10:05,490 --> 00:10:06,290
+But what this says is that
+不过这里只是
+
+279
+00:10:06,550 --> 00:10:08,500
+numerically, you know - think
+数字上的一些问题
+
+280
+00:10:08,670 --> 00:10:10,400
+of it as gradient descent
+把它看成是加强版梯度下降
+
+281
+00:10:10,440 --> 00:10:11,620
+on steroids, they found the optimal value of
+它们找到 theta 的最优值
+
+282
+00:10:11,760 --> 00:10:13,150
+theta is theta 1
+是 theta(1) 为5 theta(2) 也为5
+
+283
+00:10:13,400 --> 00:10:15,670
+equals 5, theta 2 equals 5, exactly as we're hoping for.
+这正是我们希望的
+
+284
+00:10:16,520 --> 00:10:18,760
+The function value at the
+functionVal 的值
+
+285
+00:10:18,840 --> 00:10:21,430
+optimum is essentially 10 to the minus 30.
+实际上是10的-30次幂
+
+286
+00:10:21,670 --> 00:10:23,160
+So that's essentially zero, which
+所以 这基本上就是0
+
+287
+00:10:23,370 --> 00:10:24,760
+is also what we're hoping for.
+这也是我们所希望的
+
+288
+00:10:24,840 --> 00:10:27,060
+And the exit flag is
+exitFlag为1
+
+289
+00:10:27,240 --> 00:10:29,080
+1, and this shows
+这说明它的状态
+
+290
+00:10:29,730 --> 00:10:31,400
+what the convergence status of this.
+是已经收敛了的
+
+291
+00:10:31,800 --> 00:10:33,010
+And if you want you can do
+你也可以运行
+
+292
+00:10:33,150 --> 00:10:35,020
+help fminunc to
+help fminunc 命令
+
+293
+00:10:35,130 --> 00:10:36,480
+read the documentation for how
+去查阅相关资料
+
+294
+00:10:36,680 --> 00:10:38,650
+to interpret the exit flag.
+以理解 exitFlag 的作用
+
+295
+00:10:38,760 --> 00:10:41,600
+But the exit flag lets you verify whether or not this algorithm has converged.
+exitFlag可以让你确定该算法是否已经收敛
+
+296
+00:10:43,960 --> 00:10:46,450
+So that's how you run these algorithms in Octave.
+这就是在 Octave 里运行这些算法的过程
+
+297
+00:10:47,480 --> 00:10:48,920
+I should mention, by the way,
+哦对了 这里我得指出
+
+298
+00:10:48,940 --> 00:10:51,020
+that for the Octave implementation, this value
+用 Octave 运行的时候
+
+299
+00:10:51,640 --> 00:10:53,010
+of theta, your parameter vector
+向量θ的值 θ的参数向量
+
+300
+00:10:53,370 --> 00:10:54,940
+of theta, must be in
+必须是 d 维的
+
+301
+00:10:55,280 --> 00:10:58,210
+R^d for d greater than or equal to 2.
+d 大于等于2
+
+302
+00:10:58,450 --> 00:11:00,330
+So if theta is just a real number.
+所以 θ 仅仅是一个实数
+
+303
+00:11:00,770 --> 00:11:02,040
+So, if it is not at least
+因此如果它不是
+
+304
+00:11:02,160 --> 00:11:03,160
+a two-dimensional vector
+一个至少二维的向量
+
+305
+00:11:03,800 --> 00:11:04,860
+or some higher than two-dimensional
+或高于二维的向量
+
+306
+00:11:05,160 --> 00:11:06,840
+vector, this fminunc
+fminunc 就可能无法运算
+
+307
+00:11:07,560 --> 00:11:08,760
+may not work, so and if
+因此如果你有一个
+
+308
+00:11:09,140 --> 00:11:10,310
+in case you have a
+一维的函数需要优化
+
+309
+00:11:10,590 --> 00:11:11,590
+one-dimensional function that you use
+一维的函数需要优化
+
+310
+00:11:11,830 --> 00:11:12,930
+to optimize, you can look
+你可以查找 Octave 里 fminuc 函数的资料
+
+311
+00:11:13,100 --> 00:11:14,680
+in the octave documentation for fminunc
+来得到更多的细节
+
+312
+00:11:14,950 --> 00:11:16,230
+for additional details.
+来得到更多的细节
+
+313
+00:11:18,230 --> 00:11:19,360
+So, that's how we optimize
+这就是我们如何优化
+
+314
+00:11:19,620 --> 00:11:21,640
+our trial example of this
+一个例子的过程 这是一个
+
+315
+00:11:22,190 --> 00:11:23,810
+simple quadratic cost function.
+简单的二次代价函数
+
+316
+00:11:24,440 --> 00:11:26,520
+However, how do we apply this to, let's say, logistic regression?
+我们如果把它应用到逻辑回归中呢
+
+317
+00:11:27,720 --> 00:11:29,270
+In logistic regression we have
+在逻辑回归中 我们有
+
+318
+00:11:29,520 --> 00:11:31,290
+a parameter vector theta, and
+一个参数向量 theta
+
+319
+00:11:31,430 --> 00:11:32,210
+I'm going to use a mix
+我要混合使用
+
+320
+00:11:32,620 --> 00:11:34,880
+of octave notation and sort of math notation.
+Octave 记号和数学符号
+
+321
+00:11:35,300 --> 00:11:36,400
+But I hope this explanation
+我希望这个写法很明确
+
+322
+00:11:36,870 --> 00:11:38,050
+will be clear, but our parameter
+我们的参数 theta
+
+323
+00:11:38,520 --> 00:11:40,360
+vector theta comprises these
+由 θ0 到 θn 组成
+
+324
+00:11:40,540 --> 00:11:41,780
+parameters theta 0 through theta
+由 θ0 到 θn 组成
+
+325
+00:11:42,210 --> 00:11:44,230
+n because octave indexes,
+因为在 Octave 的标号中
+
+326
+00:11:46,090 --> 00:11:48,040
+vectors using indexing from
+向量的标号是从1开始的
+
+327
+00:11:48,460 --> 00:11:49,640
+1, you know, theta 0 is
+在 Octave 里 θ0实际上
+
+328
+00:11:49,710 --> 00:11:51,190
+actually written theta 1
+写成 theta(1)
+
+329
+00:11:51,330 --> 00:11:53,290
+in octave, theta 1 is gonna be written
+因此用 theta(1) 表示第一个参数 θ0
+
+330
+00:11:53,930 --> 00:11:54,690
+theta 2 in octave,
+然后有 theta(2)
+
+331
+00:11:55,280 --> 00:11:56,180
+and so on, down to
+接下来写到
+
+332
+00:11:56,780 --> 00:11:58,430
+theta n+1, right?
+theta(n+1) 对吧
+
+333
+00:11:58,610 --> 00:12:00,650
+And that's because Octave indexes
+这是因为 Octave 的记号
+
+334
+00:12:01,320 --> 00:12:03,070
+vectors starting from an index
+是向量从1开始的
+
+335
+00:12:03,430 --> 00:12:05,200
+of 1 rather than an index of 0.
+而不是从0开始
+
+336
+00:12:06,920 --> 00:12:07,950
+So what we need
+因此 我们需要
+
+337
+00:12:08,160 --> 00:12:09,670
+to do then is write a
+做的是写一个
+
+338
+00:12:09,880 --> 00:12:12,070
+cost function that captures
+costFunction 函数 它为
+
+339
+00:12:12,710 --> 00:12:14,210
+the cost function for logistic regression.
+逻辑回归求得代价函数
+
+340
+00:12:15,170 --> 00:12:16,450
+Concretely, the cost function
+具体点说 costFunction 函数
+
+341
+00:12:16,880 --> 00:12:18,310
+needs to return J-val, which is, you know, J-val
+需要返回 jVal 值
+
+342
+00:12:18,940 --> 00:12:20,430
+as you need some codes to
+因此需要一些代码
+
+343
+00:12:20,640 --> 00:12:22,440
+compute J of theta and
+来计算 J(θ)
+
+344
+00:12:22,710 --> 00:12:24,010
+we also need to give it the gradient.
+我们也需要给出梯度值 gradient
+
+345
+00:12:24,540 --> 00:12:25,460
+So, gradient 1 is going
+那么 gradient(1)
+
+346
+00:12:25,920 --> 00:12:27,080
+to be some code to compute
+对应用来计算代价函数
+
+347
+00:12:27,280 --> 00:12:29,100
+the partial derivative with respect to
+关于 θ0 的偏导数
+
+348
+00:12:29,390 --> 00:12:31,250
+theta 0, the next partial
+接下去关于 θ1 的偏导数
+
+349
+00:12:31,600 --> 00:12:34,300
+derivative with respect to theta 1, and so on.
+依此类推
+
+350
+00:12:34,770 --> 00:12:36,260
+Once again, this is gradient
+再次强调 这是 gradient(1)
+
+351
+00:12:37,500 --> 00:12:38,390
+1, gradient 2 and so
+gradient(2) 等等
+
+352
+00:12:39,030 --> 00:12:40,330
+on, rather than gradient 0, gradient
+而不是gradient(0) gradient(1)
+
+353
+00:12:40,500 --> 00:12:42,730
+1 because octave indexes
+因为 Octave 的标号
+
+354
+00:12:43,460 --> 00:12:46,200
+vectors starting from one rather than from zero.
+是从1开始 而不是从0开始的
+
+355
+00:12:47,440 --> 00:12:48,460
+But the main concept I hope
+我希望你们从这个幻灯片中
+
+356
+00:12:48,690 --> 00:12:49,540
+you take away from this slide
+学到的主要内容是
+
+357
+00:12:49,900 --> 00:12:50,870
+is, that what you need to do,
+你所要做的是
+
+358
+00:12:51,070 --> 00:12:54,370
+is write a function that returns
+写一个函数 它能返回
+
+359
+00:12:55,500 --> 00:12:56,930
+the cost function and returns the gradient.
+代价函数值 以及梯度值
+
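+A sketch of what such a function could look like for logistic regression is given below. This is my own illustration based on the formulas from the previous videos, not code shown on the slide; it assumes the design matrix X (m by n+1, with a leading column of ones) and the label vector y are passed in, so to use it with fminunc you would wrap it as, for example, fminunc(@(t) costFunction(t, X, y), initialTheta, options).
+
+function [jVal, gradient] = costFunction(theta, X, y)
+  m = length(y);                                            % number of training examples
+  h = 1 ./ (1 + exp(-X * theta));                           % sigmoid hypothesis
+  jVal = -(1 / m) * (y' * log(h) + (1 - y)' * log(1 - h));  % logistic regression cost J(theta)
+  gradient = (1 / m) * (X' * (h - y));                      % (n+1) x 1 vector of partial derivatives
+end
+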
+360
+00:12:58,410 --> 00:12:59,750
+And so in order to
+因此要把这个
+
+361
+00:12:59,960 --> 00:13:01,410
+apply this to logistic regression
+应用到逻辑回归
+
+362
+00:13:02,100 --> 00:13:03,430
+or even to linear regression, if
+或者甚至线性回归中
+
+363
+00:13:03,560 --> 00:13:06,230
+you want to use these optimization algorithms for linear regression.
+你也可以把这些优化算法用于线性回归
+
+364
+00:13:07,340 --> 00:13:08,350
+What you need to do is plug in
+你需要做的就是输入
+
+365
+00:13:08,500 --> 00:13:09,960
+the appropriate code to compute
+合适的代码来计算
+
+366
+00:13:10,820 --> 00:13:12,280
+these things over here.
+这里的这些东西
+
+367
+00:13:15,100 --> 00:13:17,910
+So, now you know how to use these advanced optimization algorithms.
+现在你已经知道如何使用这些高级的优化算法
+
+368
+00:13:19,030 --> 00:13:21,170
+Because, for
+有了这些算法
+
+369
+00:13:21,320 --> 00:13:22,660
+these algorithms, you're using a
+你就可以使用一个
+
+370
+00:13:22,870 --> 00:13:25,190
+sophisticated optimization library, it makes
+复杂的优化库
+
+371
+00:13:25,690 --> 00:13:26,710
+the code just a little bit
+它让算法使用起来更模糊一点
+
+372
+00:13:26,940 --> 00:13:28,510
+more opaque and so
+更不直观一些
+
+373
+00:13:28,740 --> 00:13:30,390
+just maybe a little bit harder to debug.
+因此也许稍微有点难调试
+
+374
+00:13:31,290 --> 00:13:32,660
+But because these algorithms often
+不过由于这些算法的运行速度
+
+375
+00:13:33,010 --> 00:13:34,370
+run much faster than gradient descent,
+通常远远超过梯度下降
+
+376
+00:13:35,010 --> 00:13:36,760
+often quite typically whenever
+因此当我有一个很大的
+
+377
+00:13:37,060 --> 00:13:38,180
+I have a large machine learning
+机器学习问题时
+
+378
+00:13:38,410 --> 00:13:39,500
+problem, I will use
+我会选择这些高级算法
+
+379
+00:13:39,760 --> 00:13:42,110
+these algorithms instead of using gradient descent.
+而不是梯度下降
+
+380
+00:13:43,900 --> 00:13:45,070
+And with these ideas, hopefully,
+有了这些概念
+
+381
+00:13:45,450 --> 00:13:46,710
+you'll be able to get logistic regression
+你就应该能将逻辑回归
+
+382
+00:13:47,350 --> 00:13:48,780
+and also linear regression to work
+和线性回归应用于
+
+383
+00:13:49,100 --> 00:13:51,410
+on much larger problems.
+更大的问题中
+
+384
+00:13:51,830 --> 00:13:53,820
+So, that's it for advanced optimization concepts.
+这就是高级优化的概念
+
+385
+00:13:55,120 --> 00:13:56,170
+And in the next and
+在下一个视频
+
+386
+00:13:56,320 --> 00:13:57,720
+final video on Logistic Regression,
+也就是逻辑回归这一部分的最后一个视频中
+
+387
+00:13:58,550 --> 00:13:59,470
+I want to tell you how to
+我想要告诉你如何
+
+388
+00:13:59,600 --> 00:14:00,990
+take the logistic regression algorithm
+修改你已经知道的逻辑回归算法
+
+389
+00:14:01,520 --> 00:14:02,790
+that you already know about and make
+然后使它在多类别分类问题中
+
+390
+00:14:02,990 --> 00:14:05,420
+it work also on multi-class classification problems.
+也能正常运行
+
diff --git a/srt/6 - 7 - Multiclass Classification_ One-vs-all (6 min).srt b/srt/6 - 7 - Multiclass Classification_ One-vs-all (6 min).srt
new file mode 100644
index 00000000..025567df
--- /dev/null
+++ b/srt/6 - 7 - Multiclass Classification_ One-vs-all (6 min).srt
@@ -0,0 +1,911 @@
+1
+00:00:00,200 --> 00:00:01,596
+In this video we'll talk about
+在本节视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,620 --> 00:00:03,659
+how to get logistic regression to
+我们将谈到如何使用逻辑回归 (logistic regression)
+
+3
+00:00:03,659 --> 00:00:06,089
+work for multi-class classification problems,
+来解决多类别分类问题
+
+4
+00:00:06,089 --> 00:00:07,526
+and in particular I want to
+具体来说
+
+5
+00:00:07,526 --> 00:00:12,070
+tell you about an algorithm called one-versus-all classification.
+我想通过一个叫做"一对多" (one-vs-all) 的分类算法
+
+6
+00:00:12,150 --> 00:00:14,316
+What's a multi-class classification problem?
+让你了解什么是多类别分类问题
+
+7
+00:00:14,316 --> 00:00:15,945
+Here are some examples.
+先看这样一些例子
+
+8
+00:00:15,945 --> 00:00:17,318
+Let's say you want a learning
+假如说你现在需要
+
+9
+00:00:17,320 --> 00:00:19,691
+algorithm to automatically put your
+一个学习算法 能自动地
+
+10
+00:00:19,710 --> 00:00:21,076
+email into different folders or
+将邮件归类到不同的文件夹里
+
+11
+00:00:21,076 --> 00:00:23,398
+to automatically tag your emails.
+或者说可以自动地加上标签
+
+12
+00:00:23,398 --> 00:00:24,749
+So, you might have different folders
+那么 你也许需要一些不同的文件夹
+
+13
+00:00:24,790 --> 00:00:27,052
+or different tags for work email,
+或者不同的标签来完成这件事
+
+14
+00:00:27,060 --> 00:00:28,236
+email from your friends, email
+来区分开来自工作的邮件、来自朋友的邮件
+
+15
+00:00:28,236 --> 00:00:31,561
+from your family and emails about your hobby.
+来自家人的邮件或者是有关兴趣爱好的邮件
+
+16
+00:00:31,590 --> 00:00:33,145
+And so, here, we have
+那么
+
+17
+00:00:33,145 --> 00:00:34,856
+a classification problem with 4
+我们就有了
+
+18
+00:00:34,900 --> 00:00:36,164
+classes, which we might
+这样一个分类问题
+
+19
+00:00:36,180 --> 00:00:38,129
+assign the numbers, the classes
+其类别有四个
+
+20
+00:00:38,129 --> 00:00:41,326
+y=1, y=2, y=3 and
+分别用y=1、y=2、y=3、
+
+21
+00:00:41,326 --> 00:00:43,530
+y=4. Another
+y=4 来代表
+
+22
+00:00:44,490 --> 00:00:45,790
+example for a medical
+另一个例子是有关医疗诊断的
+
+23
+00:00:46,000 --> 00:00:47,260
+diagnosis: if a patient
+如果一个病人
+
+24
+00:00:47,800 --> 00:00:48,910
+comes into your office with
+因为鼻塞
+
+25
+00:00:48,930 --> 00:00:51,395
+maybe a stuffy nose, the possible
+来到你的诊所
+
+26
+00:00:51,395 --> 00:00:52,762
+diagnoses could be that
+他可能并没有生病
+
+27
+00:00:52,762 --> 00:00:54,140
+they're not ill, maybe that's
+用 y=1 这个类别来代表
+
+28
+00:00:54,140 --> 00:00:55,474
+y1; or they have
+或者患了感冒 用 y=2 来代表
+
+29
+00:00:55,490 --> 00:00:59,026
+a cold, y=2; or they have the flu, y=3.
+或者得了流感 y=3
+
+30
+00:00:59,026 --> 00:01:00,541
+And the third and final example,
+第三个例子 也是最后一个例子
+
+31
+00:01:00,541 --> 00:01:02,056
+if you are using machine learning
+如果你正在做有关
+
+32
+00:01:02,090 --> 00:01:03,906
+to classify the weather, you know,
+天气的机器学习分类问题
+
+33
+00:01:03,910 --> 00:01:05,299
+maybe you want to decide that
+那么你可能想要区分
+
+34
+00:01:05,299 --> 00:01:07,937
+the weather is sunny, cloudy, rainy
+哪些天是晴天、多云、雨天、
+
+35
+00:01:07,950 --> 00:01:10,211
+or snow, or if there's gonna be snow.
+或者下雪天
+
+36
+00:01:10,230 --> 00:01:11,165
+And so, in all of these
+对上述所有的例子
+
+37
+00:01:11,165 --> 00:01:12,808
+examples, Y can take
+y 可以取
+
+38
+00:01:12,808 --> 00:01:14,300
+on a small number of
+为数不多的几个
+
+39
+00:01:14,300 --> 00:01:16,498
+discrete values, maybe 1 to
+离散值
+
+40
+00:01:16,498 --> 00:01:17,810
+3, 1 to 4 and so on, and
+比如1到3、1到4或者其它数值
+
+41
+00:01:17,890 --> 00:01:20,659
+these are multi-class classification problems.
+以上说的都是多类分类问题
+
+42
+00:01:20,659 --> 00:01:21,904
+And by the way, it doesn't really
+顺便一提的是
+
+43
+00:01:21,904 --> 00:01:23,632
+matter whether we index as
+对于下标是 0 1 2 3
+
+44
+00:01:23,632 --> 00:01:27,063
+0, 1, 2, 3 or as 1, 2, 3, 4, I tend
+还是 1 2 3 4 都不重要
+
+45
+00:01:27,090 --> 00:01:29,138
+to index that my classes
+我更喜欢将分类
+
+46
+00:01:29,138 --> 00:01:31,569
+starting from 1 rather than starting from 0.
+从 1 开始标而不是 0
+
+47
+00:01:31,569 --> 00:01:33,756
+But either way, where often, it really doesn't matter.
+其实怎样标注都不会影响最后的结果
+
+48
+00:01:33,756 --> 00:01:35,243
+Whereas previously, for a
+然而对于之前的一个
+
+49
+00:01:35,243 --> 00:01:39,375
+binary classification problem, our data sets look like this.
+二元分类问题 我们的数据看起来可能是像这样
+
+50
+00:01:39,375 --> 00:01:41,617
+For a multi-class classification problem, our
+对于一个多类分类问题
+
+51
+00:01:41,617 --> 00:01:42,792
+data sets may look like
+我们的数据集
+
+52
+00:01:42,792 --> 00:01:44,362
+this, where here, I'm using
+或许看起来像这样
+
+53
+00:01:44,362 --> 00:01:48,399
+three different symbols to represent our three classes.
+我用三种不同的符号来代表三个类别
+
+54
+00:01:48,410 --> 00:01:49,858
+So, the question is: Given the
+问题就是
+
+55
+00:01:49,858 --> 00:01:51,613
+data set with three classes
+给出三个类型的数据集
+
+56
+00:01:51,613 --> 00:01:53,193
+where this is an
+这是一个类别中的样本
+
+57
+00:01:53,193 --> 00:01:54,651
+example of one class, that's
+而这个样本是属于
+
+58
+00:01:54,651 --> 00:01:55,768
+the example of the different class,
+另一个类别
+
+59
+00:01:55,790 --> 00:01:58,389
+and, that's the example of yet, the third class.
+而这个样本属于第三个类别
+
+60
+00:01:58,410 --> 00:02:01,421
+How do we get a learning algorithm to work for the setting?
+我们如何得到一个学习算法来进行分类呢?
+
+61
+00:02:01,421 --> 00:02:02,598
+We already know how to
+我们现在已经知道如何
+
+62
+00:02:02,598 --> 00:02:05,096
+do binary classification, using logistic
+进行二元分类
+
+63
+00:02:05,096 --> 00:02:06,594
+regression, we know how the,
+可以使用逻辑斯特回归
+
+64
+00:02:06,594 --> 00:02:07,736
+you know, maybe, for the straight line,
+对于直线或许你也知道
+
+65
+00:02:07,736 --> 00:02:10,613
+to separate the positive and negative classes.
+可以将数据集一分为二为正类和负类
+
+66
+00:02:10,613 --> 00:02:12,116
+Using an idea called one
+用一对多的
+
+67
+00:02:12,116 --> 00:02:14,399
+versus all classification, we can
+分类思想
+
+68
+00:02:14,400 --> 00:02:15,730
+then take this, and, make
+我们可以
+
+69
+00:02:15,730 --> 00:02:18,646
+it work for multi-class classification, as well.
+将其用在多类分类问题上
+
+70
+00:02:18,650 --> 00:02:21,617
+Here's how one versus all classification works.
+下面将介绍如何进行一对多的分类工作
+
+71
+00:02:21,620 --> 00:02:25,777
+And, this is also sometimes called "one versus rest."
+有时这个方法也被称为"一对余"方法
+
+72
+00:02:25,777 --> 00:02:26,941
+Let's say, we have a training
+现在我们有一个训练集
+
+73
+00:02:26,941 --> 00:02:28,138
+set, like that shown on the
+好比左边表示的
+
+74
+00:02:28,150 --> 00:02:30,456
+left, where we have 3 classes.
+有三个类别
+
+75
+00:02:30,470 --> 00:02:32,310
+So, if y=1, we denote that
+我们用三角形表示 y=1
+
+76
+00:02:32,310 --> 00:02:34,405
+with a triangle, if y=2 the
+方框表示 y=2
+
+77
+00:02:34,405 --> 00:02:37,970
+square and, if y=3 then, the cross.
+叉叉表示 y=3
+
+78
+00:02:37,980 --> 00:02:39,460
+What we're going to do is,
+我们下面要做的就是
+
+79
+00:02:39,480 --> 00:02:41,350
+take a training set, and, turn
+使用一个训练集
+
+80
+00:02:41,350 --> 00:02:44,816
+this into three separate binary classification problems.
+将其分成三个二元分类问题
+
+81
+00:02:44,816 --> 00:02:46,719
+So, I'll turn this into three separate
+所以我将它分成三个
+
+82
+00:02:46,750 --> 00:02:49,450
+two class classification problems.
+二元分类问题
+
+83
+00:02:49,450 --> 00:02:51,660
+So let's start with Class 1, which is a triangle.
+我们先从用三角形代表的类别1开始
+
+84
+00:02:51,660 --> 00:02:52,990
+We are going to essentially create a
+实际上我们可以创建一个
+
+85
+00:02:53,050 --> 00:02:55,418
+new, sort of fake training set.
+新的"伪"训练集
+
+86
+00:02:55,440 --> 00:02:56,913
+where classes 2 and 3
+类型2和类型3
+
+87
+00:02:56,920 --> 00:02:58,151
+get assigned to the negative
+定为负类
+
+88
+00:02:58,151 --> 00:02:59,873
+class and class 1
+类型1
+
+89
+00:02:59,873 --> 00:03:01,134
+gets assigned to the positive class
+设定为正类
+
+90
+00:03:01,134 --> 00:03:02,352
+when we create a new training
+我们创建一个新的
+
+91
+00:03:02,380 --> 00:03:03,700
+set if that's showing
+训练集
+
+92
+00:03:03,700 --> 00:03:05,508
+on the right and we're going
+如右侧所示的那样
+
+93
+00:03:05,508 --> 00:03:07,573
+to fit a classifier, which I'm
+我们要拟合出一个合适的分类器
+
+94
+00:03:07,573 --> 00:03:10,200
+going to call h subscript theta
+我们称其为
+
+95
+00:03:10,220 --> 00:03:12,626
+superscript 1 of x
+h 下标 θ 上标(1) (x)
+
+96
+00:03:12,640 --> 00:03:15,659
+where here, the triangles
+这里的三角形是正样本
+
+97
+00:03:15,659 --> 00:03:19,008
+are the positive examples and the circles are the negative examples.
+而圆形代表负样本
+
+98
+00:03:19,008 --> 00:03:20,649
+So, think of the triangles be
+可以这样想
+
+99
+00:03:20,649 --> 00:03:21,800
+assigned the value of 1
+设置三角形的值为1
+
+100
+00:03:21,800 --> 00:03:25,291
+and the circles the sum, the value of zero.
+圆形的值为0
+
+101
+00:03:25,300 --> 00:03:26,723
+And we're just going to train
+下面我们来训练一个标准的
+
+102
+00:03:26,723 --> 00:03:29,556
+a standard logistic regression classifier
+逻辑回归分类器
+
+103
+00:03:29,556 --> 00:03:34,173
+and maybe that will give us a decision boundary.
+这样我们或许就能得到一个判定边界
+
+104
+00:03:34,173 --> 00:03:34,173
+OK?
+对吧?
+
+105
+00:03:34,890 --> 00:03:37,693
+The superscript 1 here is the class one.
+这里上标(1)表示类别1
+
+106
+00:03:37,693 --> 00:03:40,777
+So, we're doing this for the triangle first class.
+我们可以像这样对三角形类别这么做
+
+107
+00:03:40,800 --> 00:03:42,302
+Next, we do the same thing for class 2.
+下面 我们将为类别2做同样的工作
+
+108
+00:03:42,302 --> 00:03:44,013
+Going to take the squares and
+取这些方块样本
+
+109
+00:03:44,020 --> 00:03:45,456
+assign the squares as the
+然后将这些方块
+
+110
+00:03:45,470 --> 00:03:47,001
+positive class and assign
+作为正样本
+
+111
+00:03:47,001 --> 00:03:50,213
+everything else, the triangles and the crosses, as the negative class.
+把其它的 也就是三角形和叉形类别 都作为负样本
+
+112
+00:03:50,220 --> 00:03:54,173
+and then we fit a second logistic regression classifier.
+这样我们找到第二个合适的逻辑回归分类器
+
+113
+00:03:54,173 --> 00:03:56,410
+I'm gonna call this H of X
+我们称为 h 下标 θ 上标(2) (x)
+
+114
+00:03:56,420 --> 00:03:58,352
+superscript 2, where the
+其中上标(2)表示
+
+115
+00:03:58,352 --> 00:04:00,029
+superscript 2 denotes that
+是类别2
+
+116
+00:04:00,029 --> 00:04:01,860
+we're now doing this: treating the
+所以我们做的就是
+
+117
+00:04:01,870 --> 00:04:03,310
+square class as the positive
+把方块类当做正样本
+
+118
+00:04:03,350 --> 00:04:07,518
+class and maybe we get the classifier like that.
+我们可能便会得到这样的一个分类器
+
+119
+00:04:07,518 --> 00:04:08,854
+And finally, we do the
+最后 同样地
+
+120
+00:04:08,854 --> 00:04:10,143
+same thing for the third
+我们对第三个类别采用同样的方法
+
+121
+00:04:10,143 --> 00:04:11,598
+class and fit a third
+并找出
+
+122
+00:04:11,610 --> 00:04:14,632
+classifier H superscript 3
+第三个分类器 h 下标 θ 上标(3) (x)
+
+123
+00:04:14,632 --> 00:04:16,424
+of X and maybe this
+或许这么做
+
+124
+00:04:16,440 --> 00:04:18,106
+will give us a decision boundary
+可以给出一个像这样的
+
+125
+00:04:18,106 --> 00:04:19,749
+or give us a classifier that separates
+判别边界
+
+126
+00:04:19,750 --> 00:04:22,863
+the positive and negative examples like that.
+或者说分类器 能这样分开正负样本
+
+127
+00:04:22,870 --> 00:04:24,353
+So, to summarize, what we've
+总而言之
+
+128
+00:04:24,353 --> 00:04:27,872
+done is we fit 3 classifiers.
+我们已经拟合出三个分类器
+
+129
+00:04:27,890 --> 00:04:29,403
+So, for I equals 1
+对于 i 等于1、2、3
+
+130
+00:04:29,403 --> 00:04:31,836
+2 3 we'll fit a classifier
+我们都找到了一个分类器
+
+131
+00:04:31,880 --> 00:04:33,855
+H superscript I subscript theta
+h 上标(i) 下标θ 括号 x
+
+132
+00:04:33,855 --> 00:04:35,193
+of X, thus trying to
+通过这样来尝试
+
+133
+00:04:35,220 --> 00:04:36,446
+estimate what is the
+估计出
+
+134
+00:04:36,450 --> 00:04:38,208
+probability that y is
+在给定 x 以 θ 为参数时
+
+135
+00:04:38,208 --> 00:04:41,834
+equal to class i, given x and parametrized by theta.
+y的值等于 i 的概率
+
+136
+00:04:41,834 --> 00:04:41,834
+Right?
+对么?
+
+137
+00:04:41,834 --> 00:04:43,229
+So, in the first
+在一开始
+
+138
+00:04:43,230 --> 00:04:44,903
+instance, for this first one
+对于第一个在这里的
+
+139
+00:04:44,910 --> 00:04:47,277
+up here, this classifier
+分类器
+
+140
+00:04:47,280 --> 00:04:49,364
+was learning to recognize the triangles.
+完成了对三角形的识别
+
+141
+00:04:49,364 --> 00:04:52,037
+So it's thinking of the triangles as a positive class.
+把三角形当做是正类别
+
+142
+00:04:52,060 --> 00:04:53,840
+So, h superscript one is
+所以 h(1) 实际上是在计算
+
+143
+00:04:53,840 --> 00:04:55,163
+essentially trying to estimate what is
+给定x 以 θ 为参数时
+
+144
+00:04:55,170 --> 00:04:57,343
+the probability that the Y
+y的值为1的
+
+145
+00:04:57,350 --> 00:04:59,083
+is equal to one, given
+概率是多少
+
+146
+00:04:59,083 --> 00:05:02,037
+X and parametrized by theta.
+概率是多少
+
+147
+00:05:02,037 --> 00:05:04,475
+And similarly, this is treating,
+同样地 这个也是这么处理
+
+148
+00:05:04,480 --> 00:05:05,859
+you know, the square class as
+矩形类型当做一个正类别
+
+149
+00:05:05,859 --> 00:05:07,400
+a positive class, so it's
+同样地
+
+150
+00:05:07,400 --> 00:05:10,748
+trying to estimate the probability that y=2, and so on.
+可以计算出 y=2 的概率和其它的概率值来
+
+151
+00:05:10,750 --> 00:05:13,300
+So we now have 3 classifiers each
+现在我们便有了三个分类器
+
+152
+00:05:13,310 --> 00:05:16,649
+of which was trained to recognize one of the three classes.
+每个分类器都针对三个类别中的一个进行了训练
+
+153
+00:05:16,670 --> 00:05:17,859
+Just to summarize, what we've
+总之
+
+154
+00:05:17,860 --> 00:05:19,685
+done is, we want
+我们所做的就是 我们想要
+
+155
+00:05:19,700 --> 00:05:21,280
+to train a logistic regression
+现在要做的就是训练这个
+
+156
+00:05:21,300 --> 00:05:23,560
+classifier, H superscript I
+逻辑回归分类器 h(i)
+
+157
+00:05:23,560 --> 00:05:24,947
+of x, for each class
+逻辑回归分类器 h(i)
+
+158
+00:05:24,950 --> 00:05:26,183
+i, that predicts the probability that
+其中 i 对应每一个可能的 y=i
+
+159
+00:05:26,183 --> 00:05:28,550
+y equals i. Finally, to
+最后
+
+160
+00:05:28,570 --> 00:05:29,740
+make a prediction when we
+为了做出预测
+
+161
+00:05:29,820 --> 00:05:31,772
+give it a new input x and
+我们给出输入一个新的 x 值
+
+162
+00:05:31,772 --> 00:05:33,326
+we want to make a prediction,
+用这个做预测
+
+163
+00:05:33,340 --> 00:05:34,729
+we do is we just
+我们要做的
+
+164
+00:05:34,730 --> 00:05:36,706
+run, let's say, all three
+就是运行
+
+165
+00:05:36,706 --> 00:05:38,557
+of our
+在我们三个分类器
+
+166
+00:05:38,557 --> 00:05:40,010
+classifiers on the input
+里面输入 x
+
+167
+00:05:40,010 --> 00:05:41,535
+x and we then
+然后
+
+168
+00:05:41,535 --> 00:05:44,068
+pick the class i that maximizes the three.
+我们选择一个让 h 最大的 i
+
+169
+00:05:44,068 --> 00:05:45,387
+So, we just you know, basically
+你现在知道了
+
+170
+00:05:45,387 --> 00:05:47,180
+pick the classifier, pick whichever
+基本的挑选分类器的方法
+
+171
+00:05:47,180 --> 00:05:49,163
+one of the three classifiers is
+选择出哪一个分类器是
+
+172
+00:05:49,210 --> 00:05:52,178
+most confident, or most enthusiastically
+可信度最高效果最好的
+
+173
+00:05:52,178 --> 00:05:54,352
+says that it thinks it has the right class.
+那么就可认为得到一个正确的分类
+
+174
+00:05:54,352 --> 00:05:56,153
+So, whichever value of i
+也就是说 无论哪个 i 值
+
+175
+00:05:56,190 --> 00:05:58,069
+gives us the highest probability, we
+能给出最高的概率
+
+176
+00:05:58,069 --> 00:06:01,056
+then predict y to be that value.
+我们预测 y 就是那个值
+
+177
+00:06:02,660 --> 00:06:04,453
+So, that's it for multi-class
+这就是多类别分类问题
+
+178
+00:06:04,470 --> 00:06:07,677
+classification and one-versus-all method.
+以及一对多的方法
+
+179
+00:06:07,677 --> 00:06:09,120
+And with this little method
+通过这个小方法
+
+180
+00:06:09,120 --> 00:06:10,521
+you can now take the logistic
+你现在也可以将
+
+181
+00:06:10,521 --> 00:06:12,033
+regression classifier and make
+逻辑回归分类器
+
+182
+00:06:12,033 --> 00:06:15,051
+it work on multi-class classification problems as well.
+用在多类分类的问题上
+
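
The one-vs-all procedure described above can be sketched in Octave roughly as follows (assuming the same X, y and costFunction as in the earlier logistic-regression sketch, with the labels in y taking values 1..K; the names K, all_theta and predictions are illustrative, not from the lecture):

% Train one logistic regression classifier per class, treating class i as
% the positive class and everything else as the negative class.
K = 3;
all_theta = zeros(K, size(X, 2));
options = optimset('GradObj', 'on', 'MaxIter', 100);
for i = 1:K
  yi = double(y == i);                      % 1 for class i, 0 otherwise
  initialTheta = zeros(size(X, 2), 1);
  theta_i = fminunc(@(t)(costFunction(t, X, yi)), initialTheta, options);
  all_theta(i, :) = theta_i';
end

% To predict, run all K classifiers on each input and pick the class whose
% hypothesis h_theta^(i)(x) is largest, as the lecture describes.
h = 1 ./ (1 + exp(-X * all_theta'));        % m x K matrix of probabilities
[maxProb, predictions] = max(h, [], 2);     % predicted class for each example
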
diff --git a/srt/7 - 1 - The Problem of Overfitting (10 min).srt b/srt/7 - 1 - The Problem of Overfitting (10 min).srt
new file mode 100644
index 00000000..6300673a
--- /dev/null
+++ b/srt/7 - 1 - The Problem of Overfitting (10 min).srt
@@ -0,0 +1,1366 @@
+1
+00:00:00,360 --> 00:00:01,753
+By now, you've seen a
+到现在为止 你已经见识了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,760 --> 00:00:04,097
+couple different learning algorithms, linear
+几种不同的学习算法
+
+3
+00:00:04,097 --> 00:00:06,504
+regression and logistic regression.
+包括线性回归和逻辑回归
+
+4
+00:00:06,510 --> 00:00:08,583
+They work well for many problems,
+它们能够有效地解决许多问题
+
+5
+00:00:08,583 --> 00:00:09,684
+but when you apply them
+但是当将它们应用到
+
+6
+00:00:09,684 --> 00:00:11,903
+to certain machine learning applications, they
+某些特定的机器学习应用时
+
+7
+00:00:11,903 --> 00:00:13,889
+can run into a problem called
+会遇到过度拟合(over-fitting)的问题
+
+8
+00:00:13,900 --> 00:00:18,052
+overfitting that can cause them to perform very poorly.
+可能会导致它们效果很差
+
+9
+00:00:18,052 --> 00:00:18,866
+What I'd like to do in
+在这段视频中
+
+10
+00:00:18,866 --> 00:00:20,393
+this video is explain to
+我将为你解释
+
+11
+00:00:20,393 --> 00:00:22,400
+you what is this overfitting
+什么是过度拟合问题
+
+12
+00:00:22,400 --> 00:00:24,083
+problem, and in the
+并且
+
+13
+00:00:24,083 --> 00:00:25,861
+next few videos after this,
+在此之后接下来的几个视频中
+
+14
+00:00:25,861 --> 00:00:27,759
+we'll talk about a technique called
+我们将谈论一种
+
+15
+00:00:27,760 --> 00:00:29,787
+regularization, that will allow
+称为正则化(regularization)的技术
+
+16
+00:00:29,787 --> 00:00:31,529
+us to ameliorate or to
+它可以改善或者
+
+17
+00:00:31,529 --> 00:00:33,607
+reduce this overfitting problem and
+减少过度拟合问题
+
+18
+00:00:33,607 --> 00:00:36,844
+get these learning algorithms to maybe work much better.
+以使学习算法更好实现
+
+19
+00:00:36,860 --> 00:00:39,607
+So what is overfitting?
+那么什么是过度拟合呢?
+
+20
+00:00:39,607 --> 00:00:41,616
+Let's keep using our running
+让我们继续使用
+
+21
+00:00:41,620 --> 00:00:44,030
+example of predicting housing
+那个用线性回归
+
+22
+00:00:44,050 --> 00:00:46,146
+prices with linear regression
+来预测房价的例子
+
+23
+00:00:46,146 --> 00:00:47,123
+where we want to predict the
+我们通过建立
+
+24
+00:00:47,123 --> 00:00:50,730
+price as a function of the size of the house.
+以住房面积为自变量的函数来预测房价
+
+25
+00:00:50,730 --> 00:00:51,870
+One thing we could do is
+我们可以
+
+26
+00:00:51,910 --> 00:00:53,620
+fit a linear function to
+对该数据做线性回归
+
+27
+00:00:53,620 --> 00:00:54,892
+this data, and if we
+如果这么做
+
+28
+00:00:54,892 --> 00:00:56,296
+do that, maybe we get
+我们也许能够获得
+
+29
+00:00:56,296 --> 00:00:58,913
+that sort of straight line fit to the data.
+拟合数据的这样一条直线
+
+30
+00:00:58,913 --> 00:01:01,012
+But this isn't a very good model.
+但是 这不是一个很好的模型
+
+31
+00:01:01,012 --> 00:01:02,543
+Looking at the data, it seems
+我们看看这些数据
+
+32
+00:01:02,560 --> 00:01:04,100
+pretty clear that as the
+很明显
+
+33
+00:01:04,100 --> 00:01:06,274
+size of the housing increases, the
+随着房子面积增大
+
+34
+00:01:06,274 --> 00:01:08,268
+housing prices plateau, or kind
+住房价格的变化趋于稳定
+
+35
+00:01:08,270 --> 00:01:11,721
+of flattens out as we move to the right and so
+或者越往右越平缓
+
+36
+00:01:11,740 --> 00:01:14,020
+this algorithm does not
+因此该算法
+
+37
+00:01:14,020 --> 00:01:15,898
+fit the training data well, and we
+没有很好拟合训练数据
+
+38
+00:01:15,898 --> 00:01:19,166
+call this problem underfitting, and
+我们把这个问题称为欠拟合(underfitting)
+
+39
+00:01:19,180 --> 00:01:20,494
+another term for this is
+这个问题的另一个术语叫做
+
+40
+00:01:20,500 --> 00:01:24,666
+that this algorithm has high bias.
+高偏差(bias)
+
+41
+00:01:25,140 --> 00:01:26,841
+Both of these roughly
+这两种说法大致相似
+
+42
+00:01:26,890 --> 00:01:30,760
+mean that it's just not even fitting the training data very well.
+意思是它只是没有很好地拟合训练数据
+
+43
+00:01:30,760 --> 00:01:32,328
+The term is kind of
+这个词是
+
+44
+00:01:32,328 --> 00:01:34,515
+a historical or technical one,
+过去传下来的一个专业名词
+
+45
+00:01:34,515 --> 00:01:36,109
+but the idea is that
+它的意思是
+
+46
+00:01:36,110 --> 00:01:37,303
+if a fitting a straight line to
+如果拟合一条直线
+
+47
+00:01:37,303 --> 00:01:38,909
+the data, then, it's as
+到训练数据
+
+48
+00:01:38,920 --> 00:01:40,290
+if the algorithm has a
+就好像算法
+
+49
+00:01:40,330 --> 00:01:42,638
+very strong preconception, or a
+有一个很强的偏见
+
+50
+00:01:42,638 --> 00:01:44,633
+very strong bias that housing
+或者说非常大的偏差
+
+51
+00:01:44,650 --> 00:01:46,339
+prices are going to vary
+因为该算法认为房子价格与面积仅仅线性相关
+
+52
+00:01:46,339 --> 00:01:49,988
+linearly with their size and despite the data to the contrary.
+尽管与该数据的事实相反
+
+53
+00:01:50,000 --> 00:01:51,281
+Despite the evidence to the
+尽管有相反的证据
+
+54
+00:01:51,290 --> 00:01:54,174
+contrary, its preconception, or its
+它的先入之见 或者说
+
+55
+00:01:54,174 --> 00:01:55,413
+bias, still causes
+偏差 仍然使得
+
+56
+00:01:55,440 --> 00:01:56,974
+it to fit a straight line
+拟合一条直线
+
+57
+00:01:56,974 --> 00:02:00,638
+and this ends up being a poor fit to the data.
+而此法最终导致拟合数据效果很差
+
+58
+00:02:00,638 --> 00:02:02,173
+Now, in the middle, we could
+我们现在可以在中间
+
+59
+00:02:02,210 --> 00:02:04,626
+fit a quadratic function instead and,
+加入一个二次项
+
+60
+00:02:04,626 --> 00:02:06,222
+with this data set, we fit the
+在这组数据中
+
+61
+00:02:06,222 --> 00:02:07,793
+quadratic function, maybe, we get
+我们用二次函数来拟合它
+
+62
+00:02:07,810 --> 00:02:10,211
+that kind of curve
+然后可以拟合出一条曲线
+
+63
+00:02:10,211 --> 00:02:14,361
+and, that works pretty well.
+事实证明这个拟合效果很好
+
+64
+00:02:14,361 --> 00:02:17,543
+And, at the other extreme, would be if we were to fit, say, a fourth-order polynomial to the data.
+另一个极端情况是 如果我们拟合一个四次多项式
+
+65
+00:02:17,550 --> 00:02:19,442
+So, here we have five parameters,
+因此在这里我们有五个参数
+
+66
+00:02:19,470 --> 00:02:23,196
+theta zero through theta four,
+θ0到θ4
+
+67
+00:02:23,210 --> 00:02:23,926
+and, with that, we can actually fit a curve
+这样我们可以拟合一条曲线
+
+68
+00:02:23,926 --> 00:02:26,727
+that passes through all five of our training examples.
+通过我们的五个训练样本
+
+69
+00:02:26,727 --> 00:02:29,507
+You might get a curve that looks like this.
+你可以得到看上去如此的一条曲线
+
+70
+00:02:31,260 --> 00:02:32,454
+That, on the one
+一方面
+
+71
+00:02:32,460 --> 00:02:33,791
+hand, seems to do
+似乎
+
+72
+00:02:33,791 --> 00:02:35,052
+a very good job fitting the
+对训练数据
+
+73
+00:02:35,052 --> 00:02:36,291
+training set and, that is
+做了一个很好的拟合
+
+74
+00:02:36,291 --> 00:02:38,269
+processed through all of my data, at least.
+因为这条曲线通过了所有的训练实例
+
+75
+00:02:38,270 --> 00:02:40,284
+But, this is still a very wiggly curve, right?
+但是 这仍然是一条扭曲的曲线 对吧?
+
+76
+00:02:40,300 --> 00:02:41,660
+So, it's going up and down all
+它不停上下波动
+
+77
+00:02:41,660 --> 00:02:43,430
+over the place, and, we don't actually
+因此事实上
+
+78
+00:02:43,430 --> 00:02:46,996
+think that's such a good model for predicting housing prices.
+我们并不认为它是一个预测房价的好模型
+
+79
+00:02:47,000 --> 00:02:48,924
+So, this problem we
+所以 这个问题我们把他叫做
+
+80
+00:02:48,924 --> 00:02:51,967
+call overfitting, and, another
+过度拟合或过拟合(overfitting)
+
+81
+00:02:51,970 --> 00:02:53,165
+term for this is that
+另一个描述该问题的术语是
+
+82
+00:02:53,170 --> 00:02:57,304
+this algorithm has high variance.
+高方差(variance)
+
+83
+00:02:57,890 --> 00:02:59,951
+The term high variance is another
+高方差是另一个
+
+84
+00:02:59,951 --> 00:03:02,110
+historical or technical one.
+历史上的叫法
+
+85
+00:03:02,130 --> 00:03:03,797
+But, the intuition is that,
+但是 从第一印象上来说
+
+86
+00:03:03,800 --> 00:03:05,080
+if we're fitting such a high
+如果我们拟合一个
+
+87
+00:03:05,080 --> 00:03:07,326
+order polynomial, then, the
+高阶多项式 那么
+
+88
+00:03:07,330 --> 00:03:08,603
+hypothesis can fit, you know,
+这个函数能很好的拟合训练集
+
+89
+00:03:08,620 --> 00:03:09,584
+it's almost as if it can
+能拟合几乎所有的
+
+90
+00:03:09,584 --> 00:03:11,995
+fit almost any function and
+训练数据
+
+91
+00:03:11,995 --> 00:03:14,159
+this space of possible hypotheses
+这个可能的假设空间
+
+92
+00:03:14,159 --> 00:03:16,601
+is just too large, it's too variable.
+实在太大了 变化太多
+
+93
+00:03:16,610 --> 00:03:18,052
+And we don't have enough data
+同时如果我们没有足够的数据
+
+94
+00:03:18,052 --> 00:03:19,279
+to constrain it to give
+去约束这个变量过多的模型
+
+95
+00:03:19,279 --> 00:03:22,714
+us a good hypothesis so that's called overfitting.
+那么这就是过度拟合
+
+96
+00:03:22,740 --> 00:03:24,340
+And in the middle, there isn't really
+在两者之间的情况 叫"刚好合适"
+
+97
+00:03:24,350 --> 00:03:26,990
+a name but I'm just going to write, you know, just right.
+这并不是一个真正的名词 我只是把它写在这里
+
+98
+00:03:26,990 --> 00:03:29,911
+Where a second degree polynomial, quadratic function
+这个二次多项式 二次函数
+
+99
+00:03:29,911 --> 00:03:32,559
+seems to be just right for fitting this data.
+可以说是恰好拟合这些数据
+
+100
+00:03:32,559 --> 00:03:34,684
+To recap a bit the
+概括地说
+
+101
+00:03:34,690 --> 00:03:37,042
+problem of over fitting comes
+过度拟合的问题
+
+102
+00:03:37,042 --> 00:03:38,258
+when if we have
+将会在变量过多的时候
+
+103
+00:03:38,258 --> 00:03:40,729
+too many features, then the
+发生
+
+104
+00:03:40,729 --> 00:03:43,881
+learned hypothesis may fit the training set very well.
+这种时候训练出的方程总能很好的拟合训练数据
+
+105
+00:03:43,881 --> 00:03:46,023
+So, your cost function
+所以 你的代价函数
+
+106
+00:03:46,023 --> 00:03:47,344
+may actually be very close
+实际上可能非常接近于0
+
+107
+00:03:47,344 --> 00:03:48,446
+to zero or may be
+或者
+
+108
+00:03:48,446 --> 00:03:50,750
+even zero exactly, but you
+就是0
+
+109
+00:03:50,750 --> 00:03:52,063
+may then end up with a
+但是
+
+110
+00:03:52,063 --> 00:03:53,950
+curve like this that, you
+这样的曲线
+
+111
+00:03:53,950 --> 00:03:55,314
+know tries too hard to
+它千方百计的拟合于训练数据
+
+112
+00:03:55,314 --> 00:03:57,103
+fit the training set, so that it
+这样导致
+
+113
+00:03:57,110 --> 00:03:59,233
+even fails to generalize to
+它无法泛化到
+
+114
+00:03:59,250 --> 00:04:01,117
+new examples and fails to
+新的数据样本中
+
+115
+00:04:01,120 --> 00:04:03,018
+predict prices on new examples
+以至于无法预测新样本价格
+
+116
+00:04:03,050 --> 00:04:04,337
+as well, and here the
+在这里
+
+117
+00:04:04,350 --> 00:04:06,853
+term generalized refers to
+术语"泛化"
+
+118
+00:04:06,853 --> 00:04:10,868
+how well a hypothesis applies even to new examples.
+指的是一个假设模型能够应用到新样本的能力
+
+119
+00:04:10,868 --> 00:04:12,274
+That is to data to
+新样本数据是
+
+120
+00:04:12,320 --> 00:04:16,467
+houses that it has not seen in the training set.
+没有出现在训练集中的房子
+
+121
+00:04:16,600 --> 00:04:17,910
+On this slide, we looked at
+在这张幻灯片上 我们看到了
+
+122
+00:04:17,910 --> 00:04:20,802
+over fitting for the case of linear regression.
+线性回归情况下的过拟合
+
+123
+00:04:20,810 --> 00:04:24,182
+A similar thing can apply to logistic regression as well.
+类似的方法同样可以应用到逻辑回归
+
+124
+00:04:24,190 --> 00:04:26,090
+Here is a logistic regression
+这里是一个以x1与x2为变量的
+
+125
+00:04:26,090 --> 00:04:28,871
+example with two features X1 and x2.
+逻辑回归
+
+126
+00:04:28,910 --> 00:04:30,136
+One thing we could do, is
+我们可以做的就是
+
+127
+00:04:30,140 --> 00:04:31,522
+fit logistic regression with
+用这样一个简单的假设模型
+
+128
+00:04:31,522 --> 00:04:34,518
+just a simple hypothesis like this,
+来拟合逻辑回归
+
+129
+00:04:34,530 --> 00:04:38,076
+where, as usual, G is my sigmoid function.
+和以前一样 字母g代表S型函数
+
+130
+00:04:38,120 --> 00:04:39,334
+And if you do that, you end up
+如果这样做
+
+131
+00:04:39,334 --> 00:04:41,593
+with a hypothesis, trying to
+你会得到一个假设模型
+
+132
+00:04:41,600 --> 00:04:42,923
+use, maybe, just a straight
+这个假设模型是一条直线
+
+133
+00:04:42,923 --> 00:04:45,713
+line to separate the positive and the negative examples.
+它直接分开了正样本和负样本
+
+134
+00:04:45,713 --> 00:04:49,071
+And this doesn't look like a very good fit to the hypothesis.
+但这个模型并不能够很好的拟合数据
+
+135
+00:04:49,100 --> 00:04:50,659
+So, once again, this
+因此
+
+136
+00:04:50,659 --> 00:04:52,577
+is an example of underfitting
+这又是一个欠拟合的例子
+
+137
+00:04:52,577 --> 00:04:56,040
+or of the hypothesis having high bias.
+或者说假设模型具有高偏差
+
+138
+00:04:56,210 --> 00:04:57,504
+In contrast, if you were
+相比之下 如果
+
+139
+00:04:57,504 --> 00:04:59,146
+to add to your features
+如果再加入一些变量
+
+140
+00:04:59,170 --> 00:05:01,032
+these quadratic terms, then,
+比如这些二次项
+
+141
+00:05:01,032 --> 00:05:02,613
+you could get a decision
+那么你可以得到一个判定边界
+
+142
+00:05:02,613 --> 00:05:05,620
+boundary that might look more like this.
+像这样
+
+143
+00:05:05,620 --> 00:05:07,784
+And, you know, that's a pretty good fit to the data.
+这样就很好的拟合了数据
+
+144
+00:05:07,784 --> 00:05:10,838
+Probably, about as
+这很可能
+
+145
+00:05:10,860 --> 00:05:13,991
+good as we could get, on this training set.
+是训练集的最好拟合结果
+
+146
+00:05:14,010 --> 00:05:15,157
+And, finally, at the other
+最后
+
+147
+00:05:15,170 --> 00:05:16,169
+extreme, if you were to
+在另一种极端情况下
+
+148
+00:05:16,169 --> 00:05:18,207
+fit a very high-order polynomial, if
+如果你用高阶多项式来拟合数据
+
+149
+00:05:18,207 --> 00:05:20,036
+you were to generate lots of
+你加入了很多
+
+150
+00:05:20,036 --> 00:05:22,461
+high-order polynomial terms as features,
+高阶项
+
+151
+00:05:22,490 --> 00:05:24,730
+then, logistic regression may contort
+那么逻辑回归可能发生自身扭曲
+
+152
+00:05:24,750 --> 00:05:26,551
+itself, may try really
+它千方百计的
+
+153
+00:05:26,560 --> 00:05:28,233
+hard to find a
+形成这样一个
+
+154
+00:05:28,233 --> 00:05:31,742
+decision boundary that fits
+判定边界
+
+155
+00:05:31,742 --> 00:05:33,013
+your training data or go
+来拟合你的训练数据
+
+156
+00:05:33,030 --> 00:05:35,006
+to great lengths to contort itself,
+以至于成为一条扭曲的曲线
+
+157
+00:05:35,006 --> 00:05:37,689
+to fit every single training example well.
+使其能够拟合每一个训练集中的样本
+
+158
+00:05:37,700 --> 00:05:38,757
+And, you know, if the
+而且
+
+159
+00:05:38,757 --> 00:05:39,547
+features X1 and
+如果x1和x2
+
+160
+00:05:39,550 --> 00:05:41,435
+X2 are for predicting, maybe,
+能够预测
+
+161
+00:05:41,435 --> 00:05:43,350
+whether the tumor is,
+肿瘤是
+
+162
+00:05:43,390 --> 00:05:46,448
+you know, a malignant or a benign breast tumor.
+恶性的还是良性的乳腺肿瘤
+
+163
+00:05:46,448 --> 00:05:47,988
+This doesn't, this really doesn't
+确实
+
+164
+00:05:47,988 --> 00:05:51,893
+look like a very good hypothesis, for making predictions.
+这个假设模型不是一个很好的预测
+
+165
+00:05:51,930 --> 00:05:53,463
+And so, once again, this is
+因此
+
+166
+00:05:53,463 --> 00:05:55,432
+an instance of overfitting
+这又是一个过拟合例子
+
+167
+00:05:55,432 --> 00:05:57,128
+and, of a hypothesis having
+是一个
+
+168
+00:05:57,128 --> 00:05:59,403
+high variance and not really,
+有高方差的假设模型
+
+169
+00:05:59,403 --> 00:06:04,243
+and, being unlikely to generalize well to new examples.
+并且不能够很好泛化到新样本
+
+170
+00:06:04,560 --> 00:06:06,158
+Later, in this course, when we
+在今后课程中
+
+171
+00:06:06,158 --> 00:06:08,453
+talk about debugging and diagnosing
+我们会讲到调试和诊断
+
+172
+00:06:08,460 --> 00:06:09,794
+things that can go wrong with
+诊断出导致学习算法故障的东西
+
+173
+00:06:09,810 --> 00:06:11,490
+learning algorithms, we'll give you
+我们告诉你如何用
+
+174
+00:06:11,490 --> 00:06:13,297
+specific tools to recognize
+专门的工具来识别
+
+175
+00:06:13,297 --> 00:06:14,953
+when overfitting and, also,
+过拟合
+
+176
+00:06:14,953 --> 00:06:17,503
+when underfitting may be occurring.
+和可能发生的欠拟合
+
+177
+00:06:17,503 --> 00:06:18,775
+But, for now, lets talk about
+但是 现在 让我们谈谈
+
+178
+00:06:18,780 --> 00:06:20,342
+the problem of, if we
+过拟合
+
+179
+00:06:20,360 --> 00:06:22,206
+think overfitting is occurring,
+的问题
+
+180
+00:06:22,250 --> 00:06:24,864
+what can we do to address it?
+我们怎么样解决呢
+
+181
+00:06:24,864 --> 00:06:26,640
+In the previous examples, we had
+在前面的例子中
+
+182
+00:06:26,660 --> 00:06:28,701
+one or two dimensional data so,
+当我们使用一维或二维数据时
+
+183
+00:06:28,701 --> 00:06:31,335
+we could just plot the hypothesis and see what was going
+我们可以通过绘出假设模型的图像来研究问题所在
+
+184
+00:06:31,335 --> 00:06:34,612
+on and select the appropriate degree polynomial.
+再选择合适的多项式来拟合数据
+
+185
+00:06:34,620 --> 00:06:36,836
+So, earlier for the housing
+因此 以之前的房屋价格为例
+
+186
+00:06:36,836 --> 00:06:38,405
+prices example, we could just
+我们可以
+
+187
+00:06:38,410 --> 00:06:40,597
+plot the hypothesis and, you
+绘制假设模型的图像
+
+188
+00:06:40,600 --> 00:06:41,628
+know, maybe see that it
+就能看到
+
+189
+00:06:41,628 --> 00:06:42,830
+was fitting the sort of
+模型的曲线
+
+190
+00:06:42,830 --> 00:06:46,339
+very wiggly function that goes all over the place to predict housing prices.
+非常扭曲并通过所有样本房价
+
+191
+00:06:46,339 --> 00:06:47,701
+And we could then use figures
+我们可以通过绘制这样的图形
+
+192
+00:06:47,740 --> 00:06:50,667
+like these to select an appropriate degree polynomial.
+来选择合适的多项式阶次
+
+193
+00:06:50,680 --> 00:06:54,166
+So plotting the hypothesis, could
+因此绘制假设模型曲线
+
+194
+00:06:54,166 --> 00:06:55,728
+be one way to try to
+可以作为决定多项式阶次
+
+195
+00:06:55,750 --> 00:06:58,160
+decide what degree polynomial to use.
+的一种方法
+
+196
+00:06:58,160 --> 00:07:00,163
+But that doesn't always work.
+但是这并不是总是有用的
+
+197
+00:07:00,180 --> 00:07:02,019
+And, in fact more often we
+而且事实上更多的时候我们
+
+198
+00:07:02,019 --> 00:07:06,075
+may have learning problems that where we just have a lot of features.
+会遇到有很多变量的假设模型
+
+199
+00:07:06,075 --> 00:07:07,563
+And there is not
+并且
+
+200
+00:07:07,563 --> 00:07:10,599
+just a matter of selecting what degree polynomial.
+这不仅仅是选择多项式阶次的问题
+
+201
+00:07:10,630 --> 00:07:12,147
+And, in fact, when we
+事实上 当我们
+
+202
+00:07:12,170 --> 00:07:13,779
+have so many features, it also
+有这么多的特征变量
+
+203
+00:07:13,779 --> 00:07:15,593
+becomes much harder to plot
+这也使得绘图变得更难
+
+204
+00:07:15,630 --> 00:07:17,698
+the data and it becomes
+并且
+
+205
+00:07:17,710 --> 00:07:19,211
+much harder to visualize it,
+更难使其可视化
+
+206
+00:07:19,211 --> 00:07:22,396
+to decide what features to keep or not.
+因此并不能通过这种方法决定保留哪些特征变量
+
+207
+00:07:22,420 --> 00:07:24,142
+So concretely, if we're trying
+具体地说 如果我们试图
+
+208
+00:07:24,160 --> 00:07:27,849
+predict housing prices sometimes we can just have a lot of different features.
+预测房价 同时又拥有这么多特征变量
+
+209
+00:07:27,880 --> 00:07:31,373
+And all of these features seem, you know, maybe they seem kind of useful.
+这些变量看上去都很有用
+
+210
+00:07:31,373 --> 00:07:32,609
+But, if we have a
+但是 如果我们有
+
+211
+00:07:32,609 --> 00:07:34,123
+lot of features, and, very little
+过多的变量 同时
+
+212
+00:07:34,123 --> 00:07:35,820
+training data, then, over
+只有非常少的训练数据
+
+213
+00:07:35,840 --> 00:07:37,776
+fitting can become a problem.
+就会出现过度拟合的问题
+
+214
+00:07:37,776 --> 00:07:39,180
+In order to address over
+为了解决过度拟合
+
+215
+00:07:39,180 --> 00:07:40,651
+fitting, there are two
+有两个办法
+
+216
+00:07:40,651 --> 00:07:43,780
+main options for things that we can do.
+来解决问题
+
+217
+00:07:43,780 --> 00:07:45,759
+The first option is, to try
+第一个办法是要尽量
+
+218
+00:07:45,770 --> 00:07:47,976
+to reduce the number of features.
+减少选取变量的数量
+
+219
+00:07:47,990 --> 00:07:49,337
+Concretely, one thing we
+具体而言
+
+220
+00:07:49,337 --> 00:07:51,383
+could do is manually look through
+我们可以人工检查
+
+221
+00:07:51,383 --> 00:07:53,236
+the list of features, and, use
+变量的条目
+
+222
+00:07:53,236 --> 00:07:54,894
+that to try to decide which
+并以此决定哪些变量更为重要
+
+223
+00:07:54,894 --> 00:07:57,256
+are the more important features, and, therefore,
+然后
+
+224
+00:07:57,256 --> 00:07:58,476
+which are the features we should
+决定保留哪些特征变量
+
+225
+00:07:58,476 --> 00:08:01,844
+keep, and, which are the features we should throw out.
+哪些应该舍弃
+
+226
+00:08:01,844 --> 00:08:03,401
+Later in this course, where also
+在今后的课程中
+
+227
+00:08:03,401 --> 00:08:06,018
+talk about model selection algorithms.
+我们会提到模型选择算法
+
+228
+00:08:06,040 --> 00:08:08,361
+Which are algorithms for automatically
+这种算法是为了自动选择
+
+229
+00:08:08,361 --> 00:08:09,788
+deciding which features
+采用哪些特征变量
+
+230
+00:08:09,800 --> 00:08:12,500
+to keep and, which features to throw out.
+自动舍弃不需要的变量
+
+231
+00:08:12,500 --> 00:08:13,987
+This idea of reducing the
+这种减少特征变量
+
+232
+00:08:13,987 --> 00:08:15,562
+number of features can work
+的做法是非常有效的
+
+233
+00:08:15,562 --> 00:08:17,853
+well, and, can reduce over fitting.
+并且可以减少过拟合的发生
+
+234
+00:08:17,853 --> 00:08:19,383
+And, when we talk about model
+当我们今后讲到模型选择时
+
+235
+00:08:19,383 --> 00:08:22,534
+selection, we'll go into this in much greater depth.
+我们将深入探讨这个问题
+
+236
+00:08:22,534 --> 00:08:24,386
+But, the disadvantage is that, by
+但是其缺点是
+
+237
+00:08:24,386 --> 00:08:25,603
+throwing away some of the
+舍弃一部分特征变量
+
+238
+00:08:25,603 --> 00:08:27,010
+features, is also throwing
+你也舍弃了
+
+239
+00:08:27,370 --> 00:08:30,615
+away some of the information you have about the problem.
+问题中的一些信息
+
+240
+00:08:30,650 --> 00:08:31,942
+For example, maybe, all of
+例如 也许所有的
+
+241
+00:08:31,942 --> 00:08:33,760
+those features are actually useful
+特征变量
+
+242
+00:08:33,780 --> 00:08:35,050
+for predicting the price of a
+对于预测房价都是有用的
+
+243
+00:08:35,070 --> 00:08:36,636
+house, so, maybe, we don't actually
+我们实际上并不想
+
+244
+00:08:36,640 --> 00:08:37,687
+want to throw some of
+舍弃一些信息
+
+245
+00:08:37,687 --> 00:08:40,990
+our information or throw some of our features away.
+或者舍弃这些特征变量
+
+246
+00:08:41,540 --> 00:08:44,515
+The second option, which we'll
+第二个选择
+
+247
+00:08:44,515 --> 00:08:45,995
+talk about in the
+我们将在接下来的视频中讨论
+
+248
+00:08:46,010 --> 00:08:49,268
+next few videos, is regularization.
+就是正则化
+
+249
+00:08:49,268 --> 00:08:50,390
+Here, we're going to keep
+正则化中我们将保留
+
+250
+00:08:50,390 --> 00:08:52,579
+all the features, but we're
+所有的特征变量
+
+251
+00:08:52,579 --> 00:08:55,063
+going to reduce the magnitude
+但是数量级
+
+252
+00:08:55,063 --> 00:08:56,506
+or the values of the parameters
+或参数数值的大小
+
+253
+00:08:56,520 --> 00:08:58,745
+theta J. And, this
+θ(j)
+
+254
+00:08:58,750 --> 00:09:00,690
+method works well, we'll see,
+这个方法非常有效
+
+255
+00:09:00,690 --> 00:09:01,925
+when we have a lot of
+当我们有很多特征变量时
+
+256
+00:09:01,925 --> 00:09:03,822
+features, each of which contributes
+其中每一个变量
+
+257
+00:09:03,822 --> 00:09:05,502
+a little bit to predicting
+都能对预测产生一点影响
+
+258
+00:09:05,502 --> 00:09:07,723
+the value of Y, like we
+y的值
+
+259
+00:09:07,740 --> 00:09:10,283
+saw in the housing price prediction example.
+正如我们在房价的例子中看到的那样
+
+260
+00:09:10,283 --> 00:09:11,413
+Where we could have a lot
+在那里我们可以有很多特征变量
+
+261
+00:09:11,413 --> 00:09:12,720
+of features, each of which
+其中每一个变量
+
+262
+00:09:12,750 --> 00:09:16,902
+are, you know, somewhat useful, so, maybe, we don't want to throw them away.
+都是有用的 因此我们不希望把它们删掉
+
+263
+00:09:16,930 --> 00:09:19,247
+So, this describes the
+以上就是
+
+264
+00:09:19,250 --> 00:09:22,790
+idea of regularization at a very high level.
+从很高的层面上对正则化思想的概括
+
+265
+00:09:22,790 --> 00:09:24,354
+And, I realize that, all
+我知道
+
+266
+00:09:24,360 --> 00:09:26,763
+of these details probably don't make sense to you yet.
+这些东西你们现在可能还听不懂
+
+267
+00:09:26,763 --> 00:09:28,316
+But, in the next video, we'll
+但是在接下来的视频中
+
+268
+00:09:28,316 --> 00:09:30,960
+start to formulate exactly how
+我们将开始详细讲述
+
+269
+00:09:30,960 --> 00:09:35,117
+to apply regularization and, exactly what regularization means.
+怎样应用正则化 以及正则化的确切含义
+
+270
+00:09:35,140 --> 00:09:36,810
+And, then we'll start to
+然后我们将开始
+
+271
+00:09:36,810 --> 00:09:38,310
+figure out, how to use this,
+讲解怎样使用正则化
+
+272
+00:09:38,310 --> 00:09:40,412
+to make our learning algorithms work
+怎样使学习算法正常工作
+
+273
+00:09:40,412 --> 00:09:42,460
+well and avoid overfitting.
+并避免过拟合
+
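
The underfitting / "just right" / overfitting comparison in this lecture is shown on slides rather than in code, but the effect can be reproduced with a small Octave sketch like the one below (the data points and variable names here are made up purely for illustration):

% Fit polynomials of degree 1, 2 and 4 to five (size, price) points whose
% prices flatten out, and compare their training errors.
sizes  = [1.0; 1.5; 2.0; 2.5; 3.0];
prices = [2.0; 2.6; 2.9; 3.1; 3.2];
for degree = [1 2 4]
  p = polyfit(sizes, prices, degree);      % least-squares polynomial fit
  fitted = polyval(p, sizes);
  fprintf('degree %d: training error %.4f\n', degree, mean((fitted - prices) .^ 2));
end
% The degree-4 fit can pass through all five points, so its training error is
% essentially zero, yet it wiggles between them: the high-variance, overfitting
% behaviour described above. The straight line underfits (high bias), and the
% quadratic is "just right".
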
diff --git a/srt/7 - 2 - Cost Function (10 min).srt b/srt/7 - 2 - Cost Function (10 min).srt
new file mode 100644
index 00000000..7fc7c9cf
--- /dev/null
+++ b/srt/7 - 2 - Cost Function (10 min).srt
@@ -0,0 +1,1441 @@
+1
+00:00:00,144 --> 00:00:02,011
+In this video, I'd like to
+在这段视频中 我想要
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,011 --> 00:00:03,990
+convey to you, the main intuitions
+传达给你一个直观的感受
+
+3
+00:00:03,990 --> 00:00:05,771
+behind how regularization works.
+告诉你正规化是如何进行的
+
+4
+00:00:05,771 --> 00:00:07,386
+And, we'll also write down
+而且 我们还要写出
+
+5
+00:00:07,386 --> 00:00:11,724
+the cost function that we'll use, when we were using regularization.
+我们使用正规化时 需要使用的代价函数
+
+6
+00:00:11,780 --> 00:00:13,327
+With the hand drawn examples that
+根据我们幻灯片上的
+
+7
+00:00:13,327 --> 00:00:14,916
+we have on these slides, I
+这些例子
+
+8
+00:00:14,950 --> 00:00:17,642
+think I'll be able to convey part of the intuition.
+我想我可以给你一个直观的感受
+
+9
+00:00:17,700 --> 00:00:19,608
+But, an even better
+但是 一个更好的
+
+10
+00:00:19,608 --> 00:00:21,192
+way to see for yourself, how
+让你自己去理解正规化
+
+11
+00:00:21,192 --> 00:00:22,643
+regularization works, is if
+如何工作的方法是
+
+12
+00:00:22,643 --> 00:00:25,869
+you implement it, and, see it work for yourself.
+你自己亲自去实现它 并且看看它是如何工作的
+
+13
+00:00:25,869 --> 00:00:26,888
+And, if you do the
+如果在这节课后
+
+14
+00:00:26,888 --> 00:00:28,603
+appropriate exercises after this,
+你进行一些适当的练习
+
+15
+00:00:28,603 --> 00:00:30,053
+you get the chance
+你就有机会亲自体验一下
+
+16
+00:00:30,053 --> 00:00:33,927
+to see regularization in action for yourself.
+正规化到底是怎么工作的
+
+17
+00:00:33,930 --> 00:00:36,519
+So, here is the intuition.
+那么 这里就是一些直观解释
+
+18
+00:00:36,519 --> 00:00:38,233
+In the previous video, we saw
+在前面的视频中 我们看到了
+
+19
+00:00:38,233 --> 00:00:39,771
+that, if we were to fit
+如果说我们要
+
+20
+00:00:39,771 --> 00:00:41,420
+a quadratic function to this
+用一个二次函数来
+
+21
+00:00:41,420 --> 00:00:44,283
+data, it gives us a pretty good fit to the data.
+拟合这些数据 它给了我们一个对数据很好的拟合
+
+22
+00:00:44,283 --> 00:00:45,286
+Whereas, if we were to
+然而 如果我们
+
+23
+00:00:45,310 --> 00:00:47,175
+fit an overly high order
+用一个更高次的
+
+24
+00:00:47,210 --> 00:00:48,823
+degree polynomial, we end
+多项式去拟合 我们最终
+
+25
+00:00:48,850 --> 00:00:50,111
+up with a curve that may fit
+可能得到一个曲线
+
+26
+00:00:50,111 --> 00:00:51,760
+the training set very well, but,
+能非常好地拟合训练集 但是
+
+27
+00:00:51,760 --> 00:00:53,381
+really not be a,
+这真的不是一个好的结果
+
+28
+00:00:53,420 --> 00:00:54,497
+but overfit the data
+它过度拟合了数据
+
+29
+00:00:54,497 --> 00:00:57,225
+poorly, and, not generalize well.
+因此 一般性并不是很好
+
+30
+00:00:57,900 --> 00:01:00,453
+Consider the following, suppose we
+让我们考虑下面的假设
+
+31
+00:01:00,453 --> 00:01:02,088
+were to penalize, and, make
+我们想要加上惩罚项 从而使
+
+32
+00:01:02,088 --> 00:01:04,753
+the parameters theta 3 and theta 4 really small.
+参数 θ3 和 θ4 足够的小
+
+33
+00:01:04,753 --> 00:01:06,543
+Here's what I
+这里我的意思就是
+
+34
+00:01:06,543 --> 00:01:09,676
+mean, here is our optimization
+这是我们的优化目标
+
+35
+00:01:09,690 --> 00:01:10,859
+objective, or here is our
+或者客观的说 这就是我们需要
+
+36
+00:01:10,870 --> 00:01:12,574
+optimization problem, where we minimize
+优化的问题 我们需要尽量减少
+
+37
+00:01:12,580 --> 00:01:15,526
+our usual squared error cost function.
+通常的平方误差代价函数
+
+38
+00:01:15,526 --> 00:01:17,350
+Let's say I take this objective
+对于这个函数
+
+39
+00:01:17,370 --> 00:01:19,125
+and modify it and add
+我们对它进行一些 添加一些项
+
+40
+00:01:19,160 --> 00:01:23,291
+to it, plus 1000 theta
+加上 1000 乘以 θ3 的平方
+
+41
+00:01:23,291 --> 00:01:28,334
+3 squared, plus 1000 theta 4 squared.
+再加上 1000 乘以 θ4 的平方
+
+42
+00:01:28,334 --> 00:01:32,354
+1000 I am just writing down as some huge number.
+1000 只是我随便写的某个较大的数字而已
+
+43
+00:01:32,354 --> 00:01:33,538
+Now, if we were to
+现在 如果我们要
+
+44
+00:01:33,540 --> 00:01:35,127
+minimize this function, the
+最小化这个函数
+
+45
+00:01:35,140 --> 00:01:36,688
+only way to make this
+为了使这个
+
+46
+00:01:36,710 --> 00:01:38,620
+new cost function small is
+新的代价函数最小化
+
+47
+00:01:38,620 --> 00:01:40,769
+if theta 3 and theta
+我们要让 θ3 和 θ4
+
+48
+00:01:40,769 --> 00:01:42,133
+4 are small, right?
+尽可能小 对吧?
+
+49
+00:01:42,133 --> 00:01:43,264
+Because otherwise, if you have
+因为 如果你有
+
+50
+00:01:43,264 --> 00:01:44,956
+a thousand times theta 3, this
+1000 乘以 θ3 这个
+
+51
+00:01:44,970 --> 00:01:48,103
+new cost functions gonna be big.
+新的代价函数将会是很大的
+
+52
+00:01:48,140 --> 00:01:49,245
+So when we minimize this
+所以 当我们最小化
+
+53
+00:01:49,245 --> 00:01:50,402
+new function we are going
+这个新的函数时 我们将使
+
+54
+00:01:50,402 --> 00:01:52,107
+to end up with theta 3
+θ3 的值
+
+55
+00:01:52,110 --> 00:01:53,776
+close to 0 and theta
+接近于0
+
+56
+00:01:53,776 --> 00:01:56,700
+4 close to 0, and as
+θ4 的值也接近于0
+
+57
+00:01:56,700 --> 00:01:59,691
+if we're getting rid
+就像我们忽略了
+
+58
+00:01:59,691 --> 00:02:03,206
+of these two terms over there.
+这两个值一样
+
+59
+00:02:03,710 --> 00:02:05,282
+And if we do that, well then,
+如果我们做到这一点
+
+60
+00:02:05,290 --> 00:02:06,783
+if theta 3 and theta 4
+如果 θ3 和 θ4
+
+61
+00:02:06,783 --> 00:02:07,973
+close to 0 then we are
+接近0 那么我们
+
+62
+00:02:07,973 --> 00:02:09,643
+being left with a quadratic function,
+将得到一个近似的二次函数
+
+63
+00:02:09,643 --> 00:02:11,089
+and, so, we end up with
+所以 我们最终
+
+64
+00:02:11,110 --> 00:02:13,343
+a fit to the data, that's, you know, quadratic
+恰当地拟合了数据 你知道
+
+65
+00:02:13,343 --> 00:02:15,463
+function plus maybe, tiny
+二次函数加上一些项
+
+66
+00:02:15,463 --> 00:02:17,856
+contributions from small terms,
+这些很小的项 贡献很小
+
+67
+00:02:17,860 --> 00:02:20,207
+theta 3, theta 4, that they may be very close to 0.
+因为 θ3 θ4 它们是非常接近于0的
+
+68
+00:02:20,207 --> 00:02:27,293
+And, so, we end up with
+所以 我们最终得到了
+
+69
+00:02:27,293 --> 00:02:29,386
+essentially, a quadratic function, which is good.
+实际上 很好的一个二次函数
+
+70
+00:02:29,386 --> 00:02:30,544
+Because this is a
+因为这是一个
+
+71
+00:02:30,544 --> 00:02:34,060
+much better hypothesis.
+更好的假设
+
+72
+00:02:34,104 --> 00:02:36,666
+In this particular example, we looked at the effect
+在这个具体的例子中 我们看到了
+
+73
+00:02:36,700 --> 00:02:39,023
+of penalizing two of
+惩罚这两个
+
+74
+00:02:39,023 --> 00:02:41,446
+the parameter values being large.
+大的参数值的效果
+
+75
+00:02:41,446 --> 00:02:46,510
+More generally, here is the idea behind regularization.
+更一般地 这里给出了正规化背后的思路
+
+76
+00:02:46,980 --> 00:02:48,924
+The idea is that, if we
+这种思路就是 如果我们
+
+77
+00:02:48,924 --> 00:02:50,303
+have small values for the
+的参数值
+
+78
+00:02:50,303 --> 00:02:53,083
+parameters, then, having
+对应一个较小值的话
+
+79
+00:02:53,083 --> 00:02:55,250
+small values for the parameters,
+就是说 参数值比较小
+
+80
+00:02:55,250 --> 00:02:57,866
+will somehow, will usually correspond
+那么往往我们会得到一个
+
+81
+00:02:57,866 --> 00:03:00,386
+to having a simpler hypothesis.
+形式更简单的假设
+
+82
+00:03:00,386 --> 00:03:02,279
+So, for our last example, we
+所以 我们最后一个例子中
+
+83
+00:03:02,279 --> 00:03:04,024
+penalize just theta 3 and
+我们惩罚的只是 θ3 和
+
+84
+00:03:04,024 --> 00:03:05,666
+theta 4 and when both
+θ4 使这两个
+
+85
+00:03:05,666 --> 00:03:07,046
+of these were close to zero,
+值均接近于零
+
+86
+00:03:07,046 --> 00:03:08,450
+we wound up with a much simpler
+我们得到了一个更简单的假设
+
+87
+00:03:08,480 --> 00:03:12,549
+hypothesis that was essentially a quadratic function.
+也即这个假设大抵上是一个二次函数
+
+88
+00:03:12,549 --> 00:03:13,991
+But more broadly, if we penalize all
+但更一般地说 如果我们就像这样
+
+89
+00:03:13,991 --> 00:03:15,989
+the parameters usually that, we
+惩罚的其它参数 通常我们
+
+90
+00:03:15,989 --> 00:03:17,416
+can think of that, as trying
+可以把它们都想成是
+
+91
+00:03:17,420 --> 00:03:19,076
+to give us a simpler hypothesis
+得到一个更简单的假设
+
+92
+00:03:19,110 --> 00:03:20,943
+as well because when, you
+因为你知道
+
+93
+00:03:20,943 --> 00:03:22,380
+know, these parameters are
+当这些参数越接近这个例子时
+
+94
+00:03:22,410 --> 00:03:23,700
+as close as you in this
+假设的结果越接近
+
+95
+00:03:23,700 --> 00:03:26,105
+example, that gave us a quadratic function.
+一个二次函数
+
+96
+00:03:26,105 --> 00:03:29,038
+But more generally, it is
+但更一般地
+
+97
+00:03:29,038 --> 00:03:30,493
+possible to show that having
+可以表明
+
+98
+00:03:30,530 --> 00:03:32,536
+smaller values of the parameters
+这些参数的值越小
+
+99
+00:03:32,540 --> 00:03:34,416
+corresponds to usually smoother
+通常对应于越光滑的函数
+
+100
+00:03:34,416 --> 00:03:36,780
+functions as well for the simpler.
+也就是更加简单的函数
+
+101
+00:03:36,780 --> 00:03:41,667
+And which are therefore, also, less prone to overfitting.
+因此 就不易发生过拟合的问题
+
+102
+00:03:41,680 --> 00:03:43,245
+I realize that the reasoning for
+我知道
+
+103
+00:03:43,245 --> 00:03:45,441
+why having all the parameters be small.
+为什么要让所有的参数都变小的原因
+
+104
+00:03:45,441 --> 00:03:46,944
+Why that corresponds to a simpler
+为什么越小的参数对应于一个简单的假设
+
+105
+00:03:46,960 --> 00:03:48,916
+hypothesis; I realize that
+我知道这些原因
+
+106
+00:03:48,916 --> 00:03:51,572
+reasoning may not be entirely clear to you right now.
+对你来说现在不一定完全理解
+
+107
+00:03:51,590 --> 00:03:52,784
+And it is kind of hard
+但现在解释起来确实比较困难
+
+108
+00:03:52,784 --> 00:03:54,477
+to explain unless you implement
+除非你自己实现一下
+
+109
+00:03:54,480 --> 00:03:56,446
+yourself and see it for yourself.
+自己亲自运行了这部分
+
+110
+00:03:56,470 --> 00:03:58,247
+But I hope that the example of
+但是我希望 这个例子中
+
+111
+00:03:58,247 --> 00:03:59,610
+having theta 3 and theta
+使 θ3 和 θ4
+
+112
+00:03:59,650 --> 00:04:01,230
+4 be small and how
+很小 并且这样做
+
+113
+00:04:01,230 --> 00:04:02,535
+that gave us a simpler
+能给我们一个更加简单的
+
+114
+00:04:02,540 --> 00:04:04,776
+hypothesis, I hope that
+假设 我希望这个例子
+
+115
+00:04:04,800 --> 00:04:06,314
+helps explain why, at least give
+有助于解释原因 至少给了
+
+116
+00:04:06,330 --> 00:04:09,320
+some intuition as to why this might be true.
+我们一些直观感受 为什么这应该是这样的
+
+117
+00:04:09,320 --> 00:04:11,476
+Lets look at the specific example.
+来让我们看看具体的例子
+
+118
+00:04:12,010 --> 00:04:13,873
+For housing price prediction we
+对于房屋价格预测我们
+
+119
+00:04:13,873 --> 00:04:15,465
+may have our hundred features
+可能有上百种特征
+
+120
+00:04:15,480 --> 00:04:17,223
+that we talked about where may
+我们谈到了一些可能的特征
+
+121
+00:04:17,250 --> 00:04:18,756
+be x1 is the size, x2
+比如说 x1 是房屋的尺寸
+
+122
+00:04:18,756 --> 00:04:20,096
+is the number of bedrooms, x3
+x2 是卧室的数目
+
+123
+00:04:20,096 --> 00:04:21,963
+is the number of floors and so on.
+x3 是房屋的层数等等
+
+124
+00:04:21,963 --> 00:04:24,502
+And we may we may have a hundred features.
+那么我们可能就有一百个特征
+
+125
+00:04:24,502 --> 00:04:26,896
+And unlike the polynomial
+跟前面的多项式例子不同
+
+126
+00:04:26,920 --> 00:04:28,459
+example, we don't know, right,
+我们是不知道的 对吧
+
+127
+00:04:28,460 --> 00:04:29,826
+we don't know that theta 3,
+我们不知道 θ3
+
+128
+00:04:29,826 --> 00:04:32,641
+theta 4, are the high order polynomial terms.
+θ4 是高阶多项式的项
+
+129
+00:04:32,641 --> 00:04:34,515
+So, if we have just a
+所以 如果我们有一个袋子
+
+130
+00:04:34,540 --> 00:04:35,863
+bag, if we have just a
+如果我们有一百个特征
+
+131
+00:04:35,863 --> 00:04:38,074
+set of a hundred features, it's hard
+在这个袋子里 我们是很难
+
+132
+00:04:38,100 --> 00:04:40,210
+to pick in advance which are
+提前选出那些
+
+133
+00:04:40,260 --> 00:04:42,729
+the ones that are less likely to be relevant.
+关联度更小的特征的
+
+134
+00:04:42,729 --> 00:04:45,773
+So we have a hundred, or a hundred and one, parameters.
+也就是说如果我们有一百或一百零一个参数
+
+135
+00:04:45,780 --> 00:04:47,340
+And we don't know which
+我们不知道
+
+136
+00:04:47,340 --> 00:04:48,987
+ones to pick, we
+挑选哪一个
+
+137
+00:04:49,010 --> 00:04:50,445
+don't know which
+我们并不知道
+
+138
+00:04:50,450 --> 00:04:54,272
+parameters to try to pick, to try to shrink.
+如何选择参数 如何缩小参数的数目
+
+139
+00:04:54,430 --> 00:04:56,237
+So, in regularization, what we're
+因此在正规化里
+
+140
+00:04:56,237 --> 00:04:58,438
+going to do, is take our
+我们要做的事情 就是把我们的
+
+141
+00:04:58,438 --> 00:05:01,213
+cost function, here's my cost function for linear regression.
+代价函数 这里就是线性回归的代价函数
+
+142
+00:05:01,213 --> 00:05:02,656
+And what I'm going to do
+我现在要做的就是
+
+143
+00:05:02,660 --> 00:05:04,326
+is, modify this cost
+来修改这个代价函数
+
+144
+00:05:04,340 --> 00:05:06,246
+function to shrink all
+从而缩小
+
+145
+00:05:06,270 --> 00:05:07,643
+of my parameters, because, you know,
+我所有的参数值 因为你知道
+
+146
+00:05:07,643 --> 00:05:09,059
+I don't know which
+我不知道是哪个
+
+147
+00:05:09,059 --> 00:05:10,440
+one or two to try to shrink.
+哪一个或两个要去缩小
+
+148
+00:05:10,440 --> 00:05:11,690
+So I am going to modify my
+所以我就修改我的
+
+149
+00:05:11,690 --> 00:05:16,732
+cost function to add a term at the end.
+代价函数 在这后面添加一项
+
+150
+00:05:17,390 --> 00:05:20,436
+Like so we have square brackets here as well.
+就像我们在方括号里的这项
+
+151
+00:05:20,440 --> 00:05:22,212
+When I add an extra
+当我添加一个额外的
+
+152
+00:05:22,212 --> 00:05:23,516
+regularization term at the
+正则化项的时候
+
+153
+00:05:23,530 --> 00:05:25,510
+end to shrink every
+我们收缩了每个
+
+154
+00:05:25,560 --> 00:05:27,286
+single parameter and so this
+参数 并且因此
+
+155
+00:05:27,320 --> 00:05:28,745
+term we tend to shrink
+我们会使
+
+156
+00:05:28,760 --> 00:05:30,747
+all of my parameters theta 1,
+我们所有的参数 θ1
+
+157
+00:05:30,747 --> 00:05:32,746
+theta 2, theta 3 up
+θ2 θ3
+
+158
+00:05:32,746 --> 00:05:35,490
+to theta 100.
+直到 θ100 的值变小
+
+159
+00:05:36,790 --> 00:05:39,629
+By the way, by convention the summation
+顺便说一下 按照惯例来讲
+
+160
+00:05:39,629 --> 00:05:41,007
+here starts from one so I
+我们从第一个这里开始
+
+161
+00:05:41,007 --> 00:05:43,341
+am not actually going to penalize theta
+所以我实际上没有去惩罚 θ0
+
+162
+00:05:43,360 --> 00:05:45,416
+zero being large.
+变大
+
+163
+00:05:45,470 --> 00:05:46,435
+That sort of the convention that,
+这就是一个约定
+
+164
+00:05:46,435 --> 00:05:48,664
+the sum I equals one through
+从1到 n 的求和
+
+165
+00:05:48,664 --> 00:05:50,185
+N, rather than I equals zero
+而不是从0到 n 的求和
+
+166
+00:05:50,190 --> 00:05:51,953
+through N. But in practice,
+但其实在实践中
+
+167
+00:05:51,960 --> 00:05:53,464
+it makes very little difference, and,
+这只会有非常小的差异
+
+168
+00:05:53,490 --> 00:05:54,788
+whether you include, you know,
+无论你是否包括这项
+
+169
+00:05:54,788 --> 00:05:56,221
+theta zero or not, in
+就是 θ0 这项
+
+170
+00:05:56,221 --> 00:05:59,532
+practice, make very little difference to the results.
+实际上 结果只有非常小的差异
+
+171
+00:05:59,540 --> 00:06:01,804
+But by convention, usually, we regularize
+但是按照惯例 通常情况下我们还是只
+
+172
+00:06:01,804 --> 00:06:03,356
+only theta 1 through theta
+从 θ1 到 θ100 进行正规化
+
+173
+00:06:03,360 --> 00:06:06,084
+100. Writing down
+这里我们写下来
+
+174
+00:06:06,084 --> 00:06:08,978
+our regularized optimization objective,
+我们的正规化优化目标
+
+175
+00:06:08,978 --> 00:06:10,655
+our regularized cost function again.
+我们的正规化后的代价函数
+
+176
+00:06:10,655 --> 00:06:11,718
+Here it is. Here's J of
+就是这样的
+
+177
+00:06:11,718 --> 00:06:13,903
+theta where, this term
+J(θ) 这个项
+
+178
+00:06:13,970 --> 00:06:15,863
+on the right is a regularization
+右边的这项就是一个正则化项
+
+179
+00:06:15,863 --> 00:06:17,548
+term and lambda
+并且 λ
+
+180
+00:06:17,570 --> 00:06:23,950
+here is called the regularization parameter and
+在这里我们称做正规化参数
+
+181
+00:06:23,973 --> 00:06:26,334
+what lambda does, is it
+λ 要做的就是控制
+
+182
+00:06:26,334 --> 00:06:28,480
+controls a trade off
+在两个不同的目标中
+
+183
+00:06:28,510 --> 00:06:30,636
+between two different goals.
+的一个平衡关系
+
+184
+00:06:30,636 --> 00:06:32,478
+The first goal, capture it
+第一个目标
+
+185
+00:06:32,500 --> 00:06:34,399
+by the first goal objective, is
+第一个需要抓住的目标
+
+186
+00:06:34,399 --> 00:06:36,081
+that we would like to train,
+就是我们想要训练
+
+187
+00:06:36,090 --> 00:06:38,350
+is that we would like to fit the training data well.
+使假设更好地拟合训练数据
+
+188
+00:06:38,390 --> 00:06:41,083
+We would like to fit the training set well.
+我们希望假设能够很好的适应训练集
+
+189
+00:06:41,083 --> 00:06:42,954
+And the second goal is,
+而第二个目标是
+
+190
+00:06:42,954 --> 00:06:44,474
+we want to keep the parameters
+我们想要保持参数值较小
+
+191
+00:06:44,474 --> 00:06:46,053
+small, and that's captured by
+这就是第二项的目标
+
+192
+00:06:46,060 --> 00:06:49,103
+the second term, by the regularization objective. And by the regularization term.
+通过正则化目标函数
+
+193
+00:06:49,103 --> 00:06:53,583
+And what lambda, the regularization
+这就是λ 这个正则化
+
+194
+00:06:53,583 --> 00:06:55,937
+parameter does is it controls the trade-off
+参数需要控制的
+
+195
+00:06:55,937 --> 00:06:57,694
+between these two
+即在这两者之间取得平衡
+
+196
+00:06:57,694 --> 00:06:58,938
+goals, between the goal of fitting the training set well
+目标就是平衡拟合训练的目的
+
+197
+00:06:58,960 --> 00:07:00,562
+and the
+和
+
+198
+00:07:00,562 --> 00:07:02,043
+goal of keeping the parameters
+保持参数值较小的目的
+
+199
+00:07:02,080 --> 00:07:05,688
+small and therefore keeping the hypothesis relatively
+从而来保持假设的形式相对简单
+
+200
+00:07:05,688 --> 00:07:09,134
+simple to avoid overfitting.
+来避免过度的拟合
+
+201
+00:07:09,290 --> 00:07:11,026
+For our housing price prediction
+对于我们的房屋价格预测来说
+
+202
+00:07:11,030 --> 00:07:13,026
+example, whereas, previously, if
+这个例子 尽管我们之前有
+
+203
+00:07:13,030 --> 00:07:14,256
+we had fit a very high
+我们已经用非常高的
+
+204
+00:07:14,256 --> 00:07:15,968
+order polynomial, we may
+高阶多项式来拟合 我们将会
+
+205
+00:07:15,968 --> 00:07:17,461
+have wound up with a very,
+得到一个
+
+206
+00:07:17,480 --> 00:07:19,020
+sort of wiggly or curvy function like
+非常弯曲和复杂的曲线函数
+
+207
+00:07:19,020 --> 00:07:22,460
+this. If you still fit a high order polynomial
+就像这个 如果你还是用高阶多项式拟合
+
+208
+00:07:22,460 --> 00:07:24,120
+with all the polynomial
+就是用这里所有的多项式特征来拟合的话
+
+209
+00:07:24,120 --> 00:07:26,038
+features in there, but instead,
+但现在我们不这样了
+
+210
+00:07:26,038 --> 00:07:27,956
+you just make sure, to use
+你只需要确保使用了
+
+211
+00:07:27,970 --> 00:07:30,798
+this sort of regularized objective, then what
+正规化目标的方法
+
+212
+00:07:30,798 --> 00:07:32,272
+you can get out is in
+那么你就可以得到
+
+213
+00:07:32,272 --> 00:07:34,332
+fact a curve that isn't
+实际上是一个曲线 但这个曲线不是
+
+214
+00:07:34,340 --> 00:07:36,465
+quite a quadratic function, but is
+一个真正的二次函数
+
+215
+00:07:36,490 --> 00:07:38,510
+much smoother and much simpler
+而是更加的流畅和简单
+
+216
+00:07:38,510 --> 00:07:39,870
+and maybe a curve like the magenta
+也许就像这条紫红色的曲线一样
+
+217
+00:07:39,870 --> 00:07:42,261
+line that, you know, gives a
+那么 你知道的
+
+218
+00:07:42,261 --> 00:07:45,445
+much better hypothesis for this data.
+这样就得到了对于这个数据更好的假设
+
+219
+00:07:45,445 --> 00:07:46,613
+Once again, I realize
+再一次说明下
+
+220
+00:07:46,613 --> 00:07:47,919
+it can be a bit difficult to see why shrinking the
+我知道这可能有点难以理解 为什么缩小
+
+221
+00:07:47,919 --> 00:07:50,064
+parameters can have
+参数的值可以产生
+
+222
+00:07:50,064 --> 00:07:51,668
+this effect, but if you
+这种效果 但如果你
+
+223
+00:07:51,690 --> 00:07:54,584
+implement yourselves with regularization
+亲自实现了正规化
+
+224
+00:07:54,650 --> 00:07:56,063
+you will be able to see
+你将能够看到
+
+225
+00:07:56,090 --> 00:07:58,859
+this effect firsthand.
+这种影响的最直观的感受
+
+226
+00:08:00,620 --> 00:08:02,777
+In regularized linear regression, if
+在正规化线性回归中 如果
+
+227
+00:08:02,777 --> 00:08:05,748
+the regularization parameter lambda
+正则化参数 λ
+
+228
+00:08:05,748 --> 00:08:07,669
+is set to be very large,
+被设定为非常大
+
+229
+00:08:07,669 --> 00:08:09,542
+then what will happen is
+那么将会发生什么呢?
+
+230
+00:08:09,542 --> 00:08:11,698
+we will end up penalizing the
+我们将会非常大地惩罚
+
+231
+00:08:11,698 --> 00:08:13,513
+parameters theta 1, theta
+参数θ1 θ2
+
+232
+00:08:13,520 --> 00:08:15,207
+2, theta 3, theta
+θ3 θ4
+
+233
+00:08:15,230 --> 00:08:17,409
+4 very highly.
+也就是说
+
+234
+00:08:17,430 --> 00:08:21,916
+That is, if our hypothesis is this is one down at the bottom.
+如果我们的假设是底下的这个
+
+235
+00:08:21,930 --> 00:08:23,674
+And if we end up penalizing
+如果我们最终惩罚
+
+236
+00:08:23,674 --> 00:08:24,913
+theta 1, theta 2, theta
+θ1 θ2 θ3
+
+237
+00:08:24,990 --> 00:08:26,145
+3, theta 4 very heavily, then we
+θ4 在一个非常大的程度 那么我们
+
+238
+00:08:26,145 --> 00:08:29,463
+end up with all of these parameters close to zero, right?
+会使所有这些参数接近于零的 对不对?
+
+239
+00:08:29,463 --> 00:08:32,240
+Theta 1 will be close to zero; theta 2 will be close to zero.
+θ1 将接近零 θ2 将接近零
+
+240
+00:08:32,240 --> 00:08:34,410
+Theta three and theta four
+θ3 和 θ4
+
+241
+00:08:34,410 --> 00:08:36,646
+will end up being close to zero.
+最终也会接近于零
+
+242
+00:08:36,646 --> 00:08:37,810
+And if we do that, it's as
+如果我们这么做 那么就是
+
+243
+00:08:37,810 --> 00:08:39,143
+if we're getting rid of these
+我们的假设中
+
+244
+00:08:39,160 --> 00:08:41,189
+terms in the hypothesis so that
+相当于去掉了这些项 并且使
+
+245
+00:08:41,189 --> 00:08:43,597
+we're just left with a hypothesis
+我们只是留下了一个简单的假设
+
+246
+00:08:43,597 --> 00:08:44,224
+that will say that.
+这个假设只能表明
+
+247
+00:08:44,230 --> 00:08:46,020
+It says that, well, housing
+那就是 房屋价格
+
+248
+00:08:46,020 --> 00:08:48,624
+prices are equal to theta zero,
+就等于 θ0 的值
+
+249
+00:08:48,650 --> 00:08:50,830
+and that is akin to fitting
+那就是类似于拟合了
+
+250
+00:08:50,830 --> 00:08:54,679
+a flat horizontal straight line to the data.
+一条水平直线 对于数据来说
+
+251
+00:08:54,679 --> 00:08:56,533
+And this is an
+这就是一个
+
+252
+00:08:56,570 --> 00:08:58,773
+example of underfitting, and
+欠拟合 (underfitting)
+
+253
+00:08:58,773 --> 00:09:00,926
+in particular this hypothesis, this
+这种情况下这一假设
+
+254
+00:09:00,950 --> 00:09:02,552
+straight line it just fails
+它是条失败的直线
+
+255
+00:09:02,570 --> 00:09:04,063
+to fit the training set
+对于训练集来说
+
+256
+00:09:04,070 --> 00:09:05,423
+well. It's just a flat straight
+这只是一条平坦的直线
+
+257
+00:09:05,423 --> 00:09:07,173
+line, it doesn't go, you know, go near.
+它没有任何趋势
+
+258
+00:09:07,173 --> 00:09:10,432
+It doesn't go anywhere near most of the training examples.
+它不会去趋向大部分训练样本的任何值
+
+259
+00:09:10,432 --> 00:09:11,592
+And another way of saying this
+这句话的另一种方式来表达就是
+
+260
+00:09:11,592 --> 00:09:13,697
+is that this hypothesis has
+这种假设有
+
+261
+00:09:13,720 --> 00:09:15,410
+too strong a preconception or
+过于强烈的"偏见" 或者
+
+262
+00:09:15,450 --> 00:09:17,091
+too high bias that housing
+过高的偏差 (bias)
+
+263
+00:09:17,120 --> 00:09:18,446
+prices are just equal
+认为预测的价格只是
+
+264
+00:09:18,460 --> 00:09:20,183
+to theta zero, and despite
+等于 θ0 并且
+
+265
+00:09:20,230 --> 00:09:22,123
+the clear data to the contrary,
+尽管我们的数据集
+
+266
+00:09:22,123 --> 00:09:23,207
+you know chooses to fit a sort
+选择去拟合一条
+
+267
+00:09:23,207 --> 00:09:25,648
+of, flat line, just a
+扁平的直线 仅仅是一条
+
+268
+00:09:25,650 --> 00:09:28,230
+flat horizontal line. I didn't draw that very well.
+扁平的水平线 我画得不好
+
+269
+00:09:28,230 --> 00:09:30,447
+This just a horizontal flat line
+对于数据来说
+
+270
+00:09:30,447 --> 00:09:33,059
+to the data. So for
+这只是一条水平线 因此
+
+271
+00:09:33,060 --> 00:09:35,626
+regularization to work well, some
+为了使正则化运作良好
+
+272
+00:09:35,626 --> 00:09:37,835
+care should be taken,
+我们应当注意一些方面
+
+273
+00:09:37,850 --> 00:09:39,903
+to choose a good choice for
+应该去选择一个不错的
+
+274
+00:09:39,903 --> 00:09:42,991
+the regularization parameter lambda as well.
+正则化参数 λ
+
+275
+00:09:42,991 --> 00:09:44,908
+And when we talk about model selection
+并且当我们以后讲到模型选择时
+
+276
+00:09:44,920 --> 00:09:46,717
+later in this course, we'll talk
+在后面的课程中 我们将讨论
+
+277
+00:09:46,717 --> 00:09:48,413
+about a way, a variety
+一种方法
+
+278
+00:09:48,420 --> 00:09:50,803
+of ways for automatically choosing
+一系列的方法来自动选择
+
+279
+00:09:50,810 --> 00:09:54,833
+the regularization parameter lambda as well. So, that's
+正则化参数 λ 所以
+
+280
+00:09:54,833 --> 00:09:56,570
+the idea behind regularization
+这就是正则化背后的思想
+
+281
+00:09:56,570 --> 00:09:58,254
+and the cost function we use in
+以及我们所使用的代价函数
+
+282
+00:09:58,254 --> 00:10:00,454
+order to apply regularization. In the
+来实现正则化 在
+
+283
+00:10:00,454 --> 00:10:01,885
+next two videos, lets take
+在接下来的两段视频中 让我们
+
+284
+00:10:01,885 --> 00:10:03,736
+these ideas and apply them
+把这些概念 应用到
+
+285
+00:10:03,750 --> 00:10:05,440
+to linear regression and to
+到线性回归和
+
+286
+00:10:05,440 --> 00:10:07,111
+logistic regression, so that
+逻辑回归中去
+
+287
+00:10:07,111 --> 00:10:09,020
+we can then get them to
+那么我们就可以让他们
+
+288
+00:10:09,060 --> 00:10:10,982
+avoid overfitting.
+避免过度拟合了
+
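For readers who want to see the cost function discussed above in code, here is a minimal Octave sketch of the regularized linear regression cost. The variable names (X as the m-by-(n+1) design matrix with a leading column of ones, y, theta, lambda) are assumptions for the example, not names taken from the lecture slides.

% Regularized cost: J = (1/(2m)) * [ sum of squared errors + lambda * sum_{j>=1} theta_j^2 ]
m = length(y);
errors = X * theta - y;                   % h_theta(x^(i)) - y^(i) for every example
reg = lambda * sum(theta(2:end) .^ 2);    % theta(1), i.e. theta_0, is not penalized
J = (1 / (2 * m)) * (sum(errors .^ 2) + reg);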
diff --git a/srt/7 - 3 - Regularized Linear Regression (11 min).srt b/srt/7 - 3 - Regularized Linear Regression (11 min).srt
new file mode 100644
index 00000000..0aada746
--- /dev/null
+++ b/srt/7 - 3 - Regularized Linear Regression (11 min).srt
@@ -0,0 +1,1481 @@
+1
+00:00:00,260 --> 00:00:01,490
+For linear regression, we had
+对于线性回归的求解 我们之前
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,680 --> 00:00:03,130
+previously worked out two learning
+推导了两种学习算法
+
+3
+00:00:03,490 --> 00:00:05,010
+algorithms, one based on
+一种基于梯度下降
+
+4
+00:00:05,180 --> 00:00:07,650
+gradient descent and one based on the normal equation.
+一种基于正规方程
+
+5
+00:00:08,750 --> 00:00:09,740
+In this video we will take
+在这段视频中 我们将继续学习
+
+6
+00:00:09,890 --> 00:00:11,640
+those two algorithms and generalize
+这两个算法 并把它们推广
+
+7
+00:00:12,290 --> 00:00:13,380
+them to the case of regularized
+到正则化线性回归中去
+
+8
+00:00:14,330 --> 00:00:17,640
+linear regression. Here's the
+这是我们上节课推导出的
+
+9
+00:00:18,100 --> 00:00:19,540
+optimization objective, that we
+正则化线性回归的
+
+10
+00:00:20,200 --> 00:00:22,380
+came up with last time for regularized linear regression.
+优化目标
+
+11
+00:00:23,360 --> 00:00:24,580
+This first part is our
+前面的第一部分是
+
+12
+00:00:24,980 --> 00:00:27,240
+usual, objective for linear regression,
+一般线性回归的目标函数
+
+13
+00:00:28,170 --> 00:00:29,300
+and we now have this additional
+而现在我们有这个额外的
+
+14
+00:00:30,200 --> 00:00:31,750
+regularization term, where lambda
+正则化项 其中 λ
+
+15
+00:00:32,450 --> 00:00:34,960
+is our regularization parameter, and
+是正则化参数
+
+16
+00:00:35,220 --> 00:00:36,690
+we like to find parameters theta,
+我们想找到参数 θ
+
+17
+00:00:37,160 --> 00:00:38,550
+that minimizes this cost function,
+能最小化代价函数
+
+18
+00:00:39,030 --> 00:00:41,280
+this regularized cost function, J of theta.
+即这个正则化代价函数 J(θ)
+
+19
+00:00:41,840 --> 00:00:43,030
+Previously, we were using
+之前 我们使用
+
+20
+00:00:43,440 --> 00:00:45,180
+gradient descent for the original
+梯度下降求解原来
+
+21
+00:00:46,620 --> 00:00:48,060
+cost function, without the regularization
+没有正则项的代价函数
+
+22
+00:00:48,770 --> 00:00:49,820
+term, and we had
+我们用
+
+23
+00:00:50,060 --> 00:00:51,990
+the following algorithm for regular
+下面的算法求解常规的
+
+24
+00:00:52,370 --> 00:00:53,620
+linear regression, without regularization.
+没有正则项的线性回归
+
+25
+00:00:54,660 --> 00:00:56,260
+We will repeatedly update the
+我们会如此反复更新
+
+26
+00:00:56,330 --> 00:00:57,670
+parameters theta J as follows
+参数 θj
+
+27
+00:00:58,270 --> 00:01:00,030
+for J equals 0, 1, 2 up
+其中 j=0, 1, 2...n
+
+28
+00:01:00,400 --> 00:01:02,110
+through n. Let me
+让我
+
+29
+00:01:02,530 --> 00:01:03,960
+take this and just write
+照这个把
+
+30
+00:01:04,240 --> 00:01:06,580
+the case for theta zero separately.
+j=0 即 θ0 的情况单独写出来
+
+31
+00:01:07,210 --> 00:01:08,400
+So, you know, I'm just gonna
+我只是把
+
+32
+00:01:08,720 --> 00:01:09,900
+write the update for theta
+θ0 的更新
+
+33
+00:01:10,160 --> 00:01:12,500
+zero separately, then for
+分离出来
+
+34
+00:01:12,680 --> 00:01:14,380
+the update for the parameters
+剩下的这些参数θ1, θ2 到θn的更新
+
+35
+00:01:14,780 --> 00:01:17,090
+1, 2, 3, and so on up
+作为另一部分
+
+36
+00:01:17,370 --> 00:01:19,760
+to n. So, I haven't changed anything yet, right?
+所以 这样做其实没有什么变化 对吧?
+
+37
+00:01:19,970 --> 00:01:21,070
+This is just writing the update
+这只是把 θ0 的更新
+
+38
+00:01:21,300 --> 00:01:23,300
+for theta zero separately from the
+这只是把 θ0 的更新
+
+39
+00:01:23,550 --> 00:01:25,240
+updates from theta 1, theta
+和 θ1 θ2 到 θn 的更新分离开来
+
+40
+00:01:25,510 --> 00:01:26,980
+2, theta 3, up to theta n. And
+和 θ1 θ2 到 θn 的更新分离开来
+
+41
+00:01:27,040 --> 00:01:27,900
+the reason I want to do this
+我这样做的原因是
+
+42
+00:01:28,230 --> 00:01:29,320
+is you may remember
+你可能还记得
+
+43
+00:01:29,880 --> 00:01:31,260
+that for our regularized linear regression,
+对于正则化的线性回归
+
+44
+00:01:32,620 --> 00:01:33,970
+we penalize the parameters theta
+我们惩罚参数θ1
+
+45
+00:01:34,440 --> 00:01:35,540
+1, theta 2, and so
+θ2...一直到
+
+46
+00:01:35,860 --> 00:01:38,360
+on up to theta n, but we don't penalize theta zero.
+θn 但是我们不惩罚θ0
+
+47
+00:01:38,820 --> 00:01:40,250
+So when we modify this
+所以 当我们修改这个
+
+48
+00:01:40,410 --> 00:01:42,400
+algorithm for regularized
+正则化线性回归的算法时
+
+49
+00:01:42,750 --> 00:01:44,050
+linear regression, we're going to
+我们将对
+
+50
+00:01:44,710 --> 00:01:46,870
+end up treating theta zero slightly differently.
+θ0 的方式将有所不同
+
+51
+00:01:48,560 --> 00:01:50,360
+Concretely, if we
+具体地说 如果我们
+
+52
+00:01:50,500 --> 00:01:52,170
+want to take this algorithm and
+要对这个算法进行
+
+53
+00:01:52,300 --> 00:01:53,780
+modify it to use the
+修改 并用它
+
+54
+00:01:53,870 --> 00:01:55,630
+regularized objective, all we
+求解正则化的目标函数 我们
+
+55
+00:01:55,740 --> 00:01:57,170
+need to do is take this
+需要做的是
+
+56
+00:01:57,350 --> 00:02:00,010
+term at the bottom and modify as follows.
+把下边的这一项做如下的修改
+
+57
+00:02:00,460 --> 00:02:01,860
+We're gonna take this term and add
+我们要在这一项上添加一项:
+
+58
+00:02:02,670 --> 00:02:05,310
+lambda over m,
+λ 除以 m
+
+59
+00:02:06,330 --> 00:02:08,920
+times theta J. And
+再乘以 θj
+
+60
+00:02:09,100 --> 00:02:10,850
+if you implement this, then you
+如果这样做的话 那么你就有了
+
+61
+00:02:11,000 --> 00:02:13,230
+have gradient descent for trying
+用于最小化
+
+62
+00:02:13,960 --> 00:02:15,920
+to minimize the regularized cost
+正则化代价函数 J(θ)
+
+63
+00:02:16,160 --> 00:02:18,200
+function J of theta, and concretely,
+的梯度下降算法
+
+64
+00:02:19,520 --> 00:02:20,570
+I'm not gonna do the
+我不打算用
+
+65
+00:02:20,680 --> 00:02:22,260
+calculus to prove it, but
+微积分来证明这一点
+
+66
+00:02:22,390 --> 00:02:23,480
+concretely if you look
+但如果你看这一项
+
+67
+00:02:23,690 --> 00:02:26,580
+at this term, this term that's written in square brackets.
+方括号里的这一项
+
+68
+00:02:27,730 --> 00:02:28,930
+If you know calculus, it's possible
+如果你知道微积分
+
+69
+00:02:29,380 --> 00:02:31,150
+to prove that that term is
+应该不难证明它是
+
+70
+00:02:31,370 --> 00:02:33,150
+the partial derivative, with respect of
+J(θ) 对 θj 的偏导数
+
+71
+00:02:33,980 --> 00:02:35,400
+J of theta, using the new
+这里的 J(θ) 是用的新定义的形式
+
+72
+00:02:35,660 --> 00:02:37,520
+definition of J of theta
+它的定义中
+
+73
+00:02:38,140 --> 00:02:39,330
+with the regularization term.
+包含正则化项
+
+74
+00:02:39,510 --> 00:02:42,490
+And similarly, this
+而另一项
+
+75
+00:02:42,760 --> 00:02:43,960
+term up on top,
+上面的这一项
+
+76
+00:02:44,750 --> 00:02:45,570
+which I guess I am
+我用青色的方框
+
+77
+00:02:45,680 --> 00:02:47,240
+drawing the cyan box
+圈出来的这一项
+
+78
+00:02:48,000 --> 00:02:49,270
+that's still the partial derivative
+这也一个是偏导数
+
+79
+00:02:49,940 --> 00:02:52,700
+with respect to theta zero of J of theta.
+是 J(θ)对 θ0 的偏导数
+
+80
+00:02:53,680 --> 00:02:54,900
+If you look at the update for
+如果你仔细看 θj 的更新
+
+81
+00:02:55,600 --> 00:02:56,710
+theta J, it's possible to
+你会发现一些
+
+82
+00:02:56,910 --> 00:02:59,190
+show something pretty interesting. Concretely,
+有趣的东西 具体来说
+
+83
+00:02:59,860 --> 00:03:01,100
+theta J gets updated as
+θj 的每次更新
+
+84
+00:03:01,280 --> 00:03:03,400
+theta J, minus alpha times,
+都是 θj 自己减去 α 乘以原来的无正则项
+
+85
+00:03:04,090 --> 00:03:05,010
+and then you have this other term
+然后还有这另外的一项
+
+86
+00:03:05,380 --> 00:03:06,730
+here that depends on theta J
+这一项的大小也取决于 θj
+
+87
+00:03:06,910 --> 00:03:08,310
+. So if you
+所以 如果你
+
+88
+00:03:08,420 --> 00:03:09,410
+group all the terms together
+把所有这些
+
+89
+00:03:10,030 --> 00:03:11,690
+that depending on theta J. We
+取决于 θj 的合在一起的话
+
+90
+00:03:11,780 --> 00:03:13,190
+can show that this update can
+可以证明 这个更新
+
+91
+00:03:13,670 --> 00:03:15,100
+be written equivalently as
+可以等价地写为
+
+92
+00:03:15,200 --> 00:03:16,160
+follows and all I did
+如下的形式
+
+93
+00:03:16,470 --> 00:03:17,620
+was have, you know, theta J
+具体来讲 上面的 θj
+
+94
+00:03:18,310 --> 00:03:20,100
+here is theta J times
+对应下面的 θj 乘以括号里的1
+
+95
+00:03:20,450 --> 00:03:21,950
+1 and this term is
+而这一项是
+
+96
+00:03:22,910 --> 00:03:24,830
+lambda over m. There's also an alpha
+λ 除以 m 还有一个α
+
+97
+00:03:25,140 --> 00:03:25,990
+here, so you end up
+把它们合在一起 所以你最终得到
+
+98
+00:03:26,180 --> 00:03:27,650
+with alpha lambda over
+α 乘以 λ 再除以 m
+
+99
+00:03:27,970 --> 00:03:31,450
+m, multiply them to
+然后合在一起 乘以 θj
+
+100
+00:03:31,820 --> 00:03:33,660
+theta J and this term here, one minus
+而这一项
+
+101
+00:03:34,230 --> 00:03:36,300
+alpha times lambda over m, is
+1 减去 α 乘以 λ 除以 m
+
+102
+00:03:36,600 --> 00:03:39,470
+a pretty interesting term, it has a pretty interesting effect.
+这一项很有意思
+
+103
+00:03:42,310 --> 00:03:43,710
+Concretely, this term one
+具体来说 这一项
+
+104
+00:03:43,890 --> 00:03:45,320
+minus alpha times lambda over
+1 减去 α 乘以 λ 除以 m
+
+105
+00:03:45,730 --> 00:03:46,780
+M, is going to be
+这一项的值
+
+106
+00:03:46,870 --> 00:03:48,740
+a number that's, you know, usually a number
+通常是一个具体的实数
+
+107
+00:03:48,800 --> 00:03:50,390
+that's a little bit less than 1,
+而且小于1
+
+108
+00:03:50,610 --> 00:03:51,670
+right? Because of
+对吧?由于
+
+109
+00:03:51,920 --> 00:03:53,580
+alpha times lambda over m is
+α 乘以 λ 除以 m
+
+110
+00:03:54,070 --> 00:03:55,920
+going to be positive and usually, if your learning rate is small and m is large.
+通常情况下是正的 如果你的学习速率小 而 m 很大的话
+
+111
+00:03:58,650 --> 00:03:58,860
+That's usually pretty small.
+αλ/m 这一项通常是很小的
+
+112
+00:03:59,650 --> 00:04:00,680
+So this term here, it's going
+所以这里的一项
+
+113
+00:04:00,740 --> 00:04:03,060
+to be a number, it's usually, you know, a little bit less than one.
+一般来说将是一个比1小一点点的值
+
+114
+00:04:03,340 --> 00:04:04,150
+So think of it as
+所以我们可以把它想成
+
+115
+00:04:04,330 --> 00:04:05,860
+a number like 0.99, let's say
+一个像0.99一样的数字
+
+116
+00:04:07,380 --> 00:04:08,800
+and so, the effect of our
+所以
+
+117
+00:04:09,120 --> 00:04:10,550
+updates of theta J is we're
+对 θj 更新的结果
+
+118
+00:04:10,690 --> 00:04:11,950
+going to say that theta J
+我们可以看作是
+
+119
+00:04:12,410 --> 00:04:15,420
+gets replaced by theta J times 0.99.
+被替换为 θj 的0.99倍
+
+120
+00:04:15,770 --> 00:04:17,500
+Alright so theta J
+也就是 θj
+
+121
+00:04:18,490 --> 00:04:20,940
+times 0.99 has the effect of
+乘以0.99
+
+122
+00:04:21,280 --> 00:04:23,560
+shrinking theta J a little bit towards 0.
+把 θj 向 0 压缩了一点点
+
+123
+00:04:23,670 --> 00:04:25,690
+So this makes theta J a bit smaller.
+所以这使得 θj 小了一点
+
+124
+00:04:26,220 --> 00:04:28,080
+More formally, this you know, this
+更正式地说
+
+125
+00:04:28,420 --> 00:04:29,750
+square norm of theta J
+θj 的平方范数
+
+126
+00:04:29,870 --> 00:04:31,580
+is smaller and then
+更小了
+
+127
+00:04:31,720 --> 00:04:33,430
+after that the second
+另外 这一项后边的第二项
+
+128
+00:04:33,910 --> 00:04:35,400
+term here, that's actually
+这实际上
+
+129
+00:04:35,980 --> 00:04:37,930
+exactly the same as the
+与我们原来的
+
+130
+00:04:38,050 --> 00:04:40,270
+original gradient descent update that we had.
+梯度下降更新完全一样
+
+131
+00:04:40,750 --> 00:04:42,840
+Before we added all this regularization stuff.
+跟我们加入了正则项之前一样
+
+132
+00:04:44,270 --> 00:04:46,920
+So, hopefully this gradient
+好的 现在你应该对这个
+
+133
+00:04:47,380 --> 00:04:48,630
+descent, hopefully this update makes
+梯度下降的更新没有疑问了
+
+134
+00:04:48,880 --> 00:04:51,350
+sense, when we're using regularized linear
+当我们使用正则化线性回归时
+
+135
+00:04:51,550 --> 00:04:52,920
+regression what we're doing is on
+我们需要做的就是
+
+136
+00:04:53,320 --> 00:04:55,210
+every iteration we're multiplying theta
+在每一个被正规化的参数 θj 上
+
+137
+00:04:55,420 --> 00:04:56,310
+J by a number that
+乘以了一个
+
+138
+00:04:56,400 --> 00:04:57,300
+is a little bit less than one, so
+比1小一点点的数字
+
+139
+00:04:57,400 --> 00:04:58,900
+we're shrinking the parameter a
+也就是把参数压缩了一点
+
+140
+00:04:59,230 --> 00:05:00,340
+little bit, and then we're
+然后
+
+141
+00:05:00,500 --> 00:05:03,000
+performing a, you know, similar update as before.
+我们执行跟以前一样的更新
+
+142
+00:05:04,170 --> 00:05:05,460
+Of course that's just the
+当然 这仅仅是
+
+143
+00:05:05,610 --> 00:05:08,310
+intuition behind what this particular update is doing.
+从直观上认识 这个更新在做什么
+
+144
+00:05:08,910 --> 00:05:10,130
+Mathematically, what it's doing
+从数学上讲
+
+145
+00:05:10,580 --> 00:05:12,950
+is exactly gradient descent on
+它就是带有正则化项的 J(θ)
+
+146
+00:05:13,130 --> 00:05:14,330
+the cost function J of theta
+的梯度下降算法
+
+147
+00:05:15,150 --> 00:05:16,020
+that we defined on the previous
+我们在之前的幻灯片
+
+148
+00:05:16,480 --> 00:05:18,820
+slide that uses the regularization term.
+给出了定义
+
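To make the update just described concrete, here is a minimal Octave sketch of one gradient descent step for regularized linear regression; X, y, theta, alpha and lambda are assumed to be defined already and are illustrative names, not ones taken from the slides.

m = length(y);
grad = (1 / m) * (X' * (X * theta - y));                   % usual (unregularized) gradient for every j
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);   % add (lambda/m)*theta_j for j = 1..n only
theta = theta - alpha * grad;                              % same as theta_j*(1 - alpha*lambda/m) minus the usual term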
+149
+00:05:19,780 --> 00:05:21,210
+Gradient descent was just
+梯度下降只是
+
+150
+00:05:21,470 --> 00:05:23,050
+one of our two algorithms for
+我们拟合线性回归模型的两种算法
+
+151
+00:05:24,470 --> 00:05:25,530
+fitting a linear regression model.
+的其中一个
+
+152
+00:05:26,630 --> 00:05:28,090
+The second algorithm was the
+第二种算法是
+
+153
+00:05:28,160 --> 00:05:29,130
+one based on the normal
+使用正规方程
+
+154
+00:05:29,680 --> 00:05:31,650
+equation where, what we
+我们的做法
+
+155
+00:05:31,740 --> 00:05:32,980
+did was we created the
+是建立这个
+
+156
+00:05:33,060 --> 00:05:34,770
+design matrix "x" where each
+设计矩阵 X 其中每一行
+
+157
+00:05:35,080 --> 00:05:37,830
+row corresponded to a separate training example.
+对应于一个单独的训练样本
+
+158
+00:05:38,520 --> 00:05:39,790
+And we created a vector
+然后创建了一个向量 y
+
+159
+00:05:40,170 --> 00:05:41,780
+Y, so this is
+向量 y 是一个
+
+160
+00:05:41,940 --> 00:05:43,320
+a vector that is an
+m 维的向量
+
+161
+00:05:43,590 --> 00:05:45,520
+M dimensional vector and that
+m 维的向量
+
+162
+00:05:46,010 --> 00:05:47,750
+contain the labels from a training set.
+包含了所有训练集里的标签
+
+163
+00:05:48,470 --> 00:05:49,600
+So whereas X is an
+所以 X 是一个
+
+164
+00:05:49,830 --> 00:05:52,660
+M by N plus 1 dimensional matrix.
+m × (n+1) 维矩阵
+
+165
+00:05:53,590 --> 00:05:55,220
+Y is an M dimensional
+y 是一个 m 维向量
+
+166
+00:05:55,780 --> 00:05:57,550
+vector and in order
+y 是一个 m 维向量
+
+167
+00:05:58,030 --> 00:05:59,200
+to minimize the cost
+为了最小化代价函数 J
+
+168
+00:05:59,470 --> 00:06:00,940
+function J, we found
+我们发现
+
+169
+00:06:01,470 --> 00:06:03,000
+that one way
+一个办法就是
+
+170
+00:06:03,230 --> 00:06:04,440
+to do this is to set
+一个办法就是
+
+171
+00:06:04,670 --> 00:06:06,790
+theta to be equal to this.
+让 θ 等于这个式子
+
+172
+00:06:07,540 --> 00:06:09,040
+We have X transpose X,
+即 X 的转置乘以 X 再对结果取逆
+
+173
+00:06:10,860 --> 00:06:12,770
+inverse X transpose Y.
+再乘以 X 的转置乘以Y
+
+174
+00:06:13,020 --> 00:06:13,920
+I am leaving room here, to fill
+我在这里留点空间
+
+175
+00:06:14,120 --> 00:06:17,160
+in stuff of course. And what this
+等下再填满
+
+176
+00:06:17,650 --> 00:06:18,820
+value for theta does, is
+这个 θ 的值
+
+177
+00:06:19,180 --> 00:06:20,980
+this minimizes the cost
+其实就是最小化
+
+178
+00:06:21,250 --> 00:06:22,710
+function J of theta when
+代价函数 J(θ) 的θ值
+
+179
+00:06:22,840 --> 00:06:26,280
+we were not using regularization. Now
+这时的代价函数J(θ)没有正则项
+
+180
+00:06:26,460 --> 00:06:28,580
+that we are using regularization, if
+现在如果我们用了是正则化
+
+181
+00:06:28,780 --> 00:06:30,290
+you were to derive what the
+我们想要得到最小值
+
+182
+00:06:30,520 --> 00:06:31,820
+minimum is, and just to
+我们想要得到最小值
+
+183
+00:06:31,910 --> 00:06:32,760
+give you a sense of how to
+我们来看看应该怎么得到
+
+184
+00:06:32,980 --> 00:06:34,110
+derive the minimum, the way
+我们来看看应该怎么得到
+
+185
+00:06:34,220 --> 00:06:35,220
+you derive it is you know,
+推导的方法是
+
+186
+00:06:35,930 --> 00:06:37,910
+take partial derivatives in respect
+取 J 关于各个参数的偏导数
+
+187
+00:06:38,340 --> 00:06:40,600
+to each parameter, set this
+并令它们
+
+188
+00:06:40,830 --> 00:06:41,910
+to zero, and then do
+等于0 然后做些
+
+189
+00:06:42,060 --> 00:06:42,920
+a bunch of math, and you can
+数学推导 你可以
+
+190
+00:06:43,100 --> 00:06:45,060
+then show that is a formula
+得到这样的一个式子
+
+191
+00:06:45,550 --> 00:06:47,640
+like this that minimizes the cost function.
+它使得代价函数最小
+
+192
+00:06:48,590 --> 00:06:52,130
+And concretely, if you
+具体的说 如果你
+
+193
+00:06:52,240 --> 00:06:54,080
+are using regularization then this
+使用正则化
+
+194
+00:06:54,250 --> 00:06:56,320
+formula changes as follows. Inside this
+那么公式要做如下改变
+
+195
+00:06:56,480 --> 00:06:59,120
+parenthesis, you end up with a matrix like this.
+括号里结尾添这样一个矩阵
+
+196
+00:06:59,460 --> 00:07:00,940
+Zero, one, one, one
+0 1 1 1 等等
+
+197
+00:07:01,800 --> 00:07:03,520
+and so on, one until the bottom.
+直到最后一行
+
+198
+00:07:04,510 --> 00:07:05,510
+So this thing over here is
+所以这个东西在这里是
+
+199
+00:07:05,630 --> 00:07:07,810
+a matrix whose upper leftmost entry is zero.
+一个矩阵 它的左上角的元素是0
+
+200
+00:07:08,560 --> 00:07:10,080
+There's ones on the diagonals and
+其余对角线元素都是1
+
+201
+00:07:10,190 --> 00:07:11,960
+then the zeros everywhere else on this matrix.
+剩下的元素也都是 0
+
+202
+00:07:13,050 --> 00:07:14,020
+Because I am drawing this a little bit sloppy.
+我画的比较随意
+
+203
+00:07:15,180 --> 00:07:16,790
+But as a concrete
+可以举一个例子
+
+204
+00:07:17,060 --> 00:07:18,210
+example if N equals 2,
+如果 n 等于2
+
+205
+00:07:19,090 --> 00:07:21,110
+then this matrix
+那么这个矩阵
+
+206
+00:07:21,840 --> 00:07:23,500
+is going to be a three by three matrix.
+将是一个3 × 3 矩阵
+
+207
+00:07:24,300 --> 00:07:26,210
+More generally, this matrix is
+更一般地情况 该矩阵是
+
+208
+00:07:26,360 --> 00:07:27,660
+a N plus one
+一个 (n+1) × (n+1) 维的矩阵
+
+209
+00:07:28,270 --> 00:07:30,290
+by N plus one dimensional matrix.
+一个 (n+1) × (n+1) 维的矩阵
+
+210
+00:07:31,620 --> 00:07:33,150
+So when n equals two, then that
+因此 n 等于2时
+
+211
+00:07:33,370 --> 00:07:35,410
+matrix becomes something that looks like this.
+矩阵看起来会像这样
+
+212
+00:07:35,980 --> 00:07:37,360
+Zero, and then ones
+左上角是0
+
+213
+00:07:37,640 --> 00:07:39,020
+on the diagonals, and then
+然后其他对角线上是1
+
+214
+00:07:39,160 --> 00:07:41,100
+zeros on the rest of the diagonals.
+其余部分都是0
+
+215
+00:07:42,390 --> 00:07:43,990
+And once again, you know, I'm not going to show this derivation.
+同样地 我不打算对这些作数学推导
+
+216
+00:07:44,620 --> 00:07:46,280
+Which is frankly somewhat long and involved.
+坦白说这有点费时耗力
+
+217
+00:07:46,620 --> 00:07:47,530
+But it is possible to prove
+但可以证明
+
+218
+00:07:47,970 --> 00:07:49,550
+that if you are
+如果你采用新定义的 J(θ)
+
+219
+00:07:49,940 --> 00:07:50,770
+using the new definition of
+如果你采用新定义的 J(θ)
+
+220
+00:07:51,250 --> 00:07:53,730
+J of theta, with the regularization objective.
+包含正则项的目标函数
+
+221
+00:07:54,780 --> 00:07:56,070
+Then this new formula for
+那么这个计算 θ 的式子
+
+222
+00:07:56,220 --> 00:07:57,180
+theta is the one
+能使你的 J(θ)
+
+223
+00:07:57,390 --> 00:08:00,080
+that will give you the global minimum of J of theta.
+达到全局最小值
+
+224
+00:08:01,420 --> 00:08:02,460
+So finally, I want to
+所以最后
+
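A minimal Octave sketch of the regularized normal equation just described, under the assumption that X, y, lambda and the number of features n are already defined:

L = eye(n + 1);                               % (n+1) x (n+1) identity matrix...
L(1, 1) = 0;                                  % ...with a zero in the upper-left, so theta_0 is not regularized
theta = pinv(X' * X + lambda * L) * X' * y;   % minimizes the regularized cost J(theta)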
+225
+00:08:02,610 --> 00:08:05,460
+just quickly describe the issue of non-invertibility.
+我想快速地谈一下不可逆性的问题
+
+226
+00:08:06,800 --> 00:08:08,110
+This is relatively advanced material.
+这部分是比较高阶的内容
+
+227
+00:08:08,600 --> 00:08:09,530
+So you should consider this as
+所以这一部分还是作为选学
+
+228
+00:08:09,770 --> 00:08:11,600
+optional and feel free
+你可以跳过去
+
+229
+00:08:11,750 --> 00:08:12,520
+to skip it or if you
+或者你也可以听听
+
+230
+00:08:12,660 --> 00:08:14,180
+listen to it and you know, possibly it
+如果听不懂的话
+
+231
+00:08:14,320 --> 00:08:15,680
+doesn't really make sense, don't worry about it either.
+也没有关系
+
+232
+00:08:16,400 --> 00:08:18,950
+But earlier when I talked about the normal equation method.
+之前当我讲正规方程的时候
+
+233
+00:08:19,700 --> 00:08:20,920
+We also had an optional video
+我们也有一段选学视频
+
+234
+00:08:21,800 --> 00:08:22,960
+on the non-invertability issue.
+讲不可逆的问题
+
+235
+00:08:23,700 --> 00:08:25,740
+So this is another optional part,
+所以这是另一个选学内容
+
+236
+00:08:26,170 --> 00:08:27,070
+that is sort of an add-on to the
+可以作为上次视频的补充
+
+237
+00:08:27,700 --> 00:08:30,100
+earlier optional video on non-invertibility.
+可以作为上次视频的补充
+
+238
+00:08:31,610 --> 00:08:33,350
+Now consider a setting where m,
+现在考虑 m
+
+239
+00:08:33,850 --> 00:08:35,340
+the number of examples is less
+即样本总数
+
+240
+00:08:35,690 --> 00:08:37,530
+than or equal to n, the number of features.
+小于或等于特征数量 n
+
+241
+00:08:38,650 --> 00:08:40,080
+If you have fewer examples than
+如果你的样本数量
+
+242
+00:08:40,200 --> 00:08:41,480
+features then this matrix
+比特征数量小的话 那么这个矩阵
+
+243
+00:08:42,170 --> 00:08:43,870
+X transpose X will be
+X 转置乘以 X 将是
+
+244
+00:08:44,070 --> 00:08:47,770
+non-invertible or singular, or
+不可逆或奇异的 (singular)
+
+245
+00:08:48,060 --> 00:08:50,120
+the other term
+或者用另一种说法是
+
+246
+00:08:50,360 --> 00:08:51,470
+for this is the matrix will
+这个矩阵是
+
+247
+00:08:51,530 --> 00:08:53,390
+be degenerate and if
+退化(degenerate)的
+
+248
+00:08:53,860 --> 00:08:54,780
+you implement this in Octave
+如果你在 Octave 里运行它
+
+249
+00:08:55,300 --> 00:08:56,380
+anyway, and you use the
+无论如何
+
+250
+00:08:56,620 --> 00:08:58,570
+pinv function to take the pseudo-inverse.
+你用函数 pinv 取伪逆矩阵
+
+251
+00:08:58,850 --> 00:08:59,800
+It will kind of do the
+这样计算
+
+252
+00:09:00,080 --> 00:09:01,900
+right thing, but it's not
+理论上方法是正确的
+
+253
+00:09:02,240 --> 00:09:03,450
+clear that it will
+但实际上
+
+254
+00:09:03,560 --> 00:09:04,570
+give you a very good hypothesis
+你不会得到一个很好的假设
+
+255
+00:09:05,410 --> 00:09:07,720
+even though numerically the octave
+尽管 Octave 会
+
+256
+00:09:08,370 --> 00:09:09,670
+pinv function
+用 pinv 函数
+
+257
+00:09:10,020 --> 00:09:11,050
+will give you a result that
+给你一个数值解
+
+258
+00:09:11,340 --> 00:09:13,210
+kind of makes sense.
+看起来还不错
+
+259
+00:09:13,440 --> 00:09:15,460
+But, if you were doing this in a different language.
+但是 如果你是在一个不同的编程语言中
+
+260
+00:09:16,270 --> 00:09:17,590
+And if you were
+如果在 Octave 中
+
+261
+00:09:17,710 --> 00:09:19,030
+taking just the regular inverse
+你用 inv 来取常规逆
+
+262
+00:09:20,470 --> 00:09:22,070
+which in Octave is denoted with the function inv.
+你用 inv 来取常规逆
+
+263
+00:09:23,240 --> 00:09:24,010
+We're trying to take the regular
+也就是我们要对
+
+264
+00:09:24,330 --> 00:09:25,620
+inverse of X transpose X,
+X 转置乘以 X 取常规逆
+
+265
+00:09:26,300 --> 00:09:28,030
+then in this setting you
+然后在这样的情况下
+
+266
+00:09:28,150 --> 00:09:30,340
+find that X transpose X
+你会发现 X 转置乘以 X
+
+267
+00:09:30,450 --> 00:09:32,750
+is singular, is non-invertible and
+是奇异的 是不可逆的
+
+268
+00:09:32,790 --> 00:09:33,740
+if you're doing this in a different
+即使你在不同的
+
+269
+00:09:33,990 --> 00:09:35,830
+programming language and using some
+编程语言里计算 并使用一些
+
+270
+00:09:36,230 --> 00:09:39,160
+linear algebra library try to take the inverse of this matrix.
+线性代数库 试图计算这个矩阵的逆矩阵
+
+271
+00:09:39,840 --> 00:09:41,080
+It just might not work because that
+都是不可行的
+
+272
+00:09:41,220 --> 00:09:43,060
+matrix is non-invertible or singular.
+因为这个矩阵是不可逆的或奇异的
+
+273
+00:09:44,650 --> 00:09:47,110
+Fortunately, regularization also takes
+幸运的是 正规化也
+
+274
+00:09:47,110 --> 00:09:49,850
+care of this for us, and concretely, so
+为我们解决了这个问题 具体地说
+
+275
+00:09:50,010 --> 00:09:53,370
+long as the regularization parameter is strictly greater than zero.
+只要正则参数是严格大于0的
+
+276
+00:09:53,870 --> 00:09:55,220
+It is actually possible to
+实际上 可以
+
+277
+00:09:55,300 --> 00:09:56,840
+prove that this matrix X
+证明该矩阵 X 转置 乘以 X
+
+278
+00:09:57,080 --> 00:09:58,690
+transpose X plus lambda times, you know,
+加上 λ 乘以
+
+279
+00:09:59,080 --> 00:10:00,400
+this funny matrix here,
+这里这个矩阵
+
+280
+00:10:00,970 --> 00:10:02,250
+is possible to prove that this
+可以证明
+
+281
+00:10:02,470 --> 00:10:03,650
+matrix will not be
+这个矩阵将不是奇异的
+
+282
+00:10:03,760 --> 00:10:05,710
+singular and that this matrix will be invertible.
+即该矩阵将是可逆的
+
+283
+00:10:07,450 --> 00:10:09,430
+So using regularization also takes
+因此 使用正则化还可以
+
+284
+00:10:09,700 --> 00:10:11,910
+care of any non-invertibility issues
+照顾一些 X 转置乘以 X 不可逆的问题
+
+285
+00:10:12,580 --> 00:10:14,470
+of the X transpose X matrix as well.
+照顾一些 X 转置乘以 X 不可逆的问题
+
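As an illustrative check of that claim (with made-up sizes, m = 5 examples and n = 10 features, chosen only for demonstration), one can compare ranks in Octave:

m = 5;  n = 10;  lambda = 0.01;
X = [ones(m, 1), rand(m, n)];             % m-by-(n+1) design matrix with m <= n
L = eye(n + 1);  L(1, 1) = 0;
rank(X' * X)                               % at most m, so the matrix is singular here
rank(X' * X + lambda * L)                  % n+1, i.e. full rank: the regularized matrix is invertible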
+286
+00:10:15,260 --> 00:10:18,000
+So, you now know how to implement regularized linear regression.
+好的 你现在知道了如何实现正则化线性回归
+
+287
+00:10:18,870 --> 00:10:19,910
+Using this, you'll be able
+利用它 你就可以
+
+288
+00:10:20,300 --> 00:10:21,970
+to avoid overfitting, even
+避免过度拟合
+
+289
+00:10:22,210 --> 00:10:24,720
+if you have lots of features in a relatively small training set.
+即使你在一个相对较小的训练集里有很多特征
+
+290
+00:10:25,360 --> 00:10:26,630
+And this should let you get
+这应该可以让你
+
+291
+00:10:26,980 --> 00:10:29,000
+linear regression to work much better for many problems.
+在很多问题上更好地运用线性回归
+
+292
+00:10:30,060 --> 00:10:31,190
+In the next video, we'll take
+在接下来的视频中 我们将
+
+293
+00:10:31,390 --> 00:10:34,310
+this regularization idea and apply it to logistic regression.
+把这种正则化的想法应用到逻辑回归
+
+294
+00:10:35,140 --> 00:10:36,170
+So that you'll be able to
+这样你就可以
+
+295
+00:10:36,280 --> 00:10:37,630
+get logistic regression to avoid
+让逻辑回归也避免过度拟合
+
+296
+00:10:37,920 --> 00:10:39,830
+overfitting and perform much better as well.
+并让它表现的更好
+
diff --git a/srt/7 - 4 - Regularized Logistic Regression (9 min).srt b/srt/7 - 4 - Regularized Logistic Regression (9 min).srt
new file mode 100644
index 00000000..862afa34
--- /dev/null
+++ b/srt/7 - 4 - Regularized Logistic Regression (9 min).srt
@@ -0,0 +1,1246 @@
+1
+00:00:00,160 --> 00:00:01,480
+For logistic regression, we previously
+针对逻辑回归问题
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,110 --> 00:00:04,730
+talked about two types of optimization algorithms.
+我们在之前的课程已经学习过两种优化算法
+
+3
+00:00:05,190 --> 00:00:06,190
+We talked about how to use
+我们首先学习了
+
+4
+00:00:06,560 --> 00:00:09,210
+gradient descent to optimize as cost function J of theta.
+使用梯度下降法来优化代价函数 J(θ)
+
+5
+00:00:09,690 --> 00:00:10,770
+And we also talked about
+接下来学习了
+
+6
+00:00:11,120 --> 00:00:12,730
+advanced optimization methods.
+更高级的优化算法
+
+7
+00:00:13,520 --> 00:00:14,670
+Ones that require that you
+这些高级优化算法
+
+8
+00:00:14,790 --> 00:00:16,300
+provide a way to compute
+需要你自己设计
+
+9
+00:00:16,940 --> 00:00:18,160
+your cost function J of
+代价函数 J(θ)
+
+10
+00:00:18,420 --> 00:00:20,920
+theta and that you provide a way to compute the derivatives.
+自己计算导数
+
+11
+00:00:22,450 --> 00:00:23,920
+In this video, we'll show how
+在本节课中
+
+12
+00:00:24,190 --> 00:00:25,420
+you can adapt both of
+我们将展示
+
+13
+00:00:25,500 --> 00:00:27,570
+those techniques, both gradient descent and
+如何改进梯度下降法和
+
+14
+00:00:27,720 --> 00:00:29,350
+the more advanced optimization techniques
+高级优化算法
+
+15
+00:00:30,280 --> 00:00:31,770
+in order to have them
+使其能够应用于
+
+16
+00:00:31,950 --> 00:00:33,550
+work for regularized logistic regression.
+正则化的逻辑回归
+
+17
+00:00:35,430 --> 00:00:36,670
+So, here's the idea.
+接下来我们来学习其中的原理
+
+18
+00:00:37,260 --> 00:00:38,770
+We saw earlier that Logistic
+在之前的课程中我们注意到
+
+19
+00:00:39,190 --> 00:00:40,490
+Regression can also be prone
+对于逻辑回归问题
+
+20
+00:00:40,850 --> 00:00:42,540
+to overfitting if you fit
+有可能会出现过拟合的现象
+
+21
+00:00:42,810 --> 00:00:44,090
+it with a very, sort of,
+如果你使用了
+
+22
+00:00:44,290 --> 00:00:45,890
+high order polynomial features like this.
+类似这样的高阶多项式
+
+23
+00:00:46,470 --> 00:00:48,250
+Where G is the
+g 是 S 型函数
+
+24
+00:00:48,480 --> 00:00:49,970
+sigmoid function and in
+具体来说
+
+25
+00:00:50,030 --> 00:00:51,330
+particular you end up with
+最后你会得到这样的结果
+
+26
+00:00:51,530 --> 00:00:53,020
+a hypothesis, you know,
+最后你会得到这样的结果
+
+27
+00:00:53,150 --> 00:00:54,120
+whose decision bound to be
+分类边界看起来是一个
+
+28
+00:00:54,360 --> 00:00:55,930
+just sort of an overly complex
+过于复杂并且
+
+29
+00:00:56,620 --> 00:00:58,600
+and extremely contorted function that
+十分扭曲的函数
+
+30
+00:00:58,820 --> 00:00:59,680
+really isn't such a great
+针对这个训练点集
+
+31
+00:00:59,790 --> 00:01:01,000
+hypothesis for this training
+这显然不是一个好的结果
+
+32
+00:01:01,350 --> 00:01:02,990
+set, and more generally if you have
+通常情况下
+
+33
+00:01:03,120 --> 00:01:04,890
+logistic regression with a lot of features.
+如果要解决的逻辑回归问题有很多参数
+
+34
+00:01:05,150 --> 00:01:06,630
+Not necessarily polynomial ones, but
+这些特征不一定是多项式项
+
+35
+00:01:06,790 --> 00:01:07,510
+just with a lot of
+只要特征数量很多
+
+36
+00:01:07,670 --> 00:01:09,720
+features you can end up with overfitting.
+最终都可能出现过拟合的现象
+
+37
+00:01:11,620 --> 00:01:14,010
+This was our cost function for logistic regression.
+这是逻辑回归问题的代价函数
+
+38
+00:01:14,810 --> 00:01:16,210
+And if we want to modify
+为了将其修改为正则化形式
+
+39
+00:01:16,740 --> 00:01:18,820
+it to use regularization, all we
+为了将其修改为正则化形式
+
+40
+00:01:18,950 --> 00:01:20,630
+need to do is add to
+我们只需要在后面增加一项
+
+41
+00:01:20,820 --> 00:01:22,290
+it the following term
+我们只需要在后面增加一项
+
+42
+00:01:22,650 --> 00:01:24,860
+plus lambda over 2m, sum
+加上 λ/2m
+
+43
+00:01:25,110 --> 00:01:26,580
+from J equals 1, and
+再跟过去一样
+
+44
+00:01:26,730 --> 00:01:29,670
+as usual sum from J equals 1.
+这个求和将 j 从1开始
+
+45
+00:01:29,800 --> 00:01:31,000
+Rather than the sum from J
+而不是从0开始
+
+46
+00:01:31,550 --> 00:01:33,670
+equals 0, of theta J squared.
+累积 θj 的平方
+
+47
+00:01:34,330 --> 00:01:35,470
+And this has the
+增加的这一项
+
+48
+00:01:35,750 --> 00:01:36,960
+effect therefore, of penalizing
+将惩罚参数 θ1, θ2 等等
+
+49
+00:01:37,650 --> 00:01:39,140
+the parameters theta 1 theta
+一直到 θn
+
+50
+00:01:39,570 --> 00:01:42,600
+2 and so on up to theta N from being too large.
+防止这些参数取值过大
+
+51
+00:01:43,610 --> 00:01:44,720
+And if you do this,
+增加了这一项之后
+
+52
+00:01:45,720 --> 00:01:46,450
+then it will have the
+产生的效果是
+
+53
+00:01:46,750 --> 00:01:48,870
+effect that even though you're fitting
+即使用有很多参数的
+
+54
+00:01:49,250 --> 00:01:51,500
+a very high order polynomial with a lot of parameters.
+高阶多项式来拟合
+
+55
+00:01:52,210 --> 00:01:53,240
+So long as you apply regularization
+只要使用了正则化方法
+
+56
+00:01:53,910 --> 00:01:55,090
+and keep the parameters small
+约束这些参数使其取值很小
+
+57
+00:01:55,850 --> 00:01:57,580
+you're more likely to get a decision boundary.
+你仍有可能得到一条
+
+58
+00:01:58,830 --> 00:02:00,040
+You know, that maybe looks more like this.
+看起来是这样的分类边界
+
+59
+00:02:00,320 --> 00:02:01,460
+It looks more reasonable for separating
+显然 这条边界更合理地
+
+60
+00:02:02,500 --> 00:02:03,740
+the positive and the negative examples.
+分开了正样本和负样本
+
+61
+00:02:05,300 --> 00:02:06,970
+So, when using regularization
+因此 在使用了正则化方法以后
+
+62
+00:02:08,140 --> 00:02:09,080
+even when you have a lot
+即使你的问题有很多参数
+
+63
+00:02:09,220 --> 00:02:11,110
+of features, the regularization can
+正则化方法可以帮你
+
+64
+00:02:11,620 --> 00:02:13,500
+help take care of the overfitting problem.
+避免过拟合的现象
+
+65
+00:02:14,740 --> 00:02:15,790
+How do we actually implement this?
+这到底是怎样实现的呢?
+
+66
+00:02:16,720 --> 00:02:18,280
+Well, for the original gradient descent
+首先看看以前学过的梯度下降法
+
+67
+00:02:18,710 --> 00:02:20,380
+algorithm, this was the update we had.
+这是我们之前得到的更新式
+
+68
+00:02:20,670 --> 00:02:22,300
+We will repeatedly perform the following
+我们利用这个式子
+
+69
+00:02:22,750 --> 00:02:24,610
+update to theta J. This
+迭代更新 θj
+
+70
+00:02:24,740 --> 00:02:26,940
+slide looks a lot like the previous one for linear regression.
+这一页幻灯片看起来和上一节课的线性回归问题很像
+
+71
+00:02:27,510 --> 00:02:28,460
+But what I'm going to do is
+但是这里我将
+
+72
+00:02:29,210 --> 00:02:31,390
+write the update for theta 0 separately.
+θ0 的更新公式单独写出来
+
+73
+00:02:31,670 --> 00:02:32,930
+So, the first line is
+第一行用来更新 θ0
+
+74
+00:02:33,060 --> 00:02:34,110
+for update for theta 0 and
+第一行用来更新 θ0
+
+75
+00:02:34,230 --> 00:02:35,470
+a second line is now
+第二行用来更新
+
+76
+00:02:35,590 --> 00:02:36,730
+my update for theta 1
+θ1 到 θn
+
+77
+00:02:36,880 --> 00:02:38,470
+up to theta N.
+θ1 到 θn
+
+78
+00:02:38,900 --> 00:02:40,740
+Because I'm going to treat theta 0 separately.
+将 θ0 单独处理
+
+79
+00:02:41,700 --> 00:02:43,140
+And in order to
+为了按照
+
+80
+00:02:43,700 --> 00:02:45,370
+modify this algorithm, to use
+正则化代价函数的形式
+
+81
+00:02:46,770 --> 00:02:48,480
+a regularized cost function,
+来修改算法
+
+82
+00:02:49,100 --> 00:02:50,510
+all I need to do is
+接下来的推导
+
+83
+00:02:50,950 --> 00:02:51,810
+pretty similar to what we
+非常类似于
+
+84
+00:02:51,930 --> 00:02:53,700
+did for linear regression is
+上一节学习过的正则化线性回归
+
+85
+00:02:53,870 --> 00:02:55,620
+actually to just modify this
+只需要将第二个式子
+
+86
+00:02:55,890 --> 00:02:57,480
+second update rule as follows.
+修改成这样
+
+87
+00:02:58,510 --> 00:02:59,800
+And, once again, this, you know,
+我们又一次发现
+
+88
+00:03:00,380 --> 00:03:02,080
+cosmetically looks identical to what
+修改后的式子表面上看起来
+
+89
+00:03:02,230 --> 00:03:03,720
+we had for linear regression.
+与上一节的线性回归问题很相似
+
+90
+00:03:04,580 --> 00:03:05,580
+But of course is not the
+但是实质上这与
+
+91
+00:03:05,660 --> 00:03:06,590
+same algorithm as we had,
+我们上节学过的算法并不一样
+
+92
+00:03:06,890 --> 00:03:08,370
+because now the hypothesis
+因为现在的假设 h(x)
+
+93
+00:03:08,780 --> 00:03:10,420
+is defined using this.
+是按照这个式子定义的
+
+94
+00:03:10,860 --> 00:03:12,550
+So this is not the same algorithm
+这与上一节正则化线性回归算法
+
+95
+00:03:13,130 --> 00:03:14,390
+as regularized linear regression.
+中的定义并不一样
+
+96
+00:03:14,830 --> 00:03:16,340
+Because the hypothesis is different.
+由于假设的不同
+
+97
+00:03:16,940 --> 00:03:18,360
+Even though this update that I wrote down.
+我写下的迭代公式
+
+98
+00:03:18,630 --> 00:03:20,160
+It actually looks cosmetically the
+只是表面上看起来很像
+
+99
+00:03:20,350 --> 00:03:22,130
+same as what we had earlier.
+上一节学过的
+
+100
+00:03:22,480 --> 00:03:25,310
+We're working out gradient descent for regularized linear regression.
+正则化线性回归问题中的梯度下降算法
+
+101
+00:03:26,690 --> 00:03:27,720
+And of course, just to wrap
+总结一下
+
+102
+00:03:27,830 --> 00:03:29,360
+up this discussion, this term
+总结一下
+
+103
+00:03:29,560 --> 00:03:30,860
+here in the square
+方括号中的这一项
+
+104
+00:03:31,130 --> 00:03:32,330
+brackets, so this term
+方括号中的这一项
+
+105
+00:03:32,670 --> 00:03:35,120
+here, this term is,
+这一项是
+
+106
+00:03:35,410 --> 00:03:36,750
+of course, the new partial
+新的代价函数 J(θ)
+
+107
+00:03:37,210 --> 00:03:38,590
+derivative with respect to
+关于 θj 的偏导数
+
+108
+00:03:38,660 --> 00:03:41,420
+theta J of the new cost function J of theta.
+关于 θj 的偏导数
+
+109
+00:03:42,300 --> 00:03:43,480
+Where J of theta here is
+这里的 J(θ)
+
+110
+00:03:43,700 --> 00:03:44,980
+the cost function we defined on
+是我们在上一页幻灯片中
+
+111
+00:03:45,180 --> 00:03:48,100
+a previous slide that does use regularization.
+定义的 使用了正则化的代价函数
+
+112
+00:03:49,770 --> 00:03:52,060
+So, that's gradient descent for regularized logistic regression.
+以上就是正则化逻辑回归问题的梯度下降算法
+
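As a minimal sketch of this update (with the same assumed variable names as before), the only change from the linear regression case is that the hypothesis is passed through the sigmoid:

m = length(y);
h = 1 ./ (1 + exp(-X * theta));                            % sigmoid hypothesis for all m examples
grad = (1 / m) * (X' * (h - y));
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);   % theta_0 is not regularized
theta = theta - alpha * grad;                              % one gradient descent step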
+113
+00:03:55,200 --> 00:03:56,430
+Let's talk about how to
+接下来我们讨论
+
+114
+00:03:56,580 --> 00:03:58,290
+get regularized logistic regression
+如何在更高级的优化算法中
+
+115
+00:03:58,950 --> 00:04:00,010
+to work using the more
+使用同样的
+
+116
+00:04:00,360 --> 00:04:02,070
+advanced optimization methods.
+正则化技术
+
+117
+00:04:03,180 --> 00:04:05,590
+And just to remind you for
+提醒一下
+
+118
+00:04:05,840 --> 00:04:06,800
+those methods what we needed
+对于这些高级算法
+
+119
+00:04:07,080 --> 00:04:08,390
+to do was to define the
+我们需要自己定义
+
+120
+00:04:08,450 --> 00:04:09,460
+function that's called the cost
+costFuntion 函数
+
+121
+00:04:09,640 --> 00:04:11,160
+function, that takes as
+这个函数有一个输入参数
+
+122
+00:04:11,280 --> 00:04:13,660
+input the parameter vector theta and
+向量 theta
+
+123
+00:04:13,790 --> 00:04:16,180
+once again in the equations
+theta 的内容是这样的
+
+124
+00:04:16,770 --> 00:04:19,030
+we've been writing here we used 0 index vectors.
+我们的参数索引依然从0开始
+
+125
+00:04:19,510 --> 00:04:20,690
+So we had theta 0 up
+即 θ0 到 θn
+
+126
+00:04:21,180 --> 00:04:22,810
+to theta N. But
+但是由于 Octave 中
+
+127
+00:04:23,020 --> 00:04:25,920
+because Octave indexes the vectors starting from 1.
+向量索引是从1开始
+
+128
+00:04:26,820 --> 00:04:28,240
+Theta 0 is written
+我们的参数是从 θ0 到 θn
+
+129
+00:04:28,560 --> 00:04:29,990
+in Octave as theta 1.
+在 Octave 里 是从 theta(1) 开始标号的
+
+130
+00:04:30,120 --> 00:04:31,630
+Theta 1 is written in
+而 θ1 将被记为 theta(2)
+
+131
+00:04:31,860 --> 00:04:32,930
+Octave as theta 2, and
+以此类推
+
+132
+00:04:33,280 --> 00:04:35,070
+so on down to theta
+直到 θn 被记为
+
+133
+00:04:36,270 --> 00:04:36,650
+N plus 1.
+theta(n+1)
+
+134
+00:04:36,740 --> 00:04:38,450
+And what we needed to
+而我们需要做的
+
+135
+00:04:38,600 --> 00:04:40,240
+do was provide a function.
+就是将这个自定义代价函数
+
+136
+00:04:41,170 --> 00:04:42,370
+Let's provide a function called
+这个 costFunction 函数
+
+137
+00:04:42,780 --> 00:04:44,140
+cost function that we would
+代入到我们之前学过的
+
+138
+00:04:44,360 --> 00:04:46,920
+then pass in to what we have, what we saw earlier.
+代入到我们之前学过的
+
+139
+00:04:47,300 --> 00:04:48,490
+We will use the fminunc
+fminunc函数中
+
+140
+00:04:49,060 --> 00:04:50,310
+and then
+括号里面是 @costFunction
+
+141
+00:04:50,540 --> 00:04:52,160
+you know at cost function,
+将 @costFunction 作为参数代进去
+
+142
+00:04:54,830 --> 00:04:55,430
+and so on, right.
+等等
+
+143
+00:04:55,600 --> 00:04:56,870
+But fminunc, the f-min-u-n-c,
+fminunc返回的是
+
+144
+00:04:57,030 --> 00:04:58,060
+stood for f min
+函数 costFunction
+
+145
+00:04:58,280 --> 00:04:59,310
+unconstrained, and this will
+在无约束条件下的最小值
+
+146
+00:04:59,650 --> 00:05:01,230
+work with fminunc
+因此 这个式子
+
+147
+00:05:01,310 --> 00:05:02,300
+was what will take
+将求得代价函数的最小值
+
+148
+00:05:02,540 --> 00:05:04,340
+the cost function and minimize it for us.
+将求得代价函数的最小值
+
+149
+00:05:05,950 --> 00:05:07,050
+So the two main things that
+因此 costFunction 函数
+
+150
+00:05:07,170 --> 00:05:08,600
+the cost function needed to
+有两个返回值
+
+151
+00:05:08,700 --> 00:05:10,620
+return were first J-val.
+第一个是 jVal
+
+152
+00:05:11,280 --> 00:05:12,400
+And for that, we need
+为此 我们要在这里
+
+153
+00:05:12,720 --> 00:05:13,950
+to write code to
+补充代码
+
+154
+00:05:14,020 --> 00:05:15,710
+compute the cost function J of theta.
+来计算代价函数 J(θ)
+
+155
+00:05:17,130 --> 00:05:19,030
+Now, when we're using regularized logistic
+由于我们在这使用的是正则化逻辑回归
+
+156
+00:05:19,450 --> 00:05:20,920
+regression, of course the
+因此
+
+157
+00:05:20,990 --> 00:05:21,960
+cost function j of theta
+代价函数 J(θ) 也相应需要改变
+
+158
+00:05:22,280 --> 00:05:23,450
+changes and, in particular,
+具体来说
+
+159
+00:05:24,480 --> 00:05:25,760
+now a cost function needs to
+代价函数需要
+
+160
+00:05:25,870 --> 00:05:29,580
+include this additional regularization term at the end as well.
+增加这一正则化项
+
+161
+00:05:29,850 --> 00:05:30,930
+So, when you compute j of
+因此 当你在计算 J(θ) 时
+
+162
+00:05:31,030 --> 00:05:33,410
+theta be sure to include that term at the end.
+需要确保包含了最后这一项
+
+163
+00:05:34,590 --> 00:05:35,520
+And then, the other thing that
+另外 代价函数的
+
+164
+00:05:36,050 --> 00:05:37,240
+this cost function thing
+另一项返回值是
+
+165
+00:05:37,690 --> 00:05:39,010
+needs to derive with a gradient.
+对应的梯度导数
+
+166
+00:05:39,530 --> 00:05:41,170
+So gradient one needs
+梯度的第一个元素
+
+167
+00:05:41,400 --> 00:05:42,570
+to be set to the
+gradient(1) 就等于
+
+168
+00:05:42,660 --> 00:05:44,080
+partial derivative of J
+J(θ) 关于 θ0 的偏导数
+
+169
+00:05:44,240 --> 00:05:45,520
+of theta with respect to theta
+J(θ)关于θ0的偏导数
+
+170
+00:05:45,690 --> 00:05:47,170
+zero, gradient two needs
+梯度的第二个元素按照这个式子计算
+
+171
+00:05:47,580 --> 00:05:49,520
+to be set to that, and so on.
+剩余元素以此类推
+
+172
+00:05:49,780 --> 00:05:50,900
+Once again, the index is off by one.
+再次强调 向量元素索引是从1开始
+
+173
+00:05:51,220 --> 00:05:52,850
+Right, because of the indexing from
+这是因为 Octave 的向量索引
+
+174
+00:05:53,110 --> 00:05:54,450
+one for Octave users.
+就是从1开始的
+
+175
+00:05:55,940 --> 00:05:56,780
+And looking at these terms.
+再来总结一下
+
+176
+00:05:57,850 --> 00:05:58,680
+This term over here.
+首先看第一个公式
+
+177
+00:05:59,410 --> 00:06:00,640
+We actually worked this out
+在之前的课程中
+
+178
+00:06:00,720 --> 00:06:02,840
+on a previous slide is actually equal to this.
+我们已经计算过它等于这个式子
+
+179
+00:06:03,230 --> 00:06:03,640
+It doesn't change.
+这个式子没有变化
+
+180
+00:06:04,120 --> 00:06:07,250
+Because the derivative for theta zero doesn't change.
+因为相比没有正则化的版本
+
+181
+00:06:07,650 --> 00:06:09,540
+Compared to the version without regularization.
+J(θ) 关于 θ0 的偏导数不会改变
+
+182
+00:06:10,960 --> 00:06:13,210
+And the other terms do change.
+但是其他的公式确实有变化
+
+183
+00:06:13,840 --> 00:06:16,340
+And in particular the derivative respect to theta one.
+以 θ1 的偏导数为例
+
+184
+00:06:17,010 --> 00:06:18,830
+We worked this out on the previous slide as well.
+在之前的课程里我们也计算过这一项
+
+185
+00:06:19,110 --> 00:06:20,670
+Is equal to, you know,
+它等于这个式子
+
+186
+00:06:20,890 --> 00:06:22,560
+the original term and then plus
+加上 λ 除以 m
+
+187
+00:06:23,450 --> 00:06:24,870
+lambda over m times theta 1.
+再乘以 θ1
+
+188
+00:06:25,310 --> 00:06:27,140
+Just so we make sure we pass this correctly.
+注意要确保这段代码编写正确
+
+189
+00:06:27,800 --> 00:06:29,370
+And we can add parentheses here.
+建议在这里添加括号
+
+190
+00:06:29,830 --> 00:06:30,980
+Right, so the summation doesn't extend.
+防止求和符号的作用域扩大
+
+191
+00:06:31,570 --> 00:06:33,160
+And similarly, you know,
+类似的
+
+192
+00:06:33,380 --> 00:06:34,800
+this other term here looks
+再来看这个式子
+
+193
+00:06:35,130 --> 00:06:36,180
+like this, with this additional
+相比于之前的幻灯片
+
+194
+00:06:37,070 --> 00:06:37,950
+term that we had on
+这里多了额外的一项
+
+195
+00:06:38,030 --> 00:06:39,770
+the previous slide, that corresponds to
+这就是正则化后的
+
+196
+00:06:39,950 --> 00:06:41,450
+the gradient from their regularization objective.
+梯度计算方法
+
+197
+00:06:42,230 --> 00:06:43,650
+So if you implement this
+当你自己定义了
+
+198
+00:06:43,820 --> 00:06:45,140
+cost function and pass
+costFunction 函数
+
+199
+00:06:45,720 --> 00:06:47,370
+this into fminunc
+并将其传递到 fminuc
+
+200
+00:06:48,190 --> 00:06:49,160
+or to one of those advanced optimization
+或者其他类似的高级优化函数中
+
+201
+00:06:50,050 --> 00:06:51,940
+techniques, that will minimize
+就可以求出
+
+202
+00:06:52,540 --> 00:06:55,990
+the new regularized cost function J of theta.
+这个新的正则化代价函数的极小值
+
+203
+00:06:56,990 --> 00:06:58,220
+And the parameters you get out
+而返回的参数值
+
+204
+00:06:59,530 --> 00:07:00,740
+will be the ones that correspond to
+即是对应的
+
+205
+00:07:01,450 --> 00:07:02,940
+logistic regression with regularization.
+逻辑回归问题的正则化解
+
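Here is a hedged sketch of what such a costFunction and the fminunc call could look like in Octave. The name costFunction follows the lecture; X, y, lambda, initial_theta and the option values are assumptions for the example, and the function body would live in its own costFunction.m file.

function [jVal, gradient] = costFunction(theta, X, y, lambda)
  % Regularized logistic regression cost and gradient (sketch).
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                            % sigmoid hypothesis
  jVal = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h)) ...
         + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);      % theta(1), i.e. theta_0, is not penalized
  gradient = (1 / m) * (X' * (h - y));
  gradient(2:end) = gradient(2:end) + (lambda / m) * theta(2:end);
end

Example call, again with assumed setup:

options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, jVal] = fminunc(@(t) costFunction(t, X, y, lambda), initial_theta, options);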
+206
+00:07:04,410 --> 00:07:05,540
+So, now you know
+讲到这里 你应该已经学会了
+
+207
+00:07:05,780 --> 00:07:08,210
+how to implement regularized logistic regression.
+解决正则化逻辑回归问题的方法
+
+208
+00:07:09,780 --> 00:07:10,920
+When I walk around Silicon Valley,
+你知道吗 我住在硅谷
+
+209
+00:07:11,380 --> 00:07:12,900
+I live here in Silicon Valley, there are
+当我在硅谷晃悠时
+
+210
+00:07:13,100 --> 00:07:14,900
+a lot of engineers that are frankly, making
+我看到许多工程师
+
+211
+00:07:15,420 --> 00:07:16,490
+a ton of money for their
+运用机器学习算法
+
+212
+00:07:16,610 --> 00:07:18,090
+companies using machine learning algorithms.
+给他们公司挣来了很多金子
+
+213
+00:07:19,180 --> 00:07:20,390
+And I know we've
+课讲到这里
+
+214
+00:07:20,600 --> 00:07:22,860
+only been, you know, studying this stuff for a little while.
+大家对机器学习算法可能还只是略懂
+
+215
+00:07:23,620 --> 00:07:25,410
+But if you understand linear
+但是一旦你精通了
+
+216
+00:07:26,510 --> 00:07:28,360
+regression, the advanced optimization
+线性回归、高级优化算法
+
+217
+00:07:29,210 --> 00:07:30,710
+algorithms and regularization, by
+和正则化技术
+
+218
+00:07:30,950 --> 00:07:32,520
+now, frankly, you probably know
+坦率地说
+
+219
+00:07:32,950 --> 00:07:34,270
+quite a lot more machine learning
+你对机器学习的理解
+
+220
+00:07:35,010 --> 00:07:36,290
+than many, certainly now,
+可能已经比许多工程师深入了
+
+221
+00:07:36,750 --> 00:07:38,050
+but you probably know quite a
+现在 你已经有了
+
+222
+00:07:38,180 --> 00:07:39,580
+lot more machine learning right now
+丰富的机器学习知识
+
+223
+00:07:40,240 --> 00:07:41,670
+than frankly, many of the
+目测比那些硅谷工程师还厉害
+
+224
+00:07:41,820 --> 00:07:44,760
+Silicon Valley engineers out there having very successful careers.
+而那些工程师都混得还不错
+
+225
+00:07:45,300 --> 00:07:46,420
+You know, making tons of money for the companies.
+给他们公司挣了大钱 你懂的
+
+226
+00:07:47,050 --> 00:07:49,250
+Or building products using machine learning algorithms.
+或者用机器学习算法来做产品
+
+227
+00:07:50,370 --> 00:07:50,960
+So, congratulations.
+所以 恭喜你
+
+228
+00:07:52,080 --> 00:07:53,120
+You've actually come a long ways.
+你已经历练得差不多了
+
+229
+00:07:53,490 --> 00:07:54,550
+And you can actually, you
+已经具备足够的知识
+
+230
+00:07:54,780 --> 00:07:55,990
+actually know enough to
+足够将这些算法
+
+231
+00:07:56,310 --> 00:07:58,210
+apply this stuff and get to work for many problems.
+用于解决实际问题
+
+232
+00:07:59,260 --> 00:08:00,580
+So congratulations for that.
+所以你可以小小的骄傲一下了
+
+233
+00:08:00,780 --> 00:08:01,880
+But of course, there's
+但是
+
+234
+00:08:02,350 --> 00:08:03,280
+still a lot more that we
+我还是有很多可以教你们的
+
+235
+00:08:03,400 --> 00:08:05,180
+want to teach you, and in
+我还是有很多可以教你们的
+
+236
+00:08:05,380 --> 00:08:06,540
+the next set of videos after
+接下来的课程中
+
+237
+00:08:06,560 --> 00:08:07,850
+this, we'll start to talk
+我们将学习
+
+238
+00:08:08,030 --> 00:08:10,890
+about a very powerful class of non-linear classifiers.
+一个非常强大的非线性分类器
+
+239
+00:08:11,680 --> 00:08:13,350
+So whereas linear regression, logistic
+无论是线性回归问题
+
+240
+00:08:13,690 --> 00:08:14,940
+regression, you know, you can
+还是逻辑回归问题
+
+241
+00:08:15,080 --> 00:08:17,310
+form polynomial terms, but it
+都可以构造多项式来解决
+
+242
+00:08:17,460 --> 00:08:18,350
+turns out that there are much
+但是 你将逐渐发现还有
+
+243
+00:08:18,510 --> 00:08:21,150
+more powerful nonlinear classifiers than
+更强大的非线性分类器
+
+244
+00:08:21,460 --> 00:08:23,650
+this sort of polynomial regression.
+比这种多项式回归的方式更为强大
+
+245
+00:08:24,640 --> 00:08:25,780
+And in the next set
+在下一节课
+
+246
+00:08:25,810 --> 00:08:28,280
+of videos after this one, I'll start telling you about them.
+我将向大家介绍它们
+
+247
+00:08:28,510 --> 00:08:29,560
+So that you have even more
+你将学会
+
+248
+00:08:29,760 --> 00:08:30,440
+powerful learning algorithms than you have
+比你现在解决问题的方法
+
+249
+00:08:31,380 --> 00:08:32,870
+now to apply to different problems.
+强大N倍的学习算法
+
diff --git a/srt/8 - 1 - Non-linear Hypotheses (10 min).srt b/srt/8 - 1 - Non-linear Hypotheses (10 min).srt
new file mode 100644
index 00000000..840f3ca5
--- /dev/null
+++ b/srt/8 - 1 - Non-linear Hypotheses (10 min).srt
@@ -0,0 +1,1401 @@
+1
+00:00:00,440 --> 00:00:01,400
+In this and in the
+在这节课和接下来的课程中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,480 --> 00:00:02,640
+next set of videos, I'd like
+我将给大家介绍
+
+3
+00:00:02,780 --> 00:00:04,270
+to tell you about a learning
+一种叫“神经网络”(Neural Network)
+
+4
+00:00:04,550 --> 00:00:06,110
+algorithm called a Neural Network.
+的机器学习算法
+
+5
+00:00:07,190 --> 00:00:07,900
+We're going to first talk about
+我们将首先讨论
+
+6
+00:00:08,079 --> 00:00:09,330
+the representation and then
+神经网络的表示方法
+
+7
+00:00:09,600 --> 00:00:10,390
+in the next set of videos
+在后续课程中
+
+8
+00:00:10,410 --> 00:00:12,160
+talk about learning algorithms for it.
+再来具体讨论它的学习算法
+
+9
+00:00:12,660 --> 00:00:14,070
+Neural networks are actually
+神经网络实际上是一个
+
+10
+00:00:14,510 --> 00:00:15,870
+a pretty old idea, but had
+相对古老的算法
+
+11
+00:00:16,290 --> 00:00:17,680
+fallen out of favor for a while.
+并且后来沉寂了一段时间
+
+12
+00:00:18,200 --> 00:00:19,270
+But today, it is the
+不过到了现在
+
+13
+00:00:19,580 --> 00:00:20,820
+state of the art technique for
+它又成为许多机器学习问题
+
+14
+00:00:21,090 --> 00:00:22,390
+many different machine learning problems.
+的首选技术
+
+15
+00:00:23,740 --> 00:00:25,740
+So why do we need yet another learning algorithm?
+不过我们为什么还需要这个学习算法?
+
+16
+00:00:26,300 --> 00:00:28,030
+We already have linear regression and
+我们已经有线性回归和逻辑回归算法了
+
+17
+00:00:28,180 --> 00:00:31,260
+we have logistic regression, so why do we need, you know, neural networks?
+为什么还要研究神经网络?
+
+18
+00:00:32,280 --> 00:00:34,260
+In order to motivate the discussion
+为了阐述研究
+
+19
+00:00:34,790 --> 00:00:35,970
+of neural networks, let me
+神经网络算法的目的
+
+20
+00:00:36,120 --> 00:00:37,130
+start by showing you a few
+我们首先来看几个
+
+21
+00:00:37,310 --> 00:00:38,720
+examples of machine learning
+机器学习问题作为例子
+
+22
+00:00:38,930 --> 00:00:40,100
+problems where we need
+这几个问题的解决
+
+23
+00:00:40,300 --> 00:00:41,850
+to learn complex non-linear hypotheses.
+都依赖于研究复杂的非线性分类器
+
+24
+00:00:43,850 --> 00:00:45,650
+Consider a supervised learning classification
+考虑这个监督学习分类的问题
+
+25
+00:00:46,530 --> 00:00:48,440
+problem where you have a training set like this.
+我们已经有了对应的训练集
+
+26
+00:00:49,280 --> 00:00:50,530
+If you want to apply logistic
+如果利用逻辑回归算法
+
+27
+00:00:50,960 --> 00:00:52,710
+regression to this problem, one
+来解决这个问题
+
+28
+00:00:52,900 --> 00:00:54,250
+thing you could do is apply
+首先需要构造
+
+29
+00:00:54,660 --> 00:00:56,140
+logistic regression with a
+一个包含很多非线性项的
+
+30
+00:00:56,190 --> 00:00:57,720
+lot of nonlinear features like that.
+逻辑回归函数
+
+31
+00:00:58,170 --> 00:00:59,580
+So here, g as usual
+这里g仍是s型函数 (即f(x)=1/(1+e^-x) )
+
+32
+00:01:00,070 --> 00:01:01,710
+is the sigmoid function, and we
+我们能让函数
+
+33
+00:01:01,780 --> 00:01:04,680
+can include lots of polynomial terms like these.
+包含很多像这样的多项式项
+
+34
+00:01:05,450 --> 00:01:06,790
+And, if you include enough polynomial
+事实上 当多项式项数足够多时
+
+35
+00:01:07,370 --> 00:01:08,280
+terms then, you know, maybe
+那么可能
+
+36
+00:01:08,950 --> 00:01:10,280
+you can get a hypothesis
+你能够得到一个
+
+37
+00:01:11,600 --> 00:01:13,780
+that separates the positive and negative examples.
+分开正样本和负样本的分界线
+
+38
+00:01:14,630 --> 00:01:16,080
+This particular method works well
+当只有两项时
+
+39
+00:01:16,470 --> 00:01:18,400
+when you have only, say, two
+比如 x1 x2
+
+40
+00:01:18,620 --> 00:01:20,180
+features - x1 and x2
+这种方法确实能得到不错的结果
+
+41
+00:01:20,190 --> 00:01:20,980
+- because you can then include
+因为你可以
+
+42
+00:01:21,500 --> 00:01:22,880
+all those polynomial terms of
+把x1和x2的所有组合
+
+43
+00:01:23,400 --> 00:01:24,620
+x1 and x2.
+都包含到多项式中
+
+44
+00:01:24,810 --> 00:01:26,280
+But for many interesting machine learning
+但是对于许多
+
+45
+00:01:26,520 --> 00:01:27,730
+problems, you would have a
+复杂的机器学习问题
+
+46
+00:01:27,910 --> 00:01:29,230
+lot more features than just two.
+涉及的项往往多于两项
+
+47
+00:01:30,780 --> 00:01:31,760
+We've been talking for a while
+我们之前已经讨论过
+
+48
+00:01:32,320 --> 00:01:34,560
+about housing prediction, and suppose
+房价预测的问题
+
+49
+00:01:35,130 --> 00:01:36,990
+you have a housing classification
+假设现在要处理的是
+
+50
+00:01:38,020 --> 00:01:39,280
+problem rather than a
+关于住房的分类问题
+
+51
+00:01:39,390 --> 00:01:41,170
+regression problem, like maybe
+而不是一个回归问题
+
+52
+00:01:41,580 --> 00:01:43,350
+if you have different features of
+假设你对一栋房子的多方面特点
+
+53
+00:01:43,440 --> 00:01:44,760
+a house, and you want
+都有所了解
+
+54
+00:01:45,010 --> 00:01:46,000
+to predict what are the
+你想预测
+
+55
+00:01:46,050 --> 00:01:47,590
+odds that your house will
+房子在未来半年内
+
+56
+00:01:47,700 --> 00:01:48,710
+be sold within the next
+能被卖出去的概率
+
+57
+00:01:48,910 --> 00:01:51,040
+six months, so that will be a classification problem.
+这是一个分类问题
+
+58
+00:01:52,100 --> 00:01:53,060
+And as we saw we can
+我们可以想出
+
+59
+00:01:53,260 --> 00:01:55,130
+come up with quite a
+很多特征
+
+60
+00:01:55,260 --> 00:01:56,480
+lot of features, maybe a hundred
+对于不同的房子有可能
+
+61
+00:01:56,840 --> 00:01:58,270
+different features of different houses.
+就有上百个特征
+
+62
+00:02:00,130 --> 00:02:01,610
+For a problem like this, if
+对于这类问题
+
+63
+00:02:01,880 --> 00:02:03,260
+you were to include all the
+如果要包含
+
+64
+00:02:03,370 --> 00:02:04,980
+quadratic terms, all of
+所有的二次项
+
+65
+00:02:05,100 --> 00:02:06,260
+these, even all of the
+即使只包含
+
+66
+00:02:06,540 --> 00:02:07,540
+quadratic, that is, the second-order
+二项式或多项式的计算
+
+67
+00:02:07,930 --> 00:02:10,450
+polynomial terms, there would be a lot of them.
+最终的多项式也可能有很多项
+
+68
+00:02:10,560 --> 00:02:11,580
+There would be terms like x1 squared,
+比如x1^2
+
+69
+00:02:12,960 --> 00:02:17,610
+x1x2, x1x3, you know, x1x4
+x1x2 x1x3 x1x4
+
+70
+00:02:18,750 --> 00:02:21,880
+up to x1x100 and then
+直到x1x100
+
+71
+00:02:21,980 --> 00:02:23,620
+you have x2 squared, x2x3
+还有x2^2 x2x3
+
+72
+00:02:25,620 --> 00:02:25,980
+and so on.
+等等很多项
+
+73
+00:02:26,510 --> 00:02:27,770
+And if you include just
+因此
+
+74
+00:02:28,060 --> 00:02:29,200
+the second order terms, that
+即使只考虑二阶项
+
+75
+00:02:29,330 --> 00:02:30,750
+is, the terms that are
+也就是说
+
+76
+00:02:30,840 --> 00:02:32,090
+a product of, you know,
+两个项的乘积
+
+77
+00:02:32,220 --> 00:02:33,390
+two of these terms, x1
+x1乘以x1
+
+78
+00:02:33,510 --> 00:02:35,010
+times x1 and so on, then,
+等等类似于此的项
+
+79
+00:02:35,780 --> 00:02:36,920
+for the case of n equals
+那么 在n=100的情况下
+
+80
+00:02:38,180 --> 00:02:40,280
+100, you end up with about five thousand features.
+最终也有5000个二次项
+
+81
+00:02:41,890 --> 00:02:44,880
+And, asymptotically, the
+而且渐渐地
+
+82
+00:02:45,000 --> 00:02:46,330
+number of quadratic features grows
+随着特征个数n的增加
+
+83
+00:02:46,770 --> 00:02:48,670
+roughly as order n
+二次项的个数大约以n^2的量级增长
+
+84
+00:02:48,820 --> 00:02:50,330
+squared, where n is the
+其中
+
+85
+00:02:50,460 --> 00:02:52,790
+number of the original features,
+n是原始项的个数
+
+86
+00:02:53,370 --> 00:02:54,780
+like x1 through x100 that we had.
+即我们之前说过的x1到x100这些项
+
+87
+00:02:55,700 --> 00:02:58,750
+And it's actually closer to n squared over two.
+事实上二次项的个数大约是(n^2)/2
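
As a rough sanity check of the counts quoted above (a sketch, not lecture code, assuming "quadratic features" means every distinct product x_i * x_j including the squares):

```python
n = 100
# distinct products x_i * x_j with i <= j: n*(n+1)/2, roughly n^2 / 2 terms
num_quadratic = n * (n + 1) // 2
print(num_quadratic)  # 5050, the "about five thousand features" quoted above
```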
+
+88
+00:02:59,920 --> 00:03:01,440
+So including all the
+因此要包含所有的
+
+89
+00:03:01,560 --> 00:03:02,920
+quadratic features doesn't seem
+二次项是很困难的
+
+90
+00:03:03,220 --> 00:03:04,220
+like it's maybe a good
+所以这可能
+
+91
+00:03:04,300 --> 00:03:05,380
+idea, because that is a
+不是一个好的做法
+
+92
+00:03:05,580 --> 00:03:07,050
+lot of features and you
+而且由于项数过多
+
+93
+00:03:07,220 --> 00:03:08,920
+might end up overfitting the training
+最后的结果很有可能是过拟合的
+
+94
+00:03:09,330 --> 00:03:10,500
+set, and it can
+此外
+
+95
+00:03:10,740 --> 00:03:12,800
+also be computationally expensive, you know, to
+在处理这么多项时
+
+96
+00:03:14,080 --> 00:03:15,120
+be working with that many features.
+也存在运算量过大的问题
+
+97
+00:03:16,450 --> 00:03:17,540
+One thing you could do is
+当然 你也可以试试
+
+98
+00:03:17,770 --> 00:03:19,090
+include only a subset of
+只包含上边这些二次项的子集
+
+99
+00:03:19,290 --> 00:03:20,950
+these, so if you include only the
+例如 我们只考虑
+
+100
+00:03:21,050 --> 00:03:22,630
+features x1 squared, x2 squared,
+x1^2 x2^2
+
+101
+00:03:23,590 --> 00:03:25,180
+x3 squared, up to
+x3^2直到
+
+102
+00:03:25,580 --> 00:03:27,750
+maybe x100 squared, then
+x100^2 这些项
+
+103
+00:03:28,100 --> 00:03:29,500
+the number of features is much smaller.
+这样就可以将二次项的数量大幅度减少
+
+104
+00:03:29,980 --> 00:03:31,720
+Here you have only 100 such
+减少到只有100个二次项
+
+105
+00:03:32,070 --> 00:03:33,850
+quadratic features, but this
+但是由于
+
+106
+00:03:34,120 --> 00:03:35,950
+is not enough features and
+忽略了太多相关项
+
+107
+00:03:36,100 --> 00:03:37,170
+certainly won't let you fit
+在处理类似左上角的数据时
+
+108
+00:03:37,290 --> 00:03:39,330
+the data set like that on the upper left.
+不可能得到理想的结果
+
+109
+00:03:39,570 --> 00:03:40,550
+In fact, if you include
+实际上
+
+110
+00:03:41,040 --> 00:03:42,720
+only these quadratic features together
+如果只考虑x1的平方
+
+111
+00:03:43,170 --> 00:03:44,870
+with the original x1, and
+到x100的平方
+
+112
+00:03:45,350 --> 00:03:46,500
+so on, up to x100 features,
+这一百个二次项
+
+113
+00:03:47,460 --> 00:03:48,530
+then you can actually fit very
+那么你可能会
+
+114
+00:03:48,910 --> 00:03:50,210
+interesting hypotheses. So, you
+拟合出一些特别的假设
+
+115
+00:03:50,330 --> 00:03:52,350
+can fit things like, you know, axis-aligned
+比如可能拟合出
+
+116
+00:03:52,490 --> 00:03:53,860
+ellipses like these, but
+一个椭圆状的曲线
+
+117
+00:03:55,080 --> 00:03:56,240
+you certainly cannot fit a more
+但是肯定不能拟合出
+
+118
+00:03:56,340 --> 00:03:57,930
+complex data set like that shown here.
+像左上角这个数据集的分界线
+
+119
+00:03:59,360 --> 00:04:00,530
+So 5000 features seems like
+所以5000个二次项看起来已经很多了
+
+120
+00:04:00,620 --> 00:04:03,090
+a lot, if you were
+而现在假设
+
+121
+00:04:03,230 --> 00:04:04,860
+to include the cubic, or
+包括三次项
+
+122
+00:04:05,140 --> 00:04:06,100
+third order features, terms like
+或者三阶项
+
+123
+00:04:06,440 --> 00:04:08,050
+the x1, x2, x3.
+例如x1 x2 x3
+
+124
+00:04:08,400 --> 00:04:09,800
+You know, x1 squared,
+x1^2
+
+125
+00:04:10,310 --> 00:04:12,240
+x2, x10 and
+x2 x10
+
+126
+00:04:12,900 --> 00:04:15,280
+x11, x17 and so on.
+x11 x17等等
+
+127
+00:04:15,700 --> 00:04:18,110
+You can imagine there are gonna be a lot of these features.
+类似的三次项有很多很多
+
+128
+00:04:19,040 --> 00:04:19,770
+In fact, they are going to be
+事实上
+
+129
+00:04:20,050 --> 00:04:21,260
+order n cubed such features
+三次项的个数是以n^3的量级增加
+
+130
+00:04:22,210 --> 00:04:23,830
+and if n is 100
+当n=100时
+
+131
+00:04:24,150 --> 00:04:25,660
+you can compute that, you
+可以计算出来
+
+132
+00:04:25,740 --> 00:04:26,870
+end up with on the order
+最后能得到
+
+133
+00:04:27,730 --> 00:04:29,650
+of about 170,000 such cubic
+大概170000个三次项
+
+134
+00:04:30,040 --> 00:04:31,670
+features and so including
+所以
+
+135
+00:04:32,260 --> 00:04:34,470
+these higher-order polynomial features when
+当初始特征个数n增大时
+
+136
+00:04:34,920 --> 00:04:36,050
+your original feature set n
+这些高阶多项式项数
+
+137
+00:04:36,230 --> 00:04:37,730
+is large this really dramatically
+将以几何级数递增
+
+138
+00:04:38,530 --> 00:04:40,440
+blows up your feature space and
+特征空间也随之急剧膨胀
+
+139
+00:04:41,070 --> 00:04:42,180
+this doesn't seem like a
+当特征个数n很大时
+
+140
+00:04:42,320 --> 00:04:43,320
+good way to come up with
+如果找出附加项
+
+141
+00:04:43,560 --> 00:04:45,050
+additional features with which
+来建立一些分类器
+
+142
+00:04:45,240 --> 00:04:48,100
+to build nonlinear classifiers when n is large.
+这并不是一个好做法
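
A similar back-of-the-envelope count for the cubic case (again just a sketch, assuming "cubic features" means all distinct degree-3 monomials in the n original features):

```python
from math import comb

n = 100
# distinct monomials of degree 3 in n variables: C(n + 2, 3) = n(n+1)(n+2)/6
num_cubic = comb(n + 2, 3)
print(num_cubic)  # 171700, on the order of the 170,000 mentioned above
```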
+
+143
+00:04:49,590 --> 00:04:52,560
+For many machine learning problems, n will be pretty large.
+对于许多实际的机器学习问题 特征个数n是很大的
+
+144
+00:04:53,270 --> 00:04:53,560
+Here's an example.
+我们看看下边这个例子
+
+145
+00:04:55,000 --> 00:04:58,140
+Let's consider the problem of computer vision.
+关于计算机视觉中的一个问题
+
+146
+00:04:59,670 --> 00:05:00,770
+And suppose you want to
+假设你想要
+
+147
+00:05:01,260 --> 00:05:02,620
+use machine learning to train
+使用机器学习算法
+
+148
+00:05:02,710 --> 00:05:04,610
+a classifier to examine an
+来训练一个分类器
+
+149
+00:05:04,710 --> 00:05:05,880
+image and tell us whether
+使它检测一个图像
+
+150
+00:05:06,160 --> 00:05:08,030
+or not the image is a car.
+来判断图像是否为一辆汽车
+
+151
+00:05:09,480 --> 00:05:11,900
+Many people wonder why computer vision could be difficult.
+很多人可能会好奇 这对计算机视觉来说有什么难的
+
+152
+00:05:12,390 --> 00:05:13,140
+I mean when you and I
+当我们自己看这幅图像时
+
+153
+00:05:13,270 --> 00:05:15,670
+look at this picture it is so obvious what this is.
+里面有什么是一目了然的事情
+
+154
+00:05:15,900 --> 00:05:17,000
+You wonder how is it
+你肯定会很奇怪
+
+155
+00:05:17,190 --> 00:05:18,320
+that a learning algorithm could possibly
+为什么学习算法竟可能会不知道
+
+156
+00:05:18,910 --> 00:05:20,880
+fail to know what this picture is.
+图像是什么
+
+157
+00:05:22,110 --> 00:05:23,330
+To understand why computer vision
+为了解答这个疑问
+
+158
+00:05:23,720 --> 00:05:25,380
+is hard let's zoom
+我们取出这幅图片中的一小部分
+
+159
+00:05:25,650 --> 00:05:26,690
+into a small part of the
+将其放大
+
+160
+00:05:26,940 --> 00:05:28,180
+image like that area where the
+比如图中
+
+161
+00:05:28,510 --> 00:05:30,240
+little red rectangle is.
+这个红色方框内的部分
+
+162
+00:05:30,400 --> 00:05:31,330
+It turns out that where you
+结果表明
+
+163
+00:05:31,450 --> 00:05:34,270
+and I see a car, the computer sees that.
+当人眼看到一辆汽车时 计算机实际上看到的却是这个
+
+164
+00:05:34,780 --> 00:05:35,930
+What it sees is this matrix,
+一个数据矩阵
+
+165
+00:05:36,600 --> 00:05:38,110
+or this grid, of pixel
+或像这种格网
+
+166
+00:05:38,580 --> 00:05:40,350
+intensity values that tells
+它们表示了像素强度值
+
+167
+00:05:40,610 --> 00:05:42,930
+us the brightness of each pixel in the image.
+告诉我们图像中每个像素的亮度值
+
+168
+00:05:43,640 --> 00:05:45,170
+So the computer vision problem is
+因此 对于计算机视觉来说问题就变成了
+
+169
+00:05:45,550 --> 00:05:47,230
+to look at this matrix of
+根据这个像素点亮度矩阵
+
+170
+00:05:47,310 --> 00:05:49,140
+pixel intensity values, and tell
+来告诉我们
+
+171
+00:05:49,410 --> 00:05:52,440
+us that these numbers represent the door handle of a car.
+这些数值代表一个汽车门把手
+
+172
+00:05:54,230 --> 00:05:55,740
+Concretely, when we use
+具体而言
+
+173
+00:05:56,030 --> 00:05:57,220
+machine learning to build a
+当用机器学习算法构造
+
+174
+00:05:57,430 --> 00:05:59,060
+car detector, what we do
+一个汽车识别器时
+
+175
+00:05:59,360 --> 00:06:00,510
+is we come up with a
+我们要想出
+
+176
+00:06:00,800 --> 00:06:02,690
+label training set, with, let's
+一个带标签的样本集
+
+177
+00:06:02,890 --> 00:06:04,250
+say, a few label examples
+其中一些样本
+
+178
+00:06:04,730 --> 00:06:05,850
+of cars and a few
+是各类汽车
+
+179
+00:06:06,000 --> 00:06:07,150
+label examples of things that
+另一部分样本
+
+180
+00:06:07,380 --> 00:06:08,780
+are not cars, then we
+是其他任何东西
+
+181
+00:06:09,090 --> 00:06:10,590
+give our training set to
+将这个样本集输入给学习算法
+
+182
+00:06:10,720 --> 00:06:12,230
+the learning algorithm trained a
+以训练出一个分类器
+
+183
+00:06:12,310 --> 00:06:13,500
+classifier and then, you
+训练完毕后
+
+184
+00:06:13,680 --> 00:06:14,700
+know, we may test it and show
+我们输入一幅新的图片
+
+185
+00:06:14,890 --> 00:06:16,710
+the new image and ask, "What is this new thing?".
+让分类器判定 “这是什么东西?”
+
+186
+00:06:17,980 --> 00:06:20,030
+And hopefully it will recognize that that is a car.
+理想情况下 分类器能识别出这是一辆汽车
+
+187
+00:06:21,410 --> 00:06:24,000
+To understand why we
+为了理解引入
+
+188
+00:06:24,120 --> 00:06:26,810
+need nonlinear hypotheses, let's take
+非线性分类器的必要性
+
+189
+00:06:27,050 --> 00:06:27,940
+a look at some of the
+我们从学习算法的训练样本中
+
+190
+00:06:28,190 --> 00:06:29,360
+images of cars and maybe
+挑出一些汽车图片
+
+191
+00:06:29,480 --> 00:06:31,780
+non-cars that we might feed to our learning algorithm.
+和一些非汽车图片
+
+192
+00:06:32,960 --> 00:06:33,920
+Let's pick a couple of pixel
+让我们从其中
+
+193
+00:06:34,090 --> 00:06:35,630
+locations in our images, so
+每幅图片中挑出一组像素点
+
+194
+00:06:35,750 --> 00:06:37,040
+that's pixel one location and
+这是像素点1的位置
+
+195
+00:06:37,180 --> 00:06:39,500
+pixel two location, and let's
+这是像素点2的位置
+
+196
+00:06:39,730 --> 00:06:42,390
+plot this car, you know, at the
+在坐标系中标出这幅汽车的位置
+
+197
+00:06:42,510 --> 00:06:44,010
+location, at a certain
+在某一点上
+
+198
+00:06:44,360 --> 00:06:45,890
+point, depending on the intensities
+车的位置取决于
+
+199
+00:06:46,430 --> 00:06:47,870
+of pixel one and pixel two.
+像素点1和像素点2的亮度
+
+200
+00:06:49,260 --> 00:06:50,630
+And let's do this with a few other images.
+让我们用同样的方法标出其他图片中汽车的位置
+
+201
+00:06:51,060 --> 00:06:52,450
+So let's take a different example
+然后我们再举一个
+
+202
+00:06:52,980 --> 00:06:53,980
+of the car and you know,
+关于汽车的不同的例子
+
+203
+00:06:54,080 --> 00:06:55,010
+look at the same two pixel locations
+观察这两个相同的像素位置
+
+204
+00:06:56,160 --> 00:06:57,570
+and that image has a
+这幅图片中
+
+205
+00:06:57,770 --> 00:06:58,970
+different intensity for pixel one
+像素1有一个像素强度
+
+206
+00:06:59,230 --> 00:07:00,660
+and a different intensity for pixel two.
+像素2也有一个不同的像素强度
+
+207
+00:07:00,960 --> 00:07:02,940
+So, it ends up at a different location on the figure.
+所以在这幅图中它们两个处于不同的位置
+
+208
+00:07:03,360 --> 00:07:05,740
+And then let's plot some negative examples as well.
+我们继续画上两个非汽车样本
+
+209
+00:07:05,990 --> 00:07:07,590
+That's a non-car, that's a
+这个不是汽车
+
+210
+00:07:07,720 --> 00:07:09,470
+non-car .
+这个也不是汽车
+
+211
+00:07:09,730 --> 00:07:10,910
+And if we do this for
+然后我们继续
+
+212
+00:07:11,070 --> 00:07:12,720
+more and more examples using
+在坐标系中画上更多的新样本
+
+213
+00:07:13,280 --> 00:07:14,680
+the pluses to denote cars
+用“+”表示汽车图片
+
+214
+00:07:15,080 --> 00:07:16,310
+and minuses to denote non-cars,
+用“-”表示非汽车图片
+
+215
+00:07:16,890 --> 00:07:18,500
+what we'll find is that
+我们将发现
+
+216
+00:07:18,830 --> 00:07:20,680
+the cars and non-cars end up
+汽车样本和非汽车样本
+
+217
+00:07:20,890 --> 00:07:22,430
+lying in different regions of
+分布在坐标系中的不同区域
+
+218
+00:07:22,570 --> 00:07:24,910
+the space, and what we
+因此
+
+219
+00:07:25,180 --> 00:07:26,570
+need therefore is some sort
+我们现在需要一个
+
+220
+00:07:26,750 --> 00:07:28,790
+of non-linear hypotheses to try
+非线性分类器
+
+221
+00:07:29,000 --> 00:07:30,900
+to separate out the two classes.
+来尽量分开这两类样本
+
+222
+00:07:32,480 --> 00:07:34,300
+What is the dimension of the feature space?
+这个分类问题中特征空间的维数是多少?
+
+223
+00:07:35,290 --> 00:07:38,210
+Suppose we were to use just 50 by 50 pixel images.
+假设我们用50*50像素的图片
+
+224
+00:07:38,770 --> 00:07:40,050
+Now that's assuming our images were
+我们的图片已经很小了
+
+225
+00:07:40,520 --> 00:07:42,760
+pretty small ones, just 50 pixels on the side.
+长宽只各有50个像素
+
+226
+00:07:43,470 --> 00:07:44,990
+Then we would have 2500 pixels,
+但这依然是2500个像素点
+
+227
+00:07:46,330 --> 00:07:47,650
+and so the dimension of
+因此
+
+228
+00:07:47,740 --> 00:07:49,310
+our feature size will be N
+我们的特征向量的元素数量
+
+229
+00:07:49,520 --> 00:07:51,450
+equals 2500 where our feature
+N=2500
+
+230
+00:07:51,700 --> 00:07:52,910
+vector x is a list
+特征向量X
+
+231
+00:07:53,180 --> 00:07:54,570
+of all the pixel intensities, you
+包含了所有像素点的亮度值
+
+232
+00:07:54,710 --> 00:07:56,690
+know, the pixel brightness of pixel
+这是像素点1的亮度
+
+233
+00:07:57,080 --> 00:07:58,030
+one, the brightness of pixel
+这是像素点2的亮度
+
+234
+00:07:58,330 --> 00:07:59,580
+two, and so on down
+如此类推
+
+235
+00:07:59,870 --> 00:08:01,310
+to the pixel brightness of the
+直到最后一个
+
+236
+00:08:01,400 --> 00:08:03,420
+last pixel where, you know, in a
+像素点的亮度
+
+237
+00:08:03,590 --> 00:08:05,450
+typical computer representation, each of
+对于典型的计算机图片表示方法
+
+238
+00:08:05,540 --> 00:08:07,190
+these may be values between say
+如果存储的是每个像素点的灰度值 (色彩的强烈程度)
+
+239
+00:08:07,480 --> 00:08:09,020
+0 to 255 if it gives
+那么每个元素的值
+
+240
+00:08:09,230 --> 00:08:12,110
+us the grayscale value.
+应该在0到255之间
+
+241
+00:08:12,520 --> 00:08:13,290
+So we have n equals 2500,
+因此 这个问题中n=2500
+
+242
+00:08:13,950 --> 00:08:15,580
+and that's if we
+但是
+
+243
+00:08:15,740 --> 00:08:17,140
+were using grayscale images.
+这只是使用灰度图片的情况
+
+244
+00:08:17,790 --> 00:08:18,800
+If we were using RGB
+如果我们用的是RGB彩色图像
+
+245
+00:08:19,440 --> 00:08:21,140
+images with separate red, green
+每个像素点包含红、绿、蓝三个子像素
+
+246
+00:08:21,420 --> 00:08:23,870
+and blue values, we would have n equals 7500.
+那么n=7500
+
+247
+00:08:27,650 --> 00:08:28,630
+So, if we were to
+因此 如果我们非要
+
+248
+00:08:29,000 --> 00:08:29,920
+try to learn a nonlinear
+通过包含所有的二次项
+
+249
+00:08:30,370 --> 00:08:32,020
+hypothesis by including all
+来解决这个非线性问题
+
+250
+00:08:32,300 --> 00:08:33,710
+the quadratic features, that is
+那么
+
+251
+00:08:33,810 --> 00:08:34,680
+all the terms of the form, you know,
+也就是所有这种形式的项
+
+252
+00:08:35,430 --> 00:08:38,900
+Xi times Xj, while with the
+xi*xj
+
+253
+00:08:39,130 --> 00:08:40,370
+2500 pixels we would end
+连同开始的2500像素
+
+254
+00:08:40,580 --> 00:08:42,500
+up with a total of three million features.
+总共大约有300万个
+
+255
+00:08:43,050 --> 00:08:44,620
+And that's just too large to
+这数字大得有点离谱了
+
+256
+00:08:44,720 --> 00:08:46,430
+be reasonable; the computation would
+对于每个样本来说
+
+257
+00:08:46,600 --> 00:08:48,680
+be very expensive to find and
+要发现并表示
+
+258
+00:08:48,840 --> 00:08:50,070
+to represent all of these
+所有这300万个项
+
+259
+00:08:50,310 --> 00:08:52,250
+three million features per training example.
+这计算成本太高了
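
The dimensions quoted for the image example can be reproduced the same way (a sketch using the lecture's numbers; the exact quadratic count depends on whether the squares x_i^2 are included):

```python
n_gray = 50 * 50                      # 50x50 grayscale patch -> n = 2500
n_rgb = 3 * n_gray                    # separate red, green, blue values -> n = 7500
quadratic = n_gray * (n_gray + 1) // 2
print(n_gray, n_rgb, quadratic)       # 2500 7500 3126250 (~3 million features)
```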
+
+260
+00:08:55,470 --> 00:08:57,580
+So, simple logistic regression together
+因此 只是简单的增加
+
+261
+00:08:58,100 --> 00:08:59,230
+with adding in maybe the
+二次项或者三次项
+
+262
+00:08:59,300 --> 00:09:00,510
+quadratic or the cubic features
+之类的逻辑回归算法
+
+263
+00:09:00,930 --> 00:09:01,910
+- that's just not a
+并不是一个解决
+
+264
+00:09:01,980 --> 00:09:03,950
+good way to learn complex
+复杂非线性问题的好办法
+
+265
+00:09:04,550 --> 00:09:06,090
+nonlinear hypotheses when n
+因为当n很大时
+
+266
+00:09:06,310 --> 00:09:08,410
+is large because you just end up with too many features.
+将会产生非常多的特征项
+
+267
+00:09:09,370 --> 00:09:10,620
+In the next few videos, I
+在接下来的视频课程中
+
+268
+00:09:10,840 --> 00:09:11,890
+would like to tell you about Neural
+我将为大家讲解神经网络
+
+269
+00:09:12,080 --> 00:09:13,670
+Networks, which turns out
+它在解决复杂的非线性分类问题上
+
+270
+00:09:13,980 --> 00:09:15,370
+to be a much better way to
+被证明是
+
+271
+00:09:15,650 --> 00:09:17,720
+learn complex hypotheses, complex nonlinear
+是一种好得多的算法
+
+272
+00:09:17,960 --> 00:09:19,780
+hypotheses even when your
+即使你输入特征空间
+
+273
+00:09:20,070 --> 00:09:22,080
+input feature space, even when n is large.
+或输入的特征维数n很大也能轻松搞定
+
+274
+00:09:22,860 --> 00:09:24,080
+And along the way I'll
+在后面的课程中
+
+275
+00:09:24,420 --> 00:09:25,580
+also get to show
+我将给大家展示
+
+276
+00:09:25,780 --> 00:09:26,730
+you a couple of fun videos
+一些有趣的视频
+
+277
+00:09:27,240 --> 00:09:29,030
+of historically important applications
+视频中讲述了神经网络在历史上的重要应用
+
+278
+00:09:30,300 --> 00:09:31,290
+of Neural networks as well that I
+我也希望
+
+279
+00:09:32,100 --> 00:09:33,480
+hope those videos that
+这些我们即将看到的视频
+
+280
+00:09:33,570 --> 00:09:35,460
+we'll see later will be fun for you to watch as well.
+能给你的学习过程带来一些乐趣 ^ ^
+
diff --git a/srt/8 - 2 - Neurons and the Brain (8 min).srt b/srt/8 - 2 - Neurons and the Brain (8 min).srt
new file mode 100644
index 00000000..c8219ea8
--- /dev/null
+++ b/srt/8 - 2 - Neurons and the Brain (8 min).srt
@@ -0,0 +1,1196 @@
+1
+00:00:00,170 --> 00:00:01,720
+Neural Networks are a pretty old
+神经网络是一种很古老的算法
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,070 --> 00:00:03,950
+algorithm that was originally motivated
+它最初产生的目的是
+
+3
+00:00:05,030 --> 00:00:07,330
+by the goal of having machines that can mimic the brain.
+制造能模拟大脑的机器
+
+4
+00:00:08,260 --> 00:00:09,330
+Now in this class, of course
+在这门课中
+
+5
+00:00:09,620 --> 00:00:11,000
+I'm teaching Neural Networks to you
+我将向你们介绍神经网络
+
+6
+00:00:11,170 --> 00:00:12,170
+because they work really well
+因为它能很好地解决
+
+7
+00:00:12,460 --> 00:00:14,070
+for different machine learning problems and
+不同的机器学习问题
+
+8
+00:00:14,260 --> 00:00:16,910
+not, certainly not because they're logically motivated.
+而不只因为它们在逻辑上行得通
+
+9
+00:00:18,050 --> 00:00:19,260
+In this video, I'd like
+在这段视频中 我想
+
+10
+00:00:19,440 --> 00:00:21,640
+to give you some of the background on Neural Networks.
+告诉你们一些神经网络的背景知识
+
+11
+00:00:22,510 --> 00:00:25,430
+So that we can get a sense of what we can expect them to do.
+由此我们能知道可以用它们来做什么
+
+12
+00:00:26,200 --> 00:00:27,170
+Both in the sense of
+不管是将其应用到
+
+13
+00:00:27,330 --> 00:00:28,320
+applying them to modern day
+现代的机器学习问题上
+
+14
+00:00:28,440 --> 00:00:30,470
+machine learning problems, as well as for
+还是应用到
+
+15
+00:00:30,650 --> 00:00:32,000
+those of you that might
+那些你可能会
+
+16
+00:00:32,190 --> 00:00:33,870
+be interested in maybe the
+感兴趣的问题中 也许
+
+17
+00:00:34,000 --> 00:00:35,130
+big AI dream of someday
+这一伟大的人工智能梦想在未来
+
+18
+00:00:35,710 --> 00:00:36,680
+building truly intelligent machines.
+能制造出真正的智能机器
+
+19
+00:00:37,790 --> 00:00:40,650
+Also, how Neural Networks might pertain to that.
+另外 我们还将讲解神经网络是怎么涉及这些问题的
+
+20
+00:00:42,400 --> 00:00:44,260
+The origins of Neural Networks was
+神经网络产生的原因
+
+21
+00:00:44,900 --> 00:00:46,320
+as algorithms that try
+是人们想尝试设计出
+
+22
+00:00:46,600 --> 00:00:47,880
+to mimic the brain and
+模仿大脑的算法
+
+23
+00:00:47,900 --> 00:00:49,020
+there's a sense that if we
+从某种意义上说如果我们
+
+24
+00:00:49,140 --> 00:00:50,750
+want to build learning systems
+想要建立学习系统
+
+25
+00:00:51,310 --> 00:00:53,110
+well, why not mimic perhaps the
+那为什么不去模仿
+
+26
+00:00:53,180 --> 00:00:54,960
+most amazing learning machine we
+我们所认识的 最神奇的学习机器——
+
+27
+00:00:55,080 --> 00:00:56,070
+know about, which is perhaps the
+人类的大脑呢
+
+28
+00:00:56,840 --> 00:00:58,710
+brain. Neural Networks came to
+神经网络逐渐兴起于
+
+29
+00:00:58,820 --> 00:01:00,720
+be very widely used throughout the
+二十世纪八九十年代
+
+30
+00:01:00,960 --> 00:01:03,260
+1980's and 1990's and
+应用得非常广泛
+
+31
+00:01:03,750 --> 00:01:04,840
+for various reasons its popularity
+但由于各种原因
+
+32
+00:01:05,640 --> 00:01:06,680
+diminished in the late
+在90年代的后期应用减少了
+
+33
+00:01:06,890 --> 00:01:08,410
+90's. But more recently,
+但是最近
+
+34
+00:01:09,170 --> 00:01:10,520
+Neural Networks have had a
+神经网络
+
+35
+00:01:10,770 --> 00:01:12,060
+major recent resurgence.
+又东山再起了
+
+36
+00:01:13,350 --> 00:01:14,530
+One of the reasons for this
+其中一个原因是
+
+37
+00:01:14,770 --> 00:01:16,640
+resurgence is that Neural Networks
+神经网络
+
+38
+00:01:17,540 --> 00:01:19,010
+are computationally somewhat
+是计算量有些
+
+39
+00:01:19,130 --> 00:01:20,590
+more expensive algorithms and
+偏大的算法
+
+40
+00:01:20,960 --> 00:01:22,110
+so, it was only, you know,
+然而
+
+41
+00:01:22,290 --> 00:01:23,830
+maybe somewhat more recently that
+大概由于近些年
+
+42
+00:01:24,440 --> 00:01:26,190
+computers became fast enough
+计算机的运行速度变快
+
+43
+00:01:26,510 --> 00:01:27,540
+to really run large scale
+才足以真正运行起大规模的
+
+44
+00:01:27,900 --> 00:01:29,610
+Neural Networks and because of
+神经网络
+
+45
+00:01:29,690 --> 00:01:30,950
+that as well as a
+正是由于这个原因
+
+46
+00:01:30,980 --> 00:01:32,940
+few other technical reasons which we'll
+和其他一些我们后面会讨论到的
+
+47
+00:01:33,080 --> 00:01:34,840
+talk about later, modern Neural
+技术因素
+
+48
+00:01:35,110 --> 00:01:36,390
+Networks today are the state
+如今的神经网络
+
+49
+00:01:36,620 --> 00:01:38,620
+of the art technique for many applications.
+对于许多应用来说是最先进的技术
+
+50
+00:01:39,820 --> 00:01:41,000
+So, when you think about mimicking
+当你想模拟大脑时
+
+51
+00:01:41,440 --> 00:01:42,600
+the brain, well, the
+是指想制造出与人类大脑
+
+52
+00:01:42,630 --> 00:01:44,860
+human brain can do so many different things, right?
+作用效果相同的机器 对吧?
+
+53
+00:01:45,030 --> 00:01:46,660
+The brain can learn to
+大脑可以学会去
+
+54
+00:01:46,750 --> 00:01:48,170
+see and process images, learn to
+看和处理图像 学会听
+
+55
+00:01:48,400 --> 00:01:50,330
+hear, learn to process our sense of touch.
+学会处理我们的触觉
+
+56
+00:01:50,570 --> 00:01:51,360
+We can, you know, learn to
+我们能学习数学
+
+57
+00:01:51,520 --> 00:01:52,380
+do math, learn to do
+学着做微积分
+
+58
+00:01:52,710 --> 00:01:53,970
+calculus, and the brain
+而且大脑能处理
+
+59
+00:01:54,110 --> 00:01:55,560
+does so many different and amazing things.
+各种不同的令人惊奇的事情
+
+60
+00:01:55,930 --> 00:01:56,730
+It seems like if you want
+似乎如果你想要
+
+61
+00:01:57,000 --> 00:01:58,280
+to mimic the brain it seems
+模仿它
+
+62
+00:01:58,410 --> 00:01:59,630
+like you have to write lots of different
+你得写很多不同的
+
+63
+00:01:59,960 --> 00:02:01,350
+pieces of software to mimic all
+软件来模拟所有
+
+64
+00:02:01,620 --> 00:02:03,540
+of these different fascinating, amazing things
+大脑告诉我们的这些
+
+65
+00:02:03,820 --> 00:02:05,580
+that the brain does. But there
+五花八门的奇妙的事情
+
+66
+00:02:05,820 --> 00:02:07,780
+is this fascinating hypothesis that the
+不过能不能假设
+
+67
+00:02:08,090 --> 00:02:09,100
+way the brain does all of
+大脑做所有这些
+
+68
+00:02:09,170 --> 00:02:10,410
+these different things is not
+不同事情的方法
+
+69
+00:02:10,780 --> 00:02:12,080
+with, like, a thousand different programs,
+不需要用上千个不同的程序去实现
+
+70
+00:02:13,070 --> 00:02:14,810
+but instead, the way the
+相反的 大脑处理的方法
+
+71
+00:02:14,940 --> 00:02:16,020
+brain does it is with
+只需要
+
+72
+00:02:16,440 --> 00:02:18,890
+just a single learning algorithm.
+一个单一的学习算法就可以了?
+
+73
+00:02:19,310 --> 00:02:20,840
+This is just a hypothesis but
+尽管这只是一个假设
+
+74
+00:02:21,080 --> 00:02:22,240
+let me share with you
+不过让我和你分享
+
+75
+00:02:22,660 --> 00:02:24,440
+some of the evidence for this.
+一些这方面的证据
+
+76
+00:02:24,750 --> 00:02:25,840
+This part of the brain, that little
+大脑的这一部分
+
+77
+00:02:26,060 --> 00:02:27,270
+red part of the brain, is
+这一小片红色区域
+
+78
+00:02:27,520 --> 00:02:29,150
+your auditory cortex and
+是你的听觉皮层
+
+79
+00:02:29,240 --> 00:02:30,620
+the way you're understanding my voice
+你现在正在理解我的话
+
+80
+00:02:30,990 --> 00:02:33,340
+now is your ear is
+这靠的是耳朵
+
+81
+00:02:33,500 --> 00:02:34,940
+taking the sound signal and routing
+耳朵接收到声音信号
+
+82
+00:02:35,230 --> 00:02:36,940
+the sound signal to your auditory
+并把声音信号传递给你的
+
+83
+00:02:36,980 --> 00:02:38,180
+cortex and that's what's
+听觉皮层 正因如此
+
+84
+00:02:38,430 --> 00:02:40,100
+allowing you to understand my words.
+你才能明白我的话
+
+85
+00:02:41,330 --> 00:02:42,590
+Neuroscientists have done the
+神经系统科学家做了
+
+86
+00:02:42,750 --> 00:02:44,560
+following fascinating experiments where
+下面这个有趣的实验
+
+87
+00:02:44,790 --> 00:02:46,300
+you cut the wire from
+把耳朵到
+
+88
+00:02:46,510 --> 00:02:47,440
+the ears to the auditory
+听觉皮层的神经切断
+
+89
+00:02:47,630 --> 00:02:49,100
+cortex and you re-wire,
+在这种情况下
+
+90
+00:02:50,140 --> 00:02:52,010
+in this case an animal's brain, so
+将其重新接到一个动物的大脑上
+
+91
+00:02:52,200 --> 00:02:53,310
+that the signal from the eyes
+这样从眼睛到
+
+92
+00:02:53,620 --> 00:02:56,890
+to the optic nerve eventually gets routed to the auditory cortex.
+视神经的信号最终将传到听觉皮层
+
+93
+00:02:58,040 --> 00:02:59,140
+If you do this it turns out,
+如果这样做了 那么结果表明
+
+94
+00:02:59,350 --> 00:03:00,620
+the auditory cortex will learn
+听觉皮层将会
+
+95
+00:03:02,130 --> 00:03:02,400
+to see.
+学会“看”
+
+96
+00:03:02,730 --> 00:03:04,000
+And this is in every single sense
+这里“看”代表了
+
+97
+00:03:04,430 --> 00:03:06,270
+of the word see as we know it.
+我们所知道的每层含义
+
+98
+00:03:06,390 --> 00:03:07,470
+So, if you do this to the animals,
+所以 如果你对动物这样做
+
+99
+00:03:07,770 --> 00:03:09,790
+the animals can perform visual discrimination
+那么动物就可以完成视觉辨别任务
+
+100
+00:03:10,310 --> 00:03:12,260
+task and as they can
+它们可以
+
+101
+00:03:12,460 --> 00:03:13,570
+look at images and make appropriate
+看图像 并根据图像
+
+102
+00:03:14,100 --> 00:03:15,190
+decisions based on the
+做出适当的决定
+
+103
+00:03:15,460 --> 00:03:16,460
+images and they're doing
+它们正是通过
+
+104
+00:03:16,780 --> 00:03:18,300
+it with that piece of brain tissue.
+脑组织中的这个部分完成的
+
+105
+00:03:19,590 --> 00:03:20,150
+Here's another example.
+下面再举另一个例子
+
+106
+00:03:21,270 --> 00:03:23,430
+That red piece of brain tissue is your somatosensory cortex.
+这块红色的脑组织是你的躯体感觉皮层
+
+107
+00:03:24,070 --> 00:03:26,680
+That's how you process your sense of touch.
+这是你用来处理触觉的
+
+108
+00:03:27,440 --> 00:03:29,020
+If you do a similar re-wiring process
+如果你做一个和刚才类似的重接实验
+
+109
+00:03:30,070 --> 00:03:32,740
+then the somatosensory cortex will learn to see.
+那么躯体感觉皮层也能学会”看“
+
+110
+00:03:33,370 --> 00:03:34,710
+Because of this and other
+这个实验和其它一些
+
+111
+00:03:35,150 --> 00:03:36,670
+similar experiments, these are
+类似的实验
+
+112
+00:03:36,760 --> 00:03:38,200
+called neuro-rewiring experiments.
+被称为神经重接实验
+
+113
+00:03:39,470 --> 00:03:40,550
+There's this sense that if
+从这个意义上说 如果
+
+114
+00:03:40,670 --> 00:03:41,670
+the same piece of physical
+人体有同一块
+
+115
+00:03:42,180 --> 00:03:44,020
+brain tissue can process sight
+脑组织可以处理光、
+
+116
+00:03:44,500 --> 00:03:45,970
+or sound or touch then
+声或触觉信号
+
+117
+00:03:46,190 --> 00:03:47,480
+maybe there is one learning
+那么也许存在一种学习算法
+
+118
+00:03:47,780 --> 00:03:48,870
+algorithm that can process
+可以同时处理
+
+119
+00:03:49,280 --> 00:03:50,520
+sight or sound or touch.
+视觉、听觉和触觉
+
+120
+00:03:51,450 --> 00:03:52,660
+And instead of needing to
+而不是需要
+
+121
+00:03:52,840 --> 00:03:54,530
+implement a thousand different programs
+运行上千个不同的程序
+
+122
+00:03:55,120 --> 00:03:56,520
+or a thousand different algorithms to
+或者上千个不同的算法来做这些
+
+123
+00:03:56,660 --> 00:03:58,430
+do, you know, the thousand wonderful things
+大脑所完成的
+
+124
+00:03:58,780 --> 00:04:00,510
+that the brain does, maybe what
+成千上万的美好事情
+
+125
+00:04:00,670 --> 00:04:01,980
+we need to do is figure out
+也许我们需要做的就是找出
+
+126
+00:04:02,220 --> 00:04:04,900
+some approximation or to whatever
+一些近似的或
+
+127
+00:04:05,160 --> 00:04:07,220
+the brain's learning algorithm is and
+实际的大脑学习算法
+
+128
+00:04:07,340 --> 00:04:08,470
+implement that and that the
+然后实现它
+
+129
+00:04:08,690 --> 00:04:10,130
+brain learned by itself how to
+大脑通过自学掌握如何
+
+130
+00:04:10,240 --> 00:04:11,860
+process these different types of data.
+处理这些不同类型的数据
+
+131
+00:04:13,000 --> 00:04:14,180
+To a surprisingly large extent,
+在很大的程度上
+
+132
+00:04:14,640 --> 00:04:15,730
+it seems as if we can
+可以猜想如果我们
+
+133
+00:04:16,270 --> 00:04:17,440
+plug in almost any sensor
+把几乎任何一种传感器
+
+134
+00:04:17,890 --> 00:04:19,020
+to almost any part of
+接入到大脑的
+
+135
+00:04:19,080 --> 00:04:21,030
+the brain and so, within
+几乎任何一个部位的话
+
+136
+00:04:21,100 --> 00:04:22,990
+reason, the brain will learn to deal with it.
+大脑就会学会处理它
+
+137
+00:04:25,730 --> 00:04:26,470
+Here are a few more examples.
+下面再举几个例子
+
+138
+00:04:26,660 --> 00:04:28,680
+On the upper left is
+左上角的这张图是
+
+139
+00:04:29,010 --> 00:04:31,220
+an example of learning to see with your tongue.
+用舌头学会“看”的一个例子
+
+140
+00:04:32,100 --> 00:04:33,630
+The way it works is--this is
+它的原理是 这实际上是
+
+141
+00:04:33,830 --> 00:04:35,700
+actually a system called BrainPort undergoing
+一个名为BrainPort的系统
+
+142
+00:04:36,500 --> 00:04:37,700
+FDA trials now to help
+它现在正在FDA (美国食品和药物管理局) 的临床试验阶段
+
+143
+00:04:37,870 --> 00:04:39,380
+blind people see--but the
+它能帮助失明人士看见事物
+
+144
+00:04:39,470 --> 00:04:41,300
+way it works is, you strap
+它的原理是
+
+145
+00:04:42,080 --> 00:04:43,360
+a grayscale camera to your
+你在前额上带一个灰度摄像头
+
+146
+00:04:43,580 --> 00:04:45,320
+forehead, facing forward, that takes
+面朝前 它就能
+
+147
+00:04:45,620 --> 00:04:47,680
+the low resolution grayscale image
+获取你面前事物的
+
+148
+00:04:48,120 --> 00:04:49,230
+of what's in front of you
+低分辨率的灰度图像
+
+149
+00:04:49,530 --> 00:04:50,630
+and you then run a wire
+你连一根线
+
+150
+00:04:51,750 --> 00:04:52,710
+to an array of electrodes
+到舌头上安装的
+
+151
+00:04:53,420 --> 00:04:54,540
+that you place on your tongue
+电极阵列上
+
+152
+00:04:55,090 --> 00:04:56,370
+so that each pixel gets mapped
+那么每个像素都被映射到
+
+153
+00:04:56,730 --> 00:04:58,720
+to a location on your
+你舌头的
+
+154
+00:04:58,830 --> 00:05:00,300
+tongue where maybe a
+某个位置上
+
+155
+00:05:00,430 --> 00:05:01,850
+high voltage corresponds to a
+可能电压值高的点对应一个
+
+156
+00:05:02,020 --> 00:05:03,620
+dark pixel and a low voltage
+暗像素 电压值低的点
+
+157
+00:05:04,160 --> 00:05:05,780
+corresponds to a bright
+对应于亮像素
+
+158
+00:05:06,140 --> 00:05:08,320
+pixel and, even as
+即使依靠
+
+159
+00:05:08,480 --> 00:05:09,580
+it does today, with this sort
+它现在的功能
+
+160
+00:05:09,880 --> 00:05:10,840
+of system you and I will
+使用这种系统就能让你我
+
+161
+00:05:10,900 --> 00:05:12,240
+be able to learn to see, you know,
+在几十分钟里就学会
+
+162
+00:05:12,490 --> 00:05:15,120
+in tens of minutes with our tongues.
+用我们的舌头“看”东西
+
+163
+00:05:15,270 --> 00:05:16,790
+Here's a second example of human
+这是第二个例子
+
+164
+00:05:17,210 --> 00:05:18,600
+echo location or human sonar.
+关于人体回声定位或者说人体声纳
+
+165
+00:05:19,790 --> 00:05:20,990
+So there are two ways you can do this.
+你有两种方法可以实现
+
+166
+00:05:21,330 --> 00:05:22,810
+You can either snap your fingers,
+你可以弹响指
+
+167
+00:05:24,490 --> 00:05:27,600
+or click your tongue.
+或者咂舌头
+
+168
+00:05:28,120 --> 00:05:28,570
+I can't do that very well.
+这个我做不好
+
+169
+00:05:29,430 --> 00:05:30,480
+But there are blind people
+不过现在有失明人士
+
+170
+00:05:30,760 --> 00:05:31,970
+today that are actually being
+确实在学校里
+
+171
+00:05:32,140 --> 00:05:33,420
+trained in schools to do this
+接受这样的培训
+
+172
+00:05:33,910 --> 00:05:35,600
+and learn to interpret the pattern
+并学会解读
+
+173
+00:05:36,040 --> 00:05:38,380
+of sounds bouncing off your environment - that's sonar.
+从环境反弹回来的声波模式—这就是声纳
+
+174
+00:05:39,190 --> 00:05:39,860
+So, if after you search
+如果你搜索
+
+175
+00:05:39,940 --> 00:05:42,310
+on YouTube, there are
+YouTube之后 就会发现
+
+176
+00:05:42,420 --> 00:05:44,040
+actually videos of this amazing kid who
+有些视频讲述了一个令人称奇的孩子
+
+177
+00:05:44,320 --> 00:05:45,770
+tragically because of cancer
+他因为癌症眼球惨遭移除
+
+178
+00:05:46,410 --> 00:05:49,020
+had his eyeballs removed, so this is a kid with no eyeballs.
+虽然失去了眼球
+
+179
+00:05:49,890 --> 00:05:51,740
+But by snapping his fingers, he
+但是通过打响指
+
+180
+00:05:51,890 --> 00:05:53,660
+can walk around and never hit anything.
+他可以四处走动而不撞到任何东西
+
+181
+00:05:54,440 --> 00:05:55,390
+He can ride a skateboard.
+他能滑滑板
+
+182
+00:05:56,320 --> 00:05:57,480
+He can shoot a basketball into a
+他可以将篮球投入篮框中
+
+183
+00:05:57,550 --> 00:05:59,360
+hoop and this is a kid with no eyeballs.
+注意这是一个没有眼球的孩子
+
+184
+00:06:00,510 --> 00:06:02,120
+Third example is the
+第三个例子是
+
+185
+00:06:02,370 --> 00:06:05,000
+Haptic Belt where if
+触觉皮带
+
+186
+00:06:05,240 --> 00:06:07,250
+you have a strap
+如果你把它
+
+187
+00:06:07,510 --> 00:06:08,900
+around your waist, rigged up with
+戴在腰上 蜂鸣器会响
+
+188
+00:06:09,060 --> 00:06:11,710
+buzzers and always have the northmost one buzzing.
+而且总是朝向北时发出嗡嗡声
+
+189
+00:06:12,090 --> 00:06:13,450
+You can give a human a
+它可以使人拥有
+
+190
+00:06:13,560 --> 00:06:14,780
+direction sense similar to
+方向感 用类似于
+
+191
+00:06:15,240 --> 00:06:18,760
+maybe how birds can, you know, sense where north is.
+鸟类感知方向的方式
+
+192
+00:06:19,570 --> 00:06:21,530
+And, a somewhat bizarre example:
+还有一些离奇的例子
+
+193
+00:06:21,680 --> 00:06:22,820
+if you plug a third eye
+如果你在青蛙身上
+
+194
+00:06:23,110 --> 00:06:24,080
+into a frog, the frog
+插入第三只眼
+
+195
+00:06:24,460 --> 00:06:25,830
+will learn to use that eye as well.
+青蛙也能学会使用那只眼睛
+
+196
+00:06:27,410 --> 00:06:29,220
+So, it's pretty amazing to
+因此 这将会非常令人惊奇
+
+197
+00:06:29,440 --> 00:06:31,300
+what extent it is as if
+如果你能
+
+198
+00:06:31,360 --> 00:06:32,640
+you can plug in almost any sensor
+把几乎任何传感器
+
+199
+00:06:32,830 --> 00:06:34,150
+to the brain and the brain's
+接入到大脑中
+
+200
+00:06:34,570 --> 00:06:35,940
+learning algorithm will just figure
+大脑的学习算法就能
+
+201
+00:06:36,170 --> 00:06:37,520
+out how to learn from that
+找出学习数据的方法
+
+202
+00:06:37,710 --> 00:06:39,170
+data and deal with that data.
+并处理这些数据
+
+203
+00:06:40,290 --> 00:06:41,280
+And there's a sense that
+从某种意义上来说
+
+204
+00:06:41,560 --> 00:06:42,840
+if we can figure out what
+如果我们能找出
+
+205
+00:06:43,060 --> 00:06:45,360
+the brain's learning algorithm is, and,
+大脑的学习算法
+
+206
+00:06:45,510 --> 00:06:46,780
+you know, implement it or implement some approximation
+然后在计算机上执行
+
+207
+00:06:47,550 --> 00:06:49,400
+to that algorithm on a computer, maybe
+大脑学习算法或与之相似的算法 也许
+
+208
+00:06:49,700 --> 00:06:50,760
+that would be our best shot
+这将是我们
+
+209
+00:06:51,190 --> 00:06:52,060
+at, you know, making real progress
+向人工智能迈进
+
+210
+00:06:52,680 --> 00:06:54,320
+towards the AI, the
+做出的最好的尝试
+
+211
+00:06:54,380 --> 00:06:55,920
+artificial intelligence dream of
+人工智能的梦想就是
+
+212
+00:06:55,990 --> 00:06:58,060
+someday building truly intelligent machines.
+有一天能制造出真正的智能机器
+
+213
+00:06:59,370 --> 00:07:00,410
+Now, of course, I'm not
+当然我不是
+
+214
+00:07:00,830 --> 00:07:02,310
+teaching Neural Networks, you know,
+教神经网络的
+
+215
+00:07:02,410 --> 00:07:03,590
+just because they might give us
+介绍它只因为它可能为我们
+
+216
+00:07:03,710 --> 00:07:04,740
+a window into this far-off
+打开一扇进入遥远的
+
+217
+00:07:05,200 --> 00:07:06,180
+AI dream, even though I'm
+人工智能梦的窗户
+
+218
+00:07:06,290 --> 00:07:07,500
+personally, that's one of the things
+对于我个人来说
+
+219
+00:07:07,760 --> 00:07:09,890
+that I personally work on in my research life.
+它也是我研究生涯中致力于的一个项目
+
+220
+00:07:10,650 --> 00:07:11,680
+But the main reason I'm
+但我在这节课中
+
+221
+00:07:11,840 --> 00:07:12,890
+teaching Neural Networks in this
+讲授神经网络的原因
+
+222
+00:07:13,140 --> 00:07:14,520
+class is because it's actually a
+主要是对于
+
+223
+00:07:14,670 --> 00:07:15,830
+very effective state of the
+现代机器学习应用
+
+224
+00:07:16,050 --> 00:07:18,340
+art technique for modern day machine learning applications.
+它是最有效的技术方法
+
+225
+00:07:18,990 --> 00:07:20,340
+So, in the next
+因此在接下来的
+
+226
+00:07:20,630 --> 00:07:22,160
+few videos, we'll start diving into
+一些课程中 我们将开始深入到
+
+227
+00:07:22,460 --> 00:07:23,830
+the technical details of Neural
+神经网络的技术细节
+
+228
+00:07:24,130 --> 00:07:25,280
+Networks so that you
+那么你就可以
+
+229
+00:07:25,460 --> 00:07:26,460
+can apply them to modern-day
+将它们应用到现代
+
+230
+00:07:26,490 --> 00:07:28,570
+machine learning applications and get
+机器学习的应用中
+
+231
+00:07:28,730 --> 00:07:30,860
+them to work well on problems.
+并利用它们很好地解决问题
+
+232
+00:07:31,160 --> 00:07:32,180
+But for me, you know, one
+但对我来说
+
+233
+00:07:32,430 --> 00:07:33,720
+of the reasons that excites me is
+使我兴奋的原因之一
+
+234
+00:07:33,850 --> 00:07:35,450
+that maybe they give
+就是它或许能
+
+235
+00:07:35,550 --> 00:07:37,000
+us this window into
+给我们一些启示
+
+236
+00:07:37,550 --> 00:07:38,660
+what we might do if
+让我们知道
+
+237
+00:07:38,890 --> 00:07:41,700
+we're also thinking of
+当我们在思考
+
+238
+00:07:41,920 --> 00:07:43,600
+what algorithms might someday be
+未来有什么样的算法
+
+239
+00:07:43,730 --> 00:07:46,000
+able to learn in a manner similar to humankind.
+能以与人类相似的方式学习时 我们能做些什么
+
diff --git a/srt/8 - 3 - Model Representation I (12 min).srt b/srt/8 - 3 - Model Representation I (12 min).srt
new file mode 100644
index 00000000..fe7f6c64
--- /dev/null
+++ b/srt/8 - 3 - Model Representation I (12 min).srt
@@ -0,0 +1,1611 @@
+1
+00:00:00,780 --> 00:00:01,870
+In this video, I want
+在这个视频中 我想
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:02,070 --> 00:00:03,210
+to start telling you about how
+开始向你介绍
+
+3
+00:00:03,470 --> 00:00:04,970
+we represent Neural Networks,
+我们该如何表示神经网络
+
+4
+00:00:05,520 --> 00:00:06,690
+in other words how we represent
+换句话说 当我们在
+
+5
+00:00:07,050 --> 00:00:08,130
+our hypotheses or how
+运用神经网络时
+
+6
+00:00:08,350 --> 00:00:11,270
+we represent our model when using your Neural Networks.
+我们该如何表示我们的假设或模型
+
+7
+00:00:12,050 --> 00:00:13,750
+Neural Networks were developed as a way of
+神经网络是在模仿
+
+8
+00:00:14,320 --> 00:00:17,650
+simulating neurons or networks of neurons in the brain.
+大脑中的神经元或者神经网络时发明的
+
+9
+00:00:18,540 --> 00:00:19,830
+So, to explain the hypotheses
+因此 要解释如何表示
+
+10
+00:00:20,400 --> 00:00:22,330
+representation. Let's start by
+模型假设 我们先来看单个
+
+11
+00:00:22,580 --> 00:00:23,590
+looking at what a single
+神经元在大脑中
+
+12
+00:00:24,050 --> 00:00:25,250
+neuron in the brain looks like.
+是什么样的
+
+13
+00:00:26,390 --> 00:00:27,630
+Your brain and mine is jam-packed
+我们的大脑中充满了
+
+14
+00:00:28,160 --> 00:00:29,610
+full of neurons like these
+这样的神经元
+
+15
+00:00:30,170 --> 00:00:31,300
+and neurons are cells in
+神经元是大脑中的细胞
+
+16
+00:00:31,380 --> 00:00:32,740
+the brain and the two
+其中有两点
+
+17
+00:00:33,000 --> 00:00:34,740
+things to draw attention to are
+值得我们注意
+
+18
+00:00:34,970 --> 00:00:36,590
+that first that the
+一是神经元有
+
+19
+00:00:36,780 --> 00:00:37,820
+neuron has a cell body
+像这样的细胞主体
+
+20
+00:00:38,360 --> 00:00:40,320
+like so and moreover, the
+二是神经元有
+
+21
+00:00:40,500 --> 00:00:41,480
+neuron has a number of
+一定数量的
+
+22
+00:00:41,680 --> 00:00:43,060
+input wires and these are
+输入神经
+
+23
+00:00:43,260 --> 00:00:44,360
+called the dendrites. Think of
+这些输入神经叫做树突
+
+24
+00:00:44,670 --> 00:00:47,370
+them as input wires and
+可以把它们想象成输入电线
+
+25
+00:00:48,180 --> 00:00:49,500
+these receive inputs from other
+它们接收来自其他
+
+26
+00:00:49,660 --> 00:00:51,330
+locations and the neuron
+神经元的信息
+
+27
+00:00:51,600 --> 00:00:54,270
+also has an output wire called the axon.
+神经元的输出神经叫做轴突
+
+28
+00:00:55,140 --> 00:00:56,710
+And this output wire
+这些输出神经
+
+29
+00:00:57,290 --> 00:00:58,910
+is what it uses to send
+是用来
+
+30
+00:00:59,140 --> 00:01:00,690
+signal to other neurons
+给其他神经元传递信号
+
+31
+00:01:01,290 --> 00:01:04,130
+or to send messages to other neurons.
+或者传送信息的
+
+32
+00:01:05,280 --> 00:01:07,220
+So, at a simplistic level, what
+简而言之
+
+33
+00:01:07,410 --> 00:01:08,740
+a neuron is is a computational
+神经元是一个计算单元
+
+34
+00:01:09,430 --> 00:01:10,470
+unit that gets a number
+它从输入神经接受一定数目的信息
+
+35
+00:01:10,650 --> 00:01:13,220
+of inputs through its input wires, does some computation.
+并做一些计算
+
+36
+00:01:14,430 --> 00:01:15,700
+and then it sends outputs, via its
+然后将结果通过它的
+
+37
+00:01:15,830 --> 00:01:17,640
+axon to other nodes
+轴突传送到其他节点
+
+38
+00:01:18,150 --> 00:01:19,540
+or other neurons in the brain.
+或者大脑中的其他神经元
+
+39
+00:01:20,460 --> 00:01:23,370
+Here's an illustration of a group of neurons.
+下面是一组神经元的示意图
+
+40
+00:01:24,260 --> 00:01:25,350
+The way that neurons communicate
+神经元利用微弱的电流
+
+41
+00:01:26,120 --> 00:01:28,410
+with each other is with little pulses of electricities.
+进行沟通
+
+42
+00:01:29,230 --> 00:01:31,820
+They're also called spikes, but that just means little pulses of electricity.
+这些弱电流也称作动作电位 其实就是一些微弱的电流
+
+43
+00:01:33,140 --> 00:01:35,000
+So, here's one neuron and what
+所以如果
+
+44
+00:01:35,680 --> 00:01:37,060
+it does is if it
+神经元想要
+
+45
+00:01:37,190 --> 00:01:38,260
+wants to send a message,
+传递一个消息
+
+46
+00:01:38,500 --> 00:01:39,280
+what it does is it sends
+它就会就通过它的轴突
+
+47
+00:01:39,710 --> 00:01:41,190
+the little pulse of electricity via its
+发送一段微弱电流
+
+48
+00:01:41,820 --> 00:01:44,110
+axon to some difference
+给其他神经元
+
+49
+00:01:44,970 --> 00:01:46,610
+neuron and here this axon.
+这就是轴突
+
+50
+00:01:47,250 --> 00:01:48,310
+There's this open wire that
+这里是一条
+
+51
+00:01:49,190 --> 00:01:50,840
+connects to the input wire or
+连接到输入神经
+
+52
+00:01:51,030 --> 00:01:52,270
+connects to the dendrite of this
+或者连接另一个神经元
+
+53
+00:01:52,550 --> 00:01:54,300
+second neuron over here, which
+树突的神经
+
+54
+00:01:54,560 --> 00:01:55,860
+then accepts this incoming message
+接下来这个神经元接收这条消息
+
+55
+00:01:56,830 --> 00:01:58,510
+does some computation and may
+做一些计算
+
+56
+00:01:58,720 --> 00:01:59,710
+in turn decide to send
+它有可能会反过来将
+
+57
+00:02:00,030 --> 00:02:01,450
+out its own messages on its
+在轴突上的
+
+58
+00:02:02,020 --> 00:02:04,090
+axon to other neurons.
+自己的消息传给其他神经元
+
+59
+00:02:04,400 --> 00:02:05,740
+And this is the process by
+这就是所有
+
+60
+00:02:05,940 --> 00:02:07,570
+which all human thought
+人类思考的模型:
+
+61
+00:02:08,060 --> 00:02:09,540
+happens as these neurons doing
+我们的神经元把
+
+62
+00:02:09,730 --> 00:02:11,150
+computations and passing messages
+自己的收到的消息进行计算
+
+63
+00:02:11,630 --> 00:02:13,120
+to other neurons as a
+并向其他神经元
+
+64
+00:02:13,380 --> 00:02:15,560
+result of what other inputs they've got.
+传递消息
+
+65
+00:02:16,530 --> 00:02:17,560
+And by the way, this is how
+顺便说一下 这也是
+
+66
+00:02:18,340 --> 00:02:21,030
+our senses and our muscles work as well.
+我们的感觉和肌肉运转的原理
+
+67
+00:02:21,680 --> 00:02:23,340
+If you want to move one
+如果你想活动一块肌肉
+
+68
+00:02:23,500 --> 00:02:24,460
+of your muscles, the way that
+就会触发一个神经元
+
+69
+00:02:24,760 --> 00:02:26,110
+works is that a neuron may
+给你的肌肉
+
+70
+00:02:26,240 --> 00:02:27,370
+send these pulses of electricities
+发送脉冲
+
+71
+00:02:28,470 --> 00:02:29,590
+to your muscle and that causes
+并引起
+
+72
+00:02:30,160 --> 00:02:32,440
+your muscles to contract and your
+你的肌肉收缩
+
+73
+00:02:32,710 --> 00:02:34,030
+eyes - if some
+如果一些感官
+
+74
+00:02:34,330 --> 00:02:35,510
+sensor like your eye
+比如说眼睛
+
+75
+00:02:35,650 --> 00:02:36,710
+wants to send a message to
+想要给大脑传递
+
+76
+00:02:36,950 --> 00:02:37,810
+your brain, what it does
+一个消息
+
+77
+00:02:38,360 --> 00:02:39,900
+is it sends its pulses of
+那么它就像这样发送
+
+78
+00:02:40,670 --> 00:02:42,670
+electricity to a neuron in your brain like so.
+电脉冲给大脑的
+
+79
+00:02:43,460 --> 00:02:45,490
+In a neural network, or
+在一个神经网络里
+
+80
+00:02:46,040 --> 00:02:47,700
+rather in an artificial neural
+或者说在我们在电脑上
+
+81
+00:02:48,040 --> 00:02:49,250
+network that we implement in
+实现的人工神经网络里
+
+82
+00:02:49,290 --> 00:02:50,980
+a computer, we're going to
+我们将使用
+
+83
+00:02:51,200 --> 00:02:52,560
+use a very simple model
+一个非常简单的模型
+
+84
+00:02:53,160 --> 00:02:54,380
+of what a neuron does.
+来模拟神经元的工作
+
+85
+00:02:54,510 --> 00:02:57,720
+We're going to model a neuron as just a logistic unit.
+我们将神经元模拟成一个逻辑单元
+
+86
+00:02:58,590 --> 00:02:59,480
+So, when I draw a yellow
+当我画一个这样的
+
+87
+00:02:59,770 --> 00:03:01,130
+circle like that, you should think of
+黄色圆圈时 你应该
+
+88
+00:03:01,240 --> 00:03:03,130
+that as playing a
+把它想象成
+
+89
+00:03:03,280 --> 00:03:04,710
+role analogous to maybe the
+作用类似于
+
+90
+00:03:04,870 --> 00:03:06,480
+body of a neuron, and
+神经元的东西
+
+91
+00:03:07,210 --> 00:03:08,840
+we then feed the neuron a
+然后我们通过
+
+92
+00:03:09,670 --> 00:03:11,670
+few inputs via its dendrites or
+它的树突或者说它的输入神经
+
+93
+00:03:11,910 --> 00:03:16,150
+its input wires and the neuron does some computation
+传递给它一些信息 然后神经元做一些计算
+
+94
+00:03:17,390 --> 00:03:19,050
+and output some value on
+并通过它的输出神经
+
+95
+00:03:19,200 --> 00:03:21,260
+this output wire or in
+即它的轴突
+
+96
+00:03:21,820 --> 00:03:23,400
+a biological neuron, that's
+输出计算结果
+
+97
+00:03:23,530 --> 00:03:25,160
+the axon and whenever I
+当我画一个
+
+98
+00:03:25,310 --> 00:03:26,660
+draw a diagram like this, what
+像这样的图表时
+
+99
+00:03:26,830 --> 00:03:28,020
+this means is that this represents
+就表示对h(x)的计算
+
+100
+00:03:28,550 --> 00:03:30,040
+a computation of, you know, h of x equals 1
+h(x)等于1除以
+
+101
+00:03:32,780 --> 00:03:34,290
+over 1 + e to
+1加e的
+
+102
+00:03:35,290 --> 00:03:37,590
+the negative theta transpose x where, as
+负θ转置乘以 x
+
+103
+00:03:37,930 --> 00:03:39,330
+usual, x and theta
+通常 x和θ
+
+104
+00:03:39,650 --> 00:03:42,610
+are our parameter vectors like so.
+是我们的参数向量
+
+105
+00:03:42,920 --> 00:03:44,410
+So, this is a very simple maybe
+这是一个简单的模型
+
+106
+00:03:44,780 --> 00:03:46,490
+a vastly over simplified model of
+甚至说是一个过于简单的
+
+107
+00:03:46,670 --> 00:03:48,050
+the computation that the neuron
+模拟神经元的模型
+
+108
+00:03:48,320 --> 00:03:49,200
+does, where it gets a
+它被输入 x1 x2和 x3
+
+109
+00:03:49,260 --> 00:03:50,790
+number of inputs, x1, x2,
+然后输出一些
+
+110
+00:03:51,650 --> 00:03:54,150
+x3 and it outputs some value computed like so.
+类似这样的结果
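
As a minimal sketch of the logistic unit just described (illustrative code with made-up numbers, not from the lecture): the unit takes inputs x1..x3 plus the bias x0 = 1 and outputs h(x) = 1 / (1 + e^(-theta' x)).

```python
import numpy as np

def sigmoid(z):
    # the logistic / sigmoid activation function g(z) = 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, theta):
    # prepend the bias unit x0 = 1, then compute g(theta' * x)
    x = np.concatenate(([1.0], x))
    return sigmoid(theta @ x)

# made-up inputs x1..x3 and parameters theta0..theta3
print(logistic_unit(np.array([2.0, 0.5, -1.0]),
                    np.array([-1.0, 0.8, 0.3, 1.2])))
```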
+
+111
+00:03:59,960 --> 00:04:01,250
+When I draw a neural network,
+当我绘制一个神经网络时
+
+112
+00:04:01,900 --> 00:04:03,430
+usually I draw only the
+通常我只绘制
+
+113
+00:04:03,720 --> 00:04:04,770
+input nodes x1, x2, x3,
+输入节点 x1 x2 x3
+
+114
+00:04:06,330 --> 00:04:07,740
+sometimes when it's useful to do so.
+但有时也可以这样做:
+
+115
+00:04:08,170 --> 00:04:09,780
+I draw an extra node for x zero.
+我增加一个额外的节点 x0
+
+116
+00:04:11,050 --> 00:04:12,200
+This x zero node is
+这个 x0 节点
+
+117
+00:04:12,370 --> 00:04:13,960
+sometimes called the bias unit
+有时也被称作偏置单位
+
+118
+00:04:14,960 --> 00:04:17,970
+or the bias neuron but because
+或偏置神经元 但因为
+
+119
+00:04:18,500 --> 00:04:21,350
+x0 is already equal to 1.
+x0 总是等于1
+
+120
+00:04:21,530 --> 00:04:22,320
+Sometimes, I draw with it, sometimes
+所以有时候 我会画出它
+
+121
+00:04:22,820 --> 00:04:24,280
+I won't just depending on whether
+有时我不会画出
+
+122
+00:04:24,800 --> 00:04:27,560
+it's more notationally convenient for that example.
+这取决于它是否对例子有利
+
+123
+00:04:28,080 --> 00:04:32,810
+Finally, one last
+现在来讨论
+
+124
+00:04:33,270 --> 00:04:34,800
+bit of terminology when we
+最后一个关于
+
+125
+00:04:34,900 --> 00:04:36,690
+talk about neural networks, sometimes
+神经网络的术语
+
+126
+00:04:36,810 --> 00:04:38,500
+we'll say that this
+有时我们会说
+
+127
+00:04:38,790 --> 00:04:40,330
+is a neuron - an
+这是一个神经元
+
+128
+00:04:40,440 --> 00:04:42,720
+artificial neuron with a sigmoid or a logistic
+一个有s型函数或者逻辑函数作为激励函数的
+
+129
+00:04:43,090 --> 00:04:44,250
+activation function.
+人工神经元
+
+130
+00:04:44,760 --> 00:04:48,030
+So this activation function, in the neural network
+在神经网络术语中
+
+131
+00:04:48,140 --> 00:04:49,200
+terminology, this is just
+激励函数只是对类似非线性
+
+132
+00:04:49,540 --> 00:04:51,210
+another term for that
+函数g(z)的另一个术语称呼
+
+133
+00:04:51,560 --> 00:04:53,190
+function for that non-linearity g
+g(z)等于
+
+134
+00:04:53,430 --> 00:04:55,170
+of z, equals 1
+1除以1
+
+135
+00:04:55,260 --> 00:04:56,020
+over 1 plus e to the negative z.
+加e的-z次方
+
+136
+00:04:56,660 --> 00:04:58,410
+And whereas so far
+到目前为止
+
+137
+00:04:58,930 --> 00:05:00,090
+I've been calling theta the parameters
+我一直称θ为
+
+138
+00:05:00,600 --> 00:05:02,500
+of the model, and I'm mostly going to continue
+模型的参数
+
+139
+00:05:02,940 --> 00:05:04,790
+to use that terminology to refer
+以后大概会继续将这个术语与
+
+140
+00:05:05,480 --> 00:05:06,480
+to the parameters.
+“参数”相对应 而不是与神经网络
+
+141
+00:05:07,680 --> 00:05:08,960
+In the neural networks literature,
+在关于神经网络的文献里
+
+142
+00:05:09,400 --> 00:05:10,290
+sometimes you might hear people
+有时你可能会看到人们
+
+143
+00:05:10,620 --> 00:05:12,160
+talk about weights of a
+谈论一个模型的权重
+
+144
+00:05:12,400 --> 00:05:13,760
+model and weights just means
+权重其实和
+
+145
+00:05:13,950 --> 00:05:15,490
+exactly the same thing as
+模型的参数
+
+146
+00:05:15,750 --> 00:05:17,470
+parameters of the model.
+是一样的东西
+
+147
+00:05:17,830 --> 00:05:18,890
+I'm mostly going to use the terminology
+在视频中
+
+148
+00:05:19,900 --> 00:05:21,010
+parameters in these videos,
+我会继续使用“参数”这个术语
+
+149
+00:05:21,620 --> 00:05:24,180
+but sometimes you may hear others use the weights terminology.
+但有时你可能听到别人用“权重”这个术语
+
+150
+00:05:27,890 --> 00:05:29,290
+So, this little
+这个小圈
+
+151
+00:05:29,430 --> 00:05:31,340
+diagram represents a single neuron.
+代表一个单一的神经元
+
+152
+00:05:34,470 --> 00:05:35,790
+What a neural network is
+神经网络其实就是
+
+153
+00:05:36,560 --> 00:05:38,590
+is just a group of
+这些不同的神经元
+
+154
+00:05:38,780 --> 00:05:40,500
+these different neurons strung together.
+组合在一起的集合
+
+155
+00:05:41,630 --> 00:05:42,770
+Concretely, here we have
+具体来说 这里是我们的
+
+156
+00:05:43,530 --> 00:05:45,070
+input units x1, x2, and x3
+输入单元 x1 x2和 x3
+
+157
+00:05:45,410 --> 00:05:47,170
+and once again,
+再说一次
+
+158
+00:05:47,540 --> 00:05:49,070
+sometimes can draw this
+有时也可以画上
+
+159
+00:05:49,370 --> 00:05:50,760
+extra node x0 or sometimes
+额外的节点 x0
+
+160
+00:05:51,340 --> 00:05:52,490
+not. So, I just draw that in here.
+我把 x0 画在这了
+
+161
+00:05:53,620 --> 00:05:54,950
+And here we have
+这里有
+
+162
+00:05:55,300 --> 00:05:56,800
+three neurons, which I
+3个神经元
+
+163
+00:05:56,930 --> 00:05:58,890
+have written, you know, a(2)1, a(2)2 and
+我在里面写了a(2)1 a(2)2
+
+164
+00:05:59,060 --> 00:06:00,250
+a(2)3; I'll talk about those indices
+和a(2)3
+
+165
+00:06:00,700 --> 00:06:02,140
+later. And once again,
+然后再次说明
+
+166
+00:06:02,730 --> 00:06:03,790
+we can if we want
+我们可以在这里
+
+167
+00:06:04,500 --> 00:06:05,440
+add this a0 and
+添加一个a0
+
+168
+00:06:05,620 --> 00:06:08,840
+add an extra bias unit there.
+和一个额外的偏度单元
+
+169
+00:06:10,240 --> 00:06:12,030
+It always outputs the value of 1.
+它的值永远是1
+
+170
+00:06:12,390 --> 00:06:13,680
+Then finally we have this
+最后 我们在
+
+171
+00:06:13,880 --> 00:06:15,450
+third node at the final
+最后一层有第三个节点
+
+172
+00:06:15,710 --> 00:06:16,800
+layer, and it's this
+正是这第三个节点
+
+173
+00:06:16,990 --> 00:06:18,600
+third node that outputs the value
+输出
+
+174
+00:06:19,210 --> 00:06:21,020
+that the hypotheses h of x computes.
+假设函数h(x)计算的结果
+
+175
+00:06:22,330 --> 00:06:23,480
+To introduce a bit
+再多说一点关于
+
+176
+00:06:23,610 --> 00:06:25,250
+more terminology in a neural
+神经网络的术语
+
+177
+00:06:25,530 --> 00:06:27,340
+network, the first layer, this
+网络中的第一层
+
+178
+00:06:27,480 --> 00:06:28,610
+is also called the input
+也被称为输入层
+
+179
+00:06:29,040 --> 00:06:30,160
+layer because this is where
+因为我们在这一层
+
+180
+00:06:30,400 --> 00:06:33,510
+we input our features, x1 x2 x3.
+输入我们的特征项 x1 x2 x3
+
+181
+00:06:33,770 --> 00:06:35,560
+The final layer is
+最后一层
+
+182
+00:06:35,850 --> 00:06:37,190
+also called the output layer
+也称为输出层
+
+183
+00:06:37,640 --> 00:06:39,550
+because that layer has
+因为这一层的
+
+184
+00:06:39,840 --> 00:06:41,010
+the neuron - this one over
+神经元—我指的这个
+
+185
+00:06:41,150 --> 00:06:42,340
+here - that outputs the
+输出
+
+186
+00:06:42,400 --> 00:06:43,980
+final value computed by the
+假设的最终计算结果
+
+187
+00:06:44,370 --> 00:06:46,180
+hypothesis, and then layer
+中间的两层
+
+188
+00:06:46,420 --> 00:06:48,900
+two in between, this is called the hidden layer.
+也被称作隐藏层
+
+189
+00:06:49,830 --> 00:06:51,300
+The term hidden layer isn't a
+隐藏层不是一个
+
+190
+00:06:51,450 --> 00:06:53,290
+great terminology, but the
+很合适的术语 但是
+
+191
+00:06:54,160 --> 00:06:55,680
+intuition is that, you know, in
+直觉上我们知道
+
+192
+00:06:56,020 --> 00:06:57,450
+supervised learning where you
+在监督学习中
+
+193
+00:06:57,530 --> 00:06:59,820
+get to see the inputs, and you get to see the correct outputs.
+你能看到输入 也能看到正确的输出
+
+194
+00:07:00,640 --> 00:07:02,530
+Whereas the hidden layer are values you
+而隐藏层的值
+
+195
+00:07:02,660 --> 00:07:04,260
+don't get to observe in the training set.
+你在训练集里是看不到的
+
+196
+00:07:04,520 --> 00:07:07,280
+If it's not x and it's not y and so we call those hidden.
+它的值不是 x 也不是y 所以我们叫它隐藏层
+
+197
+00:07:08,170 --> 00:07:09,860
+and later on we'll see neural
+稍后我们会看到神经网络
+
+198
+00:07:10,050 --> 00:07:11,260
+networks with more than
+可以有不止一个的
+
+199
+00:07:11,370 --> 00:07:12,690
+one hidden layer, but in
+隐藏层 但在
+
+200
+00:07:13,020 --> 00:07:14,290
+this example we have one
+这个例子中 我们有一个
+
+201
+00:07:14,480 --> 00:07:16,010
+input layer, layer 1; one hidden
+输入层—第1层 一个隐藏层—
+
+202
+00:07:16,260 --> 00:07:18,900
+layer, layer 2; and one output layer, layer 3.
+第2层 和一个输出层—第3层
+
+203
+00:07:19,390 --> 00:07:20,530
+But basically anything that isn't
+但实际上任何
+
+204
+00:07:20,990 --> 00:07:22,260
+an input layer and isn't a
+非输入层或非输出层的层
+
+205
+00:07:22,410 --> 00:07:24,480
+output layer is called a hidden layer.
+就被称为隐藏层
+
+206
+00:07:26,710 --> 00:07:29,620
+So, I
+接下来
+
+207
+00:07:29,710 --> 00:07:30,610
+want to be really clear
+我希望你们明白神经网络
+
+208
+00:07:31,090 --> 00:07:33,130
+about what this neural network is doing.
+究竟在做什么
+
+209
+00:07:33,970 --> 00:07:34,840
+Let's step through the computational
+让我们逐步分析
+
+210
+00:07:35,760 --> 00:07:37,600
+steps that are embodied
+这个图表所呈现的
+
+211
+00:07:38,050 --> 00:07:40,410
+by this, represented by this diagram.
+计算步骤
+
+212
+00:07:41,560 --> 00:07:42,800
+To explain the specific computations
+为了解释这个神经网络
+
+213
+00:07:43,660 --> 00:07:44,960
+represented by a neural network,
+具体的计算步骤
+
+214
+00:07:45,580 --> 00:07:46,910
+here's a little bit more notation.
+这里还有些记号要解释
+
+215
+00:07:47,270 --> 00:07:48,400
+I'm going to use a superscript
+我要使用a上标(j)
+
+216
+00:07:48,950 --> 00:07:50,520
+j subscript i to denote
+下标i表示
+
+217
+00:07:51,090 --> 00:07:53,640
+the activation of neuron i
+第j层的
+
+218
+00:07:54,060 --> 00:07:55,390
+or of unit i in layer
+第i个神经元或单元
+
+219
+00:07:55,720 --> 00:07:58,290
+j. So concretely, this
+具体来说 这里
+
+220
+00:07:59,390 --> 00:08:01,280
+a superscript 2 subscript 1
+a上标(2) 下标1
+
+221
+00:08:01,380 --> 00:08:03,930
+denotes the activation of the
+表示第2层的
+
+222
+00:08:04,010 --> 00:08:05,320
+first unit in layer 2,
+第一个激励
+
+223
+00:08:05,450 --> 00:08:07,140
+in our hidden layer.
+即隐藏层的第一个激励
+
+224
+00:08:07,280 --> 00:08:08,640
+And by activation, I just mean,
+所谓激励(activation) 是指
+
+225
+00:08:08,970 --> 00:08:10,360
+you know, the value that is computed
+由一个具体神经元读入
+
+226
+00:08:10,710 --> 00:08:12,530
+by, and that is output by, a specific neuron.
+计算并输出的值
+
+227
+00:08:13,200 --> 00:08:14,320
+In addition, our neural network
+此外 我们的神经网络
+
+228
+00:08:14,850 --> 00:08:17,050
+is parametrized by these matrices,
+被这些矩阵参数化
+
+229
+00:08:17,470 --> 00:08:19,520
+theta superscript j where
+θ上标(j)
+
+230
+00:08:19,690 --> 00:08:20,600
+our theta j is going to
+它将成为
+
+231
+00:08:20,820 --> 00:08:21,820
+be a matrix of weights
+一个波矩阵
+
+232
+00:08:22,140 --> 00:08:23,770
+controlling the function mapping from
+控制着
+
+233
+00:08:24,130 --> 00:08:25,780
+one layer, maybe the first
+从一层 比如说从第一层
+
+234
+00:08:25,990 --> 00:08:28,360
+layer to the second layer or from the second layer to the third layer.
+到第二层或者第二层到第三层的作用
+
+235
+00:08:29,580 --> 00:08:32,990
+So, here are the computations that are represented by this diagram.
+所以 这就是这张图所表示的计算
+
+236
+00:08:34,520 --> 00:08:35,770
+This first hidden unit here,
+这里的第一个隐藏单元
+
+237
+00:08:37,060 --> 00:08:38,600
+has its value computed as
+是这样计算它的值的:
+
+238
+00:08:38,840 --> 00:08:40,020
+follows: a(2)1 is
+a(2)1等于
+
+239
+00:08:40,260 --> 00:08:41,980
+equal to the sigmoid
+s函数
+
+240
+00:08:42,400 --> 00:08:44,240
+function, or the sigmoid activation function
+或者说s激励函数
+
+241
+00:08:45,210 --> 00:08:46,550
+also called the logistic activation function,
+也叫做逻辑激励函数
+
+242
+00:08:47,760 --> 00:08:49,730
+applied to this sort
+作用在这种
+
+243
+00:08:49,990 --> 00:08:52,360
+of linear combination of its inputs.
+输入的线性组合上的结果
+
+244
+00:08:53,840 --> 00:08:56,560
+And then this second hidden
+第二个隐藏单元
+
+245
+00:08:56,820 --> 00:08:58,330
+unit has this activation
+等于s函数作用在这个
+
+246
+00:08:59,010 --> 00:09:01,400
+value computed as sigmoid of this.
+线性组合上的值
+
+247
+00:09:02,470 --> 00:09:04,110
+And similarly, for this
+同样 对于第三个
+
+248
+00:09:04,260 --> 00:09:07,010
+third hidden unit, it's computed by that formula.
+隐藏的单元 它是通过这个公式计算的
+
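+For reference, written out in the notation just introduced, the first hidden unit's value (a worked equation, with x_0 = 1 as the bias term) is
+a^{(2)}_1 = g\big(\Theta^{(1)}_{10} x_0 + \Theta^{(1)}_{11} x_1 + \Theta^{(1)}_{12} x_2 + \Theta^{(1)}_{13} x_3\big)
+and the second and third hidden units use the second and third rows of \Theta^{(1)} in the same way.
+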
+249
+00:09:08,330 --> 00:09:10,060
+So here, we have three
+在这里 我们有三个
+
+250
+00:09:10,780 --> 00:09:13,960
+input units and the three hidden units.
+输入单元和三个隐藏单元
+
+251
+00:09:16,830 --> 00:09:18,840
+And so the dimension
+这样一来
+
+252
+00:09:19,590 --> 00:09:21,530
+of theta 1, which is the
+参数矩阵控制了
+
+253
+00:09:22,060 --> 00:09:23,590
+matrix of parameters governing our
+我们来自
+
+254
+00:09:23,740 --> 00:09:24,870
+mapping from our three input
+三个输入单元
+
+255
+00:09:25,170 --> 00:09:26,530
+units to our three hidden units,
+三个隐藏单元的映射
+
+256
+00:09:27,080 --> 00:09:28,210
+theta 1 is going to
+因此θ1的维数
+
+257
+00:09:29,880 --> 00:09:35,390
+be a 3,
+将变成3
+
+258
+00:09:35,640 --> 00:09:36,870
+theta 1 is going to
+θ1将变成一个
+
+259
+00:09:38,130 --> 00:09:39,640
+be a 3 by 4 dimensional
+3乘4维的
+
+260
+00:09:40,650 --> 00:09:42,620
+matrix and more generally,
+矩阵 更一般的
+
+261
+00:09:43,870 --> 00:09:45,440
+if a network has Sj
+如果一个网络在第j
+
+262
+00:09:45,710 --> 00:09:46,710
+units in layer j
+层有sj个单元
+
+263
+00:09:47,210 --> 00:09:48,440
+and S(j+1) units
+在j+1层有
+
+264
+00:09:48,650 --> 00:09:49,980
+in layer j + 1, then
+sj+1个单元
+
+265
+00:09:50,310 --> 00:09:51,700
+the matrix theta j which
+那么矩阵θ(j)
+
+266
+00:09:52,010 --> 00:09:53,560
+governs the function mapping from
+即控制第j层到
+
+267
+00:09:53,780 --> 00:09:55,390
+layer j to layer j +
+第j+1层映射
+
+268
+00:09:55,640 --> 00:09:56,660
+1, will have dimension
+的矩阵的
+
+269
+00:09:57,280 --> 00:10:00,160
+S(j+1) by (Sj + 1).
+维度为s(j+1) * (sj+1)
+
+270
+00:10:00,580 --> 00:10:02,390
+Just to be clear about this notation, right?
+这里要搞清楚
+
+271
+00:10:02,580 --> 00:10:04,440
+This is S subscript j
+这个是s下标j+1
+
+272
+00:10:04,440 --> 00:10:05,810
++ 1 and that's S
+而这个是
+
+273
+00:10:06,100 --> 00:10:07,260
+subscript j, and then
+s下标j 然后
+
+274
+00:10:07,380 --> 00:10:09,060
+this whole thing, plus 1.
+整体加上1
+
+275
+00:10:09,430 --> 00:10:11,860
+Of this whole thing, that's j + 1, okay?
+整体加1 明白了吗
+
+276
+00:10:12,260 --> 00:10:13,730
+So that's S subscript j plus
+所以θ(j)的维度是
+
+277
+00:10:14,080 --> 00:10:22,400
+1, by... So,
+s(j+1)行 sj+1列
+
+278
+00:10:22,560 --> 00:10:24,090
+that's S subscript j plus
+这里sj+1
+
+279
+00:10:24,400 --> 00:10:26,230
+1 by Sj
+当中的1
+
+280
+00:10:27,220 --> 00:10:30,460
++ 1, where this plus 1 is not part of the subscript.
+不是下标的一部分
+
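+As a quick sanity check of the dimension rule just stated, here is a minimal Python sketch; the layer sizes [3, 3, 1] match this video's example network, and everything else is only illustrative:
+
+    # s holds the number of units in layers 1, 2, 3 (bias units not counted)
+    s = [3, 3, 1]  # input layer, hidden layer, output layer
+    # Theta(j) maps layer j to layer j+1 and has dimension S(j+1) x (Sj + 1)
+    dims = [(s[j + 1], s[j] + 1) for j in range(len(s) - 1)]
+    print(dims)  # [(3, 4), (1, 4)] -> Theta(1) is 3 x 4, Theta(2) is 1 x 4
+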
+281
+00:10:32,400 --> 00:10:33,520
+So, we talked about what
+以上我们讨论了
+
+282
+00:10:33,690 --> 00:10:36,120
+the three hidden units do to compute their values.
+三个隐藏单位是怎么计算它们的值
+
+283
+00:10:37,180 --> 00:10:41,240
+Finally, in this last, the output
+最后 在输出层
+
+284
+00:10:41,370 --> 00:10:42,280
+layer, we have one more
+我们还有一个
+
+285
+00:10:42,540 --> 00:10:44,270
+unit which computes h of
+单元 它计算
+
+286
+00:10:44,350 --> 00:10:46,090
+x and that's equal, can
+h(x) 这个也可以
+
+287
+00:10:46,230 --> 00:10:47,210
+also be written as a(3)1
+写成a(3)1
+
+288
+00:10:47,270 --> 00:10:50,830
+and that's equal to this.
+就等于后面这块
+
+289
+00:10:52,030 --> 00:10:53,110
+And you notice that I've
+注意到我这里
+
+290
+00:10:53,290 --> 00:10:54,480
+written this with a superscript
+写了个上标2
+
+291
+00:10:54,670 --> 00:10:56,380
+2 here because theta superscript
+因为θ上标2
+
+292
+00:10:57,130 --> 00:10:58,340
+2 is the matrix of parameters,
+是参数矩阵
+
+293
+00:10:59,080 --> 00:11:01,170
+or the matrix of weights that
+或着说是权重矩阵 该矩阵
+
+294
+00:11:01,380 --> 00:11:02,830
+controls the function that maps
+控制从第二层
+
+295
+00:11:03,240 --> 00:11:05,090
+from the hidden units, that
+即隐藏层的3个单位
+
+296
+00:11:05,600 --> 00:11:06,850
+is the layer 2 units,
+到第三层
+
+297
+00:11:07,720 --> 00:11:09,230
+to the one layer 3
+的一个单元
+
+298
+00:11:09,590 --> 00:11:10,840
+unit that is the output
+即输出单元
+
+299
+00:11:12,360 --> 00:11:12,360
+unit.
+的映射
+
+300
+00:11:12,550 --> 00:11:13,460
+To summarize, what we've done
+总之 以上我们
+
+301
+00:11:13,830 --> 00:11:14,900
+is shown how a picture
+展示了像这样一张图是
+
+302
+00:11:15,230 --> 00:11:16,670
+like this over here defines
+怎样定义
+
+303
+00:11:17,350 --> 00:11:20,280
+an artificial neural network which defines
+一个人工神经网络的
+
+304
+00:11:20,920 --> 00:11:22,160
+a function h that maps
+这个神经网络定义了函数h:
+
+305
+00:11:23,090 --> 00:11:24,880
+from your input values x to, hopefully,
+从输入 x
+
+306
+00:11:25,140 --> 00:11:26,650
+some predicted values y.
+到输出y的映射
+
+307
+00:11:27,500 --> 00:11:29,430
+And these hypotheses are parameterized
+我将这些假设的参数
+
+308
+00:11:30,190 --> 00:11:31,600
+by parameters that I
+记为大写的θ
+
+309
+00:11:31,690 --> 00:11:33,070
+am denoting with a capital
+这样一来
+
+310
+00:11:33,460 --> 00:11:35,020
+theta so that as
+不同的θ
+
+311
+00:11:35,170 --> 00:11:36,920
+we vary theta, we get different hypotheses.
+对应了不同的假设
+
+312
+00:11:37,650 --> 00:11:38,930
+So we get different functions mapping
+所以我们有不同的函数
+
+313
+00:11:39,490 --> 00:11:42,490
+say from x to y. So
+比如说从 x到y的映射
+
+314
+00:11:42,940 --> 00:11:44,000
+this gives us a mathematical
+以上就是我们怎么
+
+315
+00:11:44,790 --> 00:11:45,980
+definition of how to
+从数学上定义
+
+316
+00:11:46,140 --> 00:11:48,400
+represent the hypotheses in the neural network.
+神经网络的假设
+
+317
+00:11:49,430 --> 00:11:50,750
+In the next few videos, what
+在接下来的视频中
+
+318
+00:11:50,780 --> 00:11:51,930
+I'd like to do is give
+我想要做的就是
+
+319
+00:11:52,090 --> 00:11:53,580
+you more intuition about what
+让你对这些假设的作用
+
+320
+00:11:53,760 --> 00:11:56,280
+these hypotheses representations do, as
+有更深入的理解
+
+321
+00:11:56,410 --> 00:11:57,290
+well as go through a
+并且讲解几个例子
+
+322
+00:11:57,370 --> 00:12:00,280
+few examples and talk about how to compute them efficiently.
+然后谈谈如何有效的计算它们
+
diff --git a/srt/8 - 4 - Model Representation II (12 min).srt b/srt/8 - 4 - Model Representation II (12 min).srt
new file mode 100644
index 00000000..ce58d8c9
--- /dev/null
+++ b/srt/8 - 4 - Model Representation II (12 min).srt
@@ -0,0 +1,1671 @@
+1
+00:00:00,280 --> 00:00:01,330
+In the last video, we gave
+在前面的视频里 我们
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,570 --> 00:00:03,540
+a mathematical definition of how
+解释了怎样用数学来
+
+3
+00:00:03,700 --> 00:00:04,990
+to represent or how to
+定义或者计算
+
+4
+00:00:05,090 --> 00:00:07,160
+compute the hypotheses used by Neural Network.
+神经网络算法的假设
+
+5
+00:00:08,420 --> 00:00:09,620
+In this video, I like
+在这段视频中 我想
+
+6
+00:00:09,730 --> 00:00:11,280
+show you how to actually
+告诉你如何
+
+7
+00:00:11,450 --> 00:00:14,040
+carry out that computation efficiently, and
+高效地进行计算
+
+8
+00:00:14,710 --> 00:00:16,050
+that is, show you a vectorized implementation.
+并展示一个向量化的实现方法
+
+9
+00:00:17,660 --> 00:00:18,930
+And second, and more importantly, I want
+更重要的是 我想
+
+10
+00:00:19,100 --> 00:00:21,110
+to start giving you intuition about
+让你们明白为什么
+
+11
+00:00:21,390 --> 00:00:22,590
+why these neural network representations
+这样表示神经网络
+
+12
+00:00:23,360 --> 00:00:24,640
+might be a good idea and how
+是一个好的方法 并且明白
+
+13
+00:00:25,010 --> 00:00:27,290
+they can help us to learn complex nonlinear hypotheses.
+它们怎样帮助我们学习复杂的非线性假设
+
+14
+00:00:28,970 --> 00:00:29,880
+Consider this neural network.
+以这个神经网络为例
+
+15
+00:00:30,520 --> 00:00:31,720
+Previously we said that
+以前我们说
+
+16
+00:00:32,010 --> 00:00:33,070
+the sequence of steps that we
+计算出假设输出
+
+17
+00:00:33,170 --> 00:00:34,090
+need in order to compute
+的步骤
+
+18
+00:00:34,650 --> 00:00:35,850
+the output of a hypotheses
+是左边的这些
+
+19
+00:00:36,320 --> 00:00:37,780
+is these equations given on
+方程 通过这些方程
+
+20
+00:00:37,950 --> 00:00:38,770
+the left where we compute
+我们计算出
+
+21
+00:00:39,540 --> 00:00:41,330
+the activation values of the
+三个隐藏单元的激励值
+
+22
+00:00:41,450 --> 00:00:43,220
+three hidden uses and then
+然后利用
+
+23
+00:00:43,420 --> 00:00:44,580
+we use those to compute the
+这些值来计算
+
+24
+00:00:44,650 --> 00:00:45,710
+final output of our hypotheses
+假设h(x)的最终输出
+
+25
+00:00:46,680 --> 00:00:48,410
+h of x. Now, I'm going
+接下来 我要
+
+26
+00:00:48,480 --> 00:00:50,200
+to define a few extra terms.
+定义一些额外的项
+
+27
+00:00:50,570 --> 00:00:52,210
+So, this term that I'm
+因此 这里
+
+28
+00:00:52,410 --> 00:00:54,090
+underlining here, I'm going to
+我画线的项
+
+29
+00:00:54,180 --> 00:00:55,560
+define that to be
+把它定义为
+
+30
+00:00:56,230 --> 00:00:58,410
+z superscript 2 subscript 1.
+z上标(2) 下标1
+
+31
+00:00:58,790 --> 00:00:59,830
+So that we have that
+这样一来 就有了
+
+32
+00:01:00,650 --> 00:01:02,310
+a(2)1, which is this
+a(2)1 这个项
+
+33
+00:01:02,470 --> 00:01:03,930
+term is equal to
+等于
+
+34
+00:01:04,170 --> 00:01:06,020
+g of z(2)1.
+g(z(2)1)
+
+35
+00:01:06,130 --> 00:01:08,100
+And by the
+顺便说一下
+
+36
+00:01:08,180 --> 00:01:09,750
+way, these superscript 2, you
+这些上标2
+
+37
+00:01:10,570 --> 00:01:11,580
+know, what that means is that
+的意思是
+
+38
+00:01:11,870 --> 00:01:12,960
+the z2 and this a2
+在z(2)和a(2)中
+
+39
+00:01:13,080 --> 00:01:14,140
+as well, the superscript
+括号中的
+
+40
+00:01:14,840 --> 00:01:16,450
+2 in parentheses means that these
+2表示这些值
+
+41
+00:01:16,740 --> 00:01:18,330
+are values associated with layer
+与第二层相关
+
+42
+00:01:18,570 --> 00:01:19,810
+2, that is with the hidden
+即与神经网络中的
+
+43
+00:01:20,100 --> 00:01:21,390
+layer in the neural network.
+隐藏层有关
+
+44
+00:01:22,820 --> 00:01:25,200
+Now this term here
+接下来 这里的项
+
+45
+00:01:25,990 --> 00:01:27,640
+I'm going to similarly define as
+我将同样定义为
+
+46
+00:01:29,530 --> 00:01:30,140
+z(2)2.
+z(2)2
+
+47
+00:01:30,490 --> 00:01:31,860
+And finally, this last
+最后这个
+
+48
+00:01:32,170 --> 00:01:33,100
+term here that I'm underlining,
+我画线的项
+
+49
+00:01:34,160 --> 00:01:37,040
+let me define that as z(2)3.
+我把它定义为z(2)3
+
+50
+00:01:37,090 --> 00:01:38,710
+So that similarly we have a(2)3
+这样 我们有a(2)3
+
+51
+00:01:38,850 --> 00:01:43,200
+equals g of
+等于
+
+52
+00:01:44,990 --> 00:01:45,360
+z(2)3.
+g(z(2)3)
+
+53
+00:01:45,480 --> 00:01:46,760
+So these z values are just
+所以这些z值都是
+
+54
+00:01:47,290 --> 00:01:48,940
+a linear combination, a weighted
+一个线性组合
+
+55
+00:01:49,360 --> 00:01:51,200
+linear combination, of the
+是输入值x0 x1 x2 x3的
+
+56
+00:01:51,490 --> 00:01:52,800
+input values x0, x1,
+加权线性组合
+
+57
+00:01:53,060 --> 00:01:55,350
+x2, x3 that go into a particular neuron.
+它将会进入一个特定的神经元
+
+58
+00:01:57,090 --> 00:01:58,260
+Now if you look at
+现在 看一下
+
+59
+00:01:58,900 --> 00:02:00,470
+this block of numbers,
+这一堆数字
+
+60
+00:02:01,990 --> 00:02:03,310
+you may notice that that block
+你可能会注意到这块
+
+61
+00:02:03,490 --> 00:02:05,880
+of numbers corresponds suspiciously similar
+对应了
+
+62
+00:02:06,950 --> 00:02:08,330
+to the matrix vector
+矩阵向量运算
+
+63
+00:02:08,800 --> 00:02:10,260
+operation, matrix vector multiplication
+类似于矩阵向量乘法
+
+64
+00:02:11,070 --> 00:02:12,710
+of theta 1 times the
+x1乘以向量x
+
+65
+00:02:12,790 --> 00:02:14,840
+vector x. Using this observation
+观察到一点
+
+66
+00:02:15,580 --> 00:02:18,730
+we're going to be able to vectorize this computation
+我们就能将
+
+67
+00:02:19,700 --> 00:02:20,280
+of the neural network.
+神经网络的计算向量化了
+
+68
+00:02:21,470 --> 00:02:23,510
+Concretely, let's define the
+具体而言 我们定义
+
+69
+00:02:23,680 --> 00:02:24,810
+feature vector x as usual
+特征向量x
+
+70
+00:02:25,290 --> 00:02:27,020
+to be the vector of x0, x1,
+为x0 x1
+
+71
+00:02:27,260 --> 00:02:28,550
+x2, x3 where x0
+x2 x3组成的向量 其中x0
+
+72
+00:02:29,010 --> 00:02:30,280
+as usual is always equal
+仍然等于1
+
+73
+00:02:30,610 --> 00:02:31,860
+1 and that defines
+并定义
+
+74
+00:02:32,390 --> 00:02:33,420
+z2 to be the vector
+z(2)为
+
+75
+00:02:34,360 --> 00:02:37,250
+of these z-values, you know, of z(2)1 z(2)2, z(2)3.
+这些z值组成的向量 即z(2)1 z(2)2 z(2)3
+
+76
+00:02:38,560 --> 00:02:40,210
+And notice that, there, z2 this
+注意 在这里 z(2)
+
+77
+00:02:40,440 --> 00:02:42,500
+is a three dimensional vector.
+是一个三维向量
+
+78
+00:02:43,910 --> 00:02:47,200
+We can now vectorize the computation
+下面 我们可以这样
+
+79
+00:02:48,270 --> 00:02:48,860
+of a(2)1, a(2)2, a(2)3 as follows.
+向量化a(2)1 a(2)2 a(2)3的计算
+
+80
+00:02:49,490 --> 00:02:50,690
+We can just write this in two steps.
+我们只用两个步骤
+
+81
+00:02:51,500 --> 00:02:53,400
+We can compute z2 as theta
+z(2)等于θ(1)
+
+82
+00:02:53,950 --> 00:02:55,490
+1 times x and that
+乘以x
+
+83
+00:02:55,790 --> 00:02:57,020
+would give us this vector z2;
+这样就有了向量z(2)
+
+84
+00:02:57,400 --> 00:02:59,360
+and then a2 is
+然后 a(2)等于
+
+85
+00:02:59,860 --> 00:03:02,180
+g of z2 and just
+g(z(2))
+
+86
+00:03:02,440 --> 00:03:03,860
+to be clear z2 here, This
+需要明白 这里的z(2)是
+
+87
+00:03:04,200 --> 00:03:05,880
+is a three-dimensional vector and
+三维向量 并且
+
+88
+00:03:06,060 --> 00:03:08,150
+a2 is also a three-dimensional
+a(2)也是一个三维
+
+89
+00:03:08,810 --> 00:03:10,410
+vector and thus this
+向量 因此这
+
+90
+00:03:10,690 --> 00:03:12,680
+activation g. This applies the
+里的激励g 将s函数
+
+91
+00:03:12,950 --> 00:03:15,290
+sigmoid function element-wise to each
+逐元素作用于
+
+92
+00:03:15,550 --> 00:03:18,290
+of the z2's elements. And
+z(2)中的每个元素
+
+93
+00:03:18,380 --> 00:03:19,270
+by the way, to make our notation
+顺便说一下 为了让我们
+
+94
+00:03:19,950 --> 00:03:21,260
+a little more consistent with
+的符号和接下来的
+
+95
+00:03:21,440 --> 00:03:23,330
+what we'll do later, in this
+工作相一致
+
+96
+00:03:23,590 --> 00:03:24,600
+input layer we have the
+在输入层 虽然我们有
+
+97
+00:03:24,670 --> 00:03:25,840
+inputs x, but we
+输入x 但我们
+
+98
+00:03:25,960 --> 00:03:26,950
+can also think of it
+还可以把这些想成
+
+99
+00:03:27,300 --> 00:03:29,270
+as the activations of the first layer.
+是第一层的激励
+
+100
+00:03:29,680 --> 00:03:30,430
+So, if I define a1 to
+所以 我可以定义a(1)
+
+101
+00:03:30,470 --> 00:03:32,510
+be equal to x. So,
+等于x 因此
+
+102
+00:03:32,660 --> 00:03:34,270
+then a1 is a vector, I can
+a(1)就是一个向量了
+
+103
+00:03:34,500 --> 00:03:35,520
+now take this x here
+我就可以把这里的x
+
+104
+00:03:36,230 --> 00:03:38,850
+and replace this with z2 equals theta1
+替换成a(1)
+
+105
+00:03:39,570 --> 00:03:40,680
+times a1 just by defining
+z(2)就等于θ(1)乘以a(1)
+
+106
+00:03:41,410 --> 00:03:43,350
+a1 to be activations in my input layer.
+这都是通过在输入层定义a(1)做到的
+
+107
+00:03:44,990 --> 00:03:46,000
+Now, with what I've written
+现在 就我目前所写的
+
+108
+00:03:46,280 --> 00:03:47,500
+so far I've now gotten
+我得到了
+
+109
+00:03:47,900 --> 00:03:49,940
+myself the values for a1,
+a1 a2 a3的值
+
+110
+00:03:50,820 --> 00:03:52,690
+a2, a3, and really
+并且
+
+111
+00:03:52,780 --> 00:03:53,980
+I should put the
+我应该把
+
+112
+00:03:54,290 --> 00:03:55,600
+superscripts there as well.
+上标加上去
+
+113
+00:03:56,430 --> 00:03:57,530
+But I need one more
+但我还需要一个值
+
+114
+00:03:57,940 --> 00:03:59,810
+value, which is, I also want this a(2)0
+我同样需要这个a(2)0
+
+115
+00:04:00,050 --> 00:04:02,050
+and that corresponds to
+它对应于
+
+116
+00:04:02,250 --> 00:04:04,350
+a bias unit in the
+隐藏层的
+
+117
+00:04:04,550 --> 00:04:06,420
+hidden layer that goes to the output there.
+得到这个输出的偏置单元
+
+118
+00:04:06,990 --> 00:04:07,780
+Of course, there was a
+当然 这里也有一个
+
+119
+00:04:07,810 --> 00:04:08,850
+bias unit here too that,
+偏置单元
+
+120
+00:04:09,000 --> 00:04:10,060
+you know, I just didn't draw it
+我只是没有
+
+121
+00:04:10,270 --> 00:04:11,820
+under here but to
+画出来 为了
+
+122
+00:04:11,970 --> 00:04:13,100
+take care of this extra bias unit,
+注意这额外的偏置单元
+
+123
+00:04:13,870 --> 00:04:15,650
+what we're going to do is add
+接下来我们
+
+124
+00:04:16,320 --> 00:04:18,720
+an extra a0 superscript 2,
+要额外加上一个a0 上标(2)
+
+125
+00:04:18,890 --> 00:04:20,870
+that's equal to one, and after
+它等于1 这样一来
+
+126
+00:04:21,010 --> 00:04:21,990
+taking this step we now have
+现在
+
+127
+00:04:22,290 --> 00:04:23,860
+that a2 is going to
+a(2)就是一个
+
+128
+00:04:24,010 --> 00:04:25,390
+be a four dimensional feature
+四维的特征向量
+
+129
+00:04:25,690 --> 00:04:26,820
+vector because we just added
+因为我们刚添加了
+
+130
+00:04:27,300 --> 00:04:28,490
+this extra, you know,
+这个额外的
+
+131
+00:04:28,620 --> 00:04:30,260
+a0 which is equal to
+a0 它等于
+
+132
+00:04:30,500 --> 00:04:31,700
+1 corresponding to the bias unit
+1并且它是隐藏层的
+
+133
+00:04:32,080 --> 00:04:33,550
+in the hidden layer. And finally,
+一个偏置单元 最后
+
+134
+00:04:35,080 --> 00:04:37,620
+to compute the actual
+为了计算假设的
+
+135
+00:04:38,070 --> 00:04:40,100
+value output of our hypotheses, we
+实际输出值 我们
+
+136
+00:04:40,250 --> 00:04:41,190
+then simply need to compute
+只需要计算
+
+137
+00:04:42,470 --> 00:04:44,980
+z3. So z3 is
+z(3) z(3)等于
+
+138
+00:04:45,350 --> 00:04:47,940
+equal to this term here that I'm just underlining.
+这里我画线的项
+
+139
+00:04:48,800 --> 00:04:51,450
+This inner term there is z3.
+这个方框里的项就是z(3)
+
+140
+00:04:53,980 --> 00:04:55,160
+And z3 is theta
+z(3)等于θ(2)
+
+141
+00:04:55,500 --> 00:04:57,120
+2 times a2 and finally
+乘以a(2) 最后
+
+142
+00:04:57,810 --> 00:04:59,560
+my hypotheses output h of x which
+假设输出为h(x)
+
+143
+00:04:59,750 --> 00:05:01,210
+is a3 that is
+它等于a(3)
+
+144
+00:05:01,360 --> 00:05:03,910
+the activation of my
+a(3)是输出层
+
+145
+00:05:04,750 --> 00:05:06,040
+one and only unit in
+唯一的单元
+
+146
+00:05:06,290 --> 00:05:09,500
+the output layer. So, that's just the real number. You can write it as a3
+它是一个实数 你可以写成a(3)
+
+147
+00:05:10,050 --> 00:05:12,390
+or as a(3)1 and that's g of z3.
+或a(3)1 这就是g(z(3))
+
+148
+00:05:13,240 --> 00:05:15,020
+This process of computing h of x
+这个计算h(x)的过程
+
+149
+00:05:15,940 --> 00:05:18,110
+is also called forward propagation
+也称为前向传播(forward propagation)
+
+150
+00:05:19,130 --> 00:05:20,440
+and is called that because we
+这样命名是因为
+
+151
+00:05:20,550 --> 00:05:21,310
+start off with the activations
+我们从
+
+152
+00:05:22,010 --> 00:05:24,400
+of the input-units and then
+输入层的激励开始
+
+153
+00:05:24,940 --> 00:05:26,770
+we sort of forward-propagate that to the
+然后进行前向传播给
+
+154
+00:05:26,860 --> 00:05:29,390
+hidden layer and compute the activations of the
+隐藏层并计算
+
+155
+00:05:29,580 --> 00:05:30,400
+hidden layer and then we
+隐藏层的激励 然后
+
+156
+00:05:30,540 --> 00:05:32,040
+sort of forward propagate that
+我们继续前向传播
+
+157
+00:05:32,760 --> 00:05:36,270
+and compute the activations of
+并计算输出层的激励
+
+158
+00:05:37,480 --> 00:05:39,170
+the output layer, but this process of computing the activations from the input then
+这个从输入层到
+
+159
+00:05:39,290 --> 00:05:40,400
+the hidden then the output layer,
+隐藏层再到输出层依次计算激励的
+
+160
+00:05:40,940 --> 00:05:42,030
+and that's also called forward propagation
+过程叫前向传播
+
+161
+00:05:43,320 --> 00:05:44,150
+and what we just did is
+我们刚刚得到了
+
+162
+00:05:44,310 --> 00:05:45,370
+we just worked out a vector
+这一过程的向量化
+
+163
+00:05:45,740 --> 00:05:47,140
+wise implementation of this
+实现方法
+
+164
+00:05:47,280 --> 00:05:48,890
+procedure. So, if you
+如果你
+
+165
+00:05:48,970 --> 00:05:50,260
+implement it using these equations
+使用右边这些公式实现它
+
+166
+00:05:50,800 --> 00:05:51,740
+that we have on the right, these
+就会得到
+
+167
+00:05:51,850 --> 00:05:53,280
+would give you an efficient way
+一个有效的
+
+168
+00:05:53,460 --> 00:05:54,980
+or a computationally efficient way of
+计算h(x)
+
+169
+00:05:55,120 --> 00:05:56,130
+computing h of x.
+的方法
+
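+For reference, here is a minimal NumPy sketch of the vectorized forward propagation just described; the layer sizes match this example (3 inputs, 3 hidden units, 1 output), but the weight values and the input are illustrative placeholders, not values from the video:
+
+    import numpy as np
+
+    def sigmoid(z):
+        return 1.0 / (1.0 + np.exp(-z))
+
+    Theta1 = np.random.randn(3, 4)    # maps layer 1 -> layer 2
+    Theta2 = np.random.randn(1, 4)    # maps layer 2 -> layer 3
+    x = np.array([0.5, -1.2, 2.0])    # example input features x1, x2, x3
+
+    a1 = np.concatenate(([1.0], x))   # a(1) = x with the bias unit x0 = 1 added
+    z2 = Theta1 @ a1                  # z(2) = Theta(1) * a(1)
+    a2 = sigmoid(z2)                  # a(2) = g(z(2)), applied element-wise
+    a2 = np.concatenate(([1.0], a2))  # add the bias unit a(2)0 = 1
+    z3 = Theta2 @ a2                  # z(3) = Theta(2) * a(2)
+    h = sigmoid(z3)                   # h(x) = a(3) = g(z(3))
+    print(h)
+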
+170
+00:05:58,250 --> 00:05:59,860
+This forward propagation view also
+这种前向传播的角度
+
+171
+00:06:00,860 --> 00:06:02,270
+helps us to understand what
+也可以帮助我们了解
+
+172
+00:06:02,550 --> 00:06:03,640
+Neural Networks might be doing
+神经网络的原理
+
+173
+00:06:04,110 --> 00:06:05,290
+and why they might help us to
+和它为什么能够
+
+174
+00:06:05,510 --> 00:06:07,170
+learn interesting nonlinear hypotheses.
+帮助我们学习非线性假设
+
+175
+00:06:08,670 --> 00:06:09,760
+Consider the following neural network
+看一下这个神经网络
+
+176
+00:06:10,500 --> 00:06:11,820
+and let's say I cover up
+我会暂时盖住
+
+177
+00:06:12,040 --> 00:06:13,810
+the left part of this picture for now.
+图片的左边部分
+
+178
+00:06:14,650 --> 00:06:16,170
+If you look at what's left in this picture.
+如果你观察图中剩下的部分
+
+179
+00:06:17,030 --> 00:06:18,020
+This looks a lot like
+这看起来很像
+
+180
+00:06:18,260 --> 00:06:19,520
+logistic regression where what
+逻辑回归
+
+181
+00:06:19,660 --> 00:06:20,570
+we're doing is we're using
+在逻辑回归中 我们用
+
+182
+00:06:20,990 --> 00:06:22,000
+that node, that's just the
+这个节点 即
+
+183
+00:06:22,130 --> 00:06:23,770
+logistic regression unit and we're
+这个逻辑回归单元
+
+184
+00:06:24,120 --> 00:06:26,060
+using that to make a
+来预测
+
+185
+00:06:26,380 --> 00:06:28,290
+prediction h of x. And concretely,
+h(x)的值 具体来说
+
+186
+00:06:28,440 --> 00:06:30,340
+what the hypotheses is outputting
+假设输出的
+
+187
+00:06:30,710 --> 00:06:31,830
+is h of x is going
+h(x)将
+
+188
+00:06:31,890 --> 00:06:33,760
+to be equal to g which
+等于s型激励函数
+
+189
+00:06:33,980 --> 00:06:38,110
+is my sigmoid activation function, applied to theta 0
+g(θ0
+
+190
+00:06:38,560 --> 00:06:40,450
+times a0 is equal
+xa0
+
+191
+00:06:41,270 --> 00:06:43,380
+to 1, plus theta 1 times a1,
++θ1xa1
+
+192
+00:06:45,220 --> 00:06:49,080
+plus theta 2
++θ2xa2
+
+193
+00:06:49,260 --> 00:06:52,090
+times a2 plus theta
++θ3xa3)
+
+194
+00:06:52,830 --> 00:06:55,180
+3 times a3, where the
+其中
+
+195
+00:06:55,370 --> 00:06:56,910
+values a1, a2, a3
+a1 a2 a3
+
+196
+00:06:57,050 --> 00:06:59,860
+are those given by these three hidden units.
+由这三个单元给出
+
+197
+00:07:01,060 --> 00:07:02,790
+Now, to be actually consistent
+为了和我之前的定义
+
+198
+00:07:03,490 --> 00:07:05,000
+with my earlier notation, actually, we
+保持一致 需要
+
+199
+00:07:05,170 --> 00:07:06,360
+need to, you know, fill in
+在这里
+
+200
+00:07:06,470 --> 00:07:10,700
+these superscript 2's here everywhere
+还有这些地方都填上上标(2)
+
+201
+00:07:12,260 --> 00:07:13,920
+and I also have these
+同样还要加上这些下标1
+
+202
+00:07:14,160 --> 00:07:16,800
+indices 1 there because I
+因为我只有
+
+203
+00:07:16,930 --> 00:07:20,610
+have only one output unit, but if you focus on the blue parts of the notation.
+一个输出单元 但如果你只观察蓝色的部分
+
+204
+00:07:20,930 --> 00:07:21,900
+This is, you know, this looks
+这看起来
+
+205
+00:07:22,150 --> 00:07:23,680
+awfully like the standard logistic
+非常像标准的
+
+206
+00:07:23,870 --> 00:07:25,530
+regression model, except that
+逻辑回归模型 不同之处在于
+
+207
+00:07:25,600 --> 00:07:28,060
+I now have a capital theta instead of lower case theta.
+我现在用的是大写的θ 而不是小写的θ
+
+208
+00:07:29,170 --> 00:07:30,690
+And what this is
+这样做完
+
+209
+00:07:30,850 --> 00:07:32,520
+doing is just logistic regression.
+我们只得到了逻辑回归
+
+210
+00:07:33,660 --> 00:07:35,240
+But where the features fed into
+但是 逻辑回归的
+
+211
+00:07:35,590 --> 00:07:37,250
+logistic regression are these
+输入特征值
+
+212
+00:07:38,200 --> 00:07:40,170
+values computed by the hidden layer.
+是通过隐藏层计算的
+
+213
+00:07:41,340 --> 00:07:42,690
+Just to say that again, what
+再说一遍
+
+214
+00:07:42,910 --> 00:07:44,420
+this neural network is doing is
+神经网络所做的
+
+215
+00:07:45,130 --> 00:07:47,050
+just like logistic regression, except
+就像逻辑回归 但是它
+
+216
+00:07:47,440 --> 00:07:48,900
+that rather than using the
+不是使用
+
+217
+00:07:49,110 --> 00:07:50,770
+original features x1, x2, x3,
+x1 x2 x3作为输入特征
+
+218
+00:07:52,400 --> 00:07:54,260
+is using these new features a1, a2, a3.
+而是用a1 a2 a3作为新的输入特征
+
+219
+00:07:54,440 --> 00:07:56,810
+Again, we'll put the superscripts
+同样 我们需要把
+
+220
+00:07:58,130 --> 00:08:00,380
+there, you know, to be consistent with the notation.
+上标加上来和之前的记号保持一致
+
+221
+00:08:02,820 --> 00:08:04,610
+And the cool thing about this,
+有趣的是
+
+222
+00:08:05,040 --> 00:08:06,220
+is that the features a1, a2,
+特征项a1 a2
+
+223
+00:08:06,720 --> 00:08:08,310
+a3, they themselves are learned
+a3它们是作为
+
+224
+00:08:08,760 --> 00:08:09,930
+as functions of the input.
+输入的函数来学习的
+
+225
+00:08:10,960 --> 00:08:12,640
+Concretely, the function mapping from
+具体来说 就是从第一层
+
+226
+00:08:13,320 --> 00:08:14,540
+layer 1 to layer 2,
+映射到第二层的函数
+
+227
+00:08:14,810 --> 00:08:16,390
+that is determined by some
+这个函数由其他
+
+228
+00:08:16,750 --> 00:08:18,550
+other set of parameters, theta 1.
+一组参数θ(1)决定
+
+229
+00:08:19,380 --> 00:08:20,210
+So it's as if the
+所以 在神经网络中
+
+230
+00:08:20,270 --> 00:08:22,030
+neural network, instead of being
+它没有用
+
+231
+00:08:22,240 --> 00:08:24,050
+constrained to feed the
+输入特征x1 x2 x3
+
+232
+00:08:24,120 --> 00:08:25,760
+features x1, x2, x3 to logistic regression.
+来训练逻辑回归
+
+233
+00:08:26,210 --> 00:08:27,440
+It gets to
+而是自己
+
+234
+00:08:27,720 --> 00:08:29,320
+learn its own features, a1,
+训练逻辑回归
+
+235
+00:08:29,810 --> 00:08:32,010
+a2, a3, to feed into the
+的输入
+
+236
+00:08:32,130 --> 00:08:33,950
+logistic regression and as
+a1 a2 a3
+
+237
+00:08:34,650 --> 00:08:36,270
+you can imagine depending on
+可以想象 如果
+
+238
+00:08:36,360 --> 00:08:37,690
+what parameters it chooses for
+在θ1中选择不同的参数
+
+239
+00:08:37,900 --> 00:08:39,880
+theta 1. You can learn some pretty interesting
+有时可以学习到一些
+
+240
+00:08:40,390 --> 00:08:42,460
+and complex features and therefore
+很有趣和复杂的特征 就可以
+
+241
+00:08:43,780 --> 00:08:44,830
+you can end up with a
+得到一个
+
+242
+00:08:45,050 --> 00:08:46,650
+better hypotheses than if
+更好的假设
+
+243
+00:08:46,840 --> 00:08:47,870
+you were constrained to use
+比使用原始输入
+
+244
+00:08:48,020 --> 00:08:50,520
+the raw features x1, x2 or x3 or if
+x1 x2或x3时得到的假设更好
+
+245
+00:08:50,640 --> 00:08:52,530
+you will constrain to say choose the
+你也可以
+
+246
+00:08:52,620 --> 00:08:53,730
+polynomial terms, you know,
+选择多项式项
+
+247
+00:08:53,920 --> 00:08:55,550
+x1, x2, x3, and so on.
+x1 x2 x3等作为输入项
+
+248
+00:08:55,790 --> 00:08:57,250
+But instead, this algorithm has
+但这个算法可以
+
+249
+00:08:57,530 --> 00:08:59,130
+the flexibility to try
+灵活地
+
+250
+00:08:59,420 --> 00:09:01,990
+to learn whatever features it wants, using
+快速学习任意的特征项
+
+251
+00:09:02,680 --> 00:09:03,990
+these a1, a2, a3 in
+把这些a1 a2 a3
+
+252
+00:09:04,110 --> 00:09:05,190
+order to feed into this
+输入这个
+
+253
+00:09:05,510 --> 00:09:07,830
+last unit that's essentially
+最后的单元 实际上
+
+254
+00:09:09,240 --> 00:09:11,920
+a logistic regression here. I realized
+它是逻辑回归
+
+255
+00:09:12,550 --> 00:09:13,970
+this example is described as
+我觉得现在描述的这个例子
+
+256
+00:09:14,060 --> 00:09:15,500
+a somewhat high level and so
+有点高端 所以
+
+257
+00:09:15,750 --> 00:09:16,520
+I'm not sure if this intuition
+我不知道
+
+258
+00:09:17,440 --> 00:09:18,870
+of the neural network, you know, having
+你是否能理解
+
+259
+00:09:19,720 --> 00:09:21,420
+more complex features will quite
+这个具有更复杂特征项的
+
+260
+00:09:21,630 --> 00:09:23,120
+make sense yet, but if
+神经网络 但是
+
+261
+00:09:23,210 --> 00:09:24,440
+it doesn't yet in the next
+如果你没理解
+
+262
+00:09:24,810 --> 00:09:25,860
+two videos I'm going to
+在接下来的两个视频里
+
+263
+00:09:25,970 --> 00:09:27,300
+go through a specific example
+我会讲解一个具体的例子
+
+264
+00:09:28,250 --> 00:09:29,590
+of how a neural network can
+来说明神经网络
+
+265
+00:09:29,830 --> 00:09:30,860
+use this hidden layer to compute
+如何利用这个隐藏层
+
+266
+00:09:31,250 --> 00:09:32,880
+more complex features to feed
+计算更复杂的特征
+
+267
+00:09:33,130 --> 00:09:34,520
+into this final output layer
+并输入到最后的输出层
+
+268
+00:09:35,060 --> 00:09:37,100
+and how that can learn more complex hypotheses.
+以及为什么这样就可以学习更复杂的假设
+
+269
+00:09:37,920 --> 00:09:39,120
+So, in case what I'm
+所以 如果我
+
+270
+00:09:39,180 --> 00:09:40,090
+saying here doesn't quite make
+现在讲的
+
+271
+00:09:40,230 --> 00:09:41,650
+sense, stick with me
+你没理解 请继续
+
+272
+00:09:41,810 --> 00:09:42,960
+for the next two videos and
+观看接下来的两个视频
+
+273
+00:09:43,190 --> 00:09:44,370
+hopefully out there working through
+希望它们
+
+274
+00:09:44,580 --> 00:09:46,690
+those examples this explanation will
+提供的例子能够
+
+275
+00:09:47,030 --> 00:09:48,640
+make a little bit more sense.
+让你更加理解神经网络
+
+276
+00:09:49,020 --> 00:09:49,740
+But just to point out, you
+但有一点
+
+277
+00:09:49,820 --> 00:09:51,120
+can have neural networks with
+你还可以用其他类型的图来
+
+278
+00:09:51,470 --> 00:09:52,990
+other types of diagrams as
+表示神经网络
+
+279
+00:09:53,080 --> 00:09:54,270
+well, and the way that
+神经网络中神经元
+
+280
+00:09:54,450 --> 00:09:58,000
+neural networks are connected, that's called the architecture.
+相连接的方式 称为神经网络的架构
+
+281
+00:09:58,390 --> 00:10:00,150
+So the term architecture refers to
+所以说 架构是指
+
+282
+00:10:00,490 --> 00:10:02,380
+how the different neurons are connected to each other.
+不同的神经元是如何相互连接的
+
+283
+00:10:03,220 --> 00:10:04,180
+This is an example
+这里有一个不同的
+
+284
+00:10:04,840 --> 00:10:06,300
+of a different neural network architecture
+神经网络架构的例子
+
+285
+00:10:07,480 --> 00:10:08,750
+and once again you may
+你可以
+
+286
+00:10:09,260 --> 00:10:10,770
+be able to get this intuition of
+意识到这个第二层
+
+287
+00:10:10,940 --> 00:10:12,180
+how the second layer,
+是如何工作的
+
+288
+00:10:12,900 --> 00:10:14,120
+here we have three hidden units
+在这里 我们有三个隐藏单元
+
+289
+00:10:14,910 --> 00:10:16,200
+that are computing some complex
+它们根据输入层
+
+290
+00:10:16,660 --> 00:10:17,900
+function maybe of the
+计算一个复杂的函数
+
+291
+00:10:17,990 --> 00:10:19,530
+input layer, and then the
+然后第三层
+
+292
+00:10:19,730 --> 00:10:20,750
+third layer can take the
+可以将第二层
+
+293
+00:10:20,840 --> 00:10:22,260
+second layer's features and compute
+训练出的特征项作为输入
+
+294
+00:10:22,550 --> 00:10:24,070
+even more complex features in layer three
+并在第三层计算一些更复杂的函数
+
+295
+00:10:24,980 --> 00:10:25,880
+so that by the time you get
+这样 在你到达
+
+296
+00:10:25,960 --> 00:10:27,160
+to the output layer, layer four,
+输出层之前 即第四层
+
+297
+00:10:27,900 --> 00:10:29,130
+you can have even more
+就可以利用第三层
+
+298
+00:10:29,370 --> 00:10:30,690
+complex features of what
+训练出的更复杂的
+
+299
+00:10:30,860 --> 00:10:32,040
+you are able to compute in
+特征项作为输入
+
+300
+00:10:32,280 --> 00:10:34,710
+layer three and so get very interesting nonlinear hypotheses.
+以此得到非常有趣的非线性假设
+
+301
+00:10:36,730 --> 00:10:37,580
+By the way, in a network
+顺便说一下 在这样的
+
+302
+00:10:37,810 --> 00:10:38,980
+like this, layer one, this is
+网络里 第一层
+
+303
+00:10:39,130 --> 00:10:40,670
+called an input layer. Layer four
+被称为输入层 第四层
+
+304
+00:10:41,360 --> 00:10:43,170
+is still our output layer, and
+仍然是我们的输出层
+
+305
+00:10:43,340 --> 00:10:45,040
+this network has two hidden layers.
+这个网络有两个隐藏层
+
+306
+00:10:46,000 --> 00:10:47,440
+So anything that's not an
+所以 任何一个不是
+
+307
+00:10:48,000 --> 00:10:49,020
+input layer or an output
+输入层或输出层的
+
+308
+00:10:49,340 --> 00:10:50,590
+layer is called a hidden layer.
+都被称为隐藏层
+
+309
+00:10:53,390 --> 00:10:54,470
+So, hopefully from this video
+我希望从这个视频中
+
+310
+00:10:54,760 --> 00:10:55,840
+you've gotten a sense of
+你已经大致理解
+
+311
+00:10:56,140 --> 00:10:58,360
+how the feed forward propagation step
+前向传播在
+
+312
+00:10:58,830 --> 00:11:00,230
+in a neural network works where you
+神经网络里的工作原理:
+
+313
+00:11:00,390 --> 00:11:01,670
+start from the activations of
+从输入层的激励
+
+314
+00:11:01,720 --> 00:11:03,150
+the input layer and forward
+开始 向前
+
+315
+00:11:03,450 --> 00:11:04,480
+propagate that to the
+传播到
+
+316
+00:11:04,570 --> 00:11:05,560
+first hidden layer, then the second
+第一隐藏层 然后传播到第二
+
+317
+00:11:06,070 --> 00:11:08,200
+hidden layer, and then finally the output layer.
+隐藏层 最终到达输出层
+
+318
+00:11:08,990 --> 00:11:10,250
+And you also saw how
+并且你也知道了如何
+
+319
+00:11:10,560 --> 00:11:12,010
+we can vectorize that computation.
+向量化这些计算
+
+320
+00:11:13,660 --> 00:11:14,830
+In the next, I realized
+我发现
+
+321
+00:11:15,240 --> 00:11:16,680
+that some of the intuitions in this
+这个视频里我讲了
+
+322
+00:11:16,850 --> 00:11:19,220
+video of how, you know, other certain
+某些层是如何
+
+323
+00:11:19,550 --> 00:11:22,570
+layers are computing complex features of the early layers.
+计算前面层的复杂特征项
+
+324
+00:11:22,910 --> 00:11:23,540
+I realized some of that intuition
+我意识到这可能
+
+325
+00:11:24,190 --> 00:11:26,660
+may be still slightly abstract and kind of a high level.
+仍然有点抽象 显得比较高端
+
+326
+00:11:27,450 --> 00:11:28,240
+And so what I would like
+所以 我将
+
+327
+00:11:28,350 --> 00:11:29,460
+to do in the two videos
+在接下来的两个视频中
+
+328
+00:11:30,210 --> 00:11:31,540
+is work through a detailed example
+讨论具体的例子
+
+329
+00:11:32,510 --> 00:11:33,810
+of how a neural network can
+它描述了怎样用神经网络
+
+330
+00:11:33,960 --> 00:11:35,740
+be used to compute nonlinear
+来计算
+
+331
+00:11:36,710 --> 00:11:38,030
+functions of the input and
+输入的非线性函数
+
+332
+00:11:38,330 --> 00:11:39,450
+hope that will give you a
+希望能使你
+
+333
+00:11:39,540 --> 00:11:40,860
+good sense of the sorts of
+更好的理解
+
+334
+00:11:41,010 --> 00:11:44,630
+complex nonlinear hypotheses we can get out of Neural Networks.
+从神经网络中得到的复杂非线性假设
+
diff --git a/srt/8 - 5 - Examples and Intuitions I (7 min).srt b/srt/8 - 5 - Examples and Intuitions I (7 min).srt
new file mode 100644
index 00000000..f7729fb5
--- /dev/null
+++ b/srt/8 - 5 - Examples and Intuitions I (7 min).srt
@@ -0,0 +1,1006 @@
+1
+00:00:00,130 --> 00:00:00,980
+In this and the next
+在接下来两节视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,240 --> 00:00:02,030
+video I want to work
+我要通过讲解
+
+3
+00:00:02,140 --> 00:00:03,650
+through a detailed example, showing
+一个具体的例子来解释
+
+4
+00:00:04,530 --> 00:00:05,920
+how a neural network can compute
+神经网络是如何计算
+
+5
+00:00:06,220 --> 00:00:07,740
+a complex nonlinear function of
+关于输入的复杂的非线性函数
+
+6
+00:00:07,970 --> 00:00:09,780
+the input and hopefully, this will
+希望这个例子可以
+
+7
+00:00:09,950 --> 00:00:10,950
+give you a good sense of why
+让你了解为什么
+
+8
+00:00:11,510 --> 00:00:12,470
+Neural Networks can be used
+神经网络可以用来
+
+9
+00:00:13,050 --> 00:00:14,810
+to learn complex, nonlinear hypotheses.
+学习复杂的非线性假设
+
+10
+00:00:16,790 --> 00:00:18,210
+Consider the following problem where
+考虑下面的问题
+
+11
+00:00:18,900 --> 00:00:20,560
+we have input features x1
+我们有二进制的
+
+12
+00:00:20,770 --> 00:00:21,680
+and x2 that are binary
+输入特征
+
+13
+00:00:22,310 --> 00:00:23,760
+values, so either zero or one.
+x1 x2 要么取0 要么取1
+
+14
+00:00:23,990 --> 00:00:25,320
+So x1 and x2 can each
+所以x1和x2只能
+
+15
+00:00:25,510 --> 00:00:27,160
+take on only one of two possible values.
+有两种取值
+
+16
+00:00:28,660 --> 00:00:29,670
+In this example, I've drawn
+在这个例子中 我只画出了
+
+17
+00:00:29,990 --> 00:00:31,420
+only two positive examples and
+两个正样本和
+
+18
+00:00:31,530 --> 00:00:33,240
+two negative examples, but you
+两个负样本
+
+19
+00:00:33,320 --> 00:00:34,370
+can think of this as a
+但你可以认为这是一个
+
+20
+00:00:34,540 --> 00:00:36,210
+simplified version of a
+更复杂的学习问题的
+
+21
+00:00:36,290 --> 00:00:37,710
+more complex learning problem where
+简化版本
+
+22
+00:00:37,920 --> 00:00:38,910
+we may have a bunch
+在这个复杂问题中 我们可能
+
+23
+00:00:39,030 --> 00:00:40,320
+of positive examples in the upper
+在右上角有一堆正样本
+
+24
+00:00:40,480 --> 00:00:41,350
+right and the lower left and
+在左下方有
+
+25
+00:00:41,470 --> 00:00:43,090
+a bunch of negative examples to notify
+一堆用圆圈
+
+26
+00:00:43,580 --> 00:00:46,110
+the circles, and what
+表示的负样本
+
+27
+00:00:46,140 --> 00:00:46,900
+we'd like to do is learn a nonlinear, you know,
+我们想要学习一种非线性的
+
+28
+00:00:48,330 --> 00:00:50,090
+decision boundary that we
+决策边界来
+
+29
+00:00:50,210 --> 00:00:52,210
+need to separate the positive and the negative examples.
+区分正负样本
+
+30
+00:00:53,750 --> 00:00:54,590
+So how can a neural
+那么 神经网络是
+
+31
+00:00:55,070 --> 00:00:56,160
+network do this and rather than
+如何做到的呢?
+
+32
+00:00:56,710 --> 00:00:57,550
+use an example on the
+为了描述方便我不用右边这个例子
+
+33
+00:00:57,600 --> 00:00:59,260
+right. I'm going to use this, maybe easier
+我用左边这个例子
+
+34
+00:00:59,680 --> 00:01:01,670
+to examine example on the left.
+这样更容易说明
+
+35
+00:01:02,620 --> 00:01:03,940
+Concretely, what this is
+具体来讲 这里需要计算的是
+
+36
+00:01:04,110 --> 00:01:05,570
+is really computing the target
+目标函数y
+
+37
+00:01:05,990 --> 00:01:09,810
+label y equals x1 XOR x2.
+等于x1异或x2
+
+38
+00:01:10,070 --> 00:01:11,650
+Or this is actually the
+或者 y也可以等于
+
+39
+00:01:11,910 --> 00:01:13,880
+x1 XNOR x2 function
+x1 异或非 x2
+
+40
+00:01:14,700 --> 00:01:15,750
+where XNOR is the alternative
+其中异或非表示
+
+41
+00:01:16,400 --> 00:01:18,420
+notation for "not (x1 XOR x2)".
+x1异或x2后取反
+
+42
+00:01:19,350 --> 00:01:20,730
+So x1, XOR or
+X1异或X2
+
+43
+00:01:20,760 --> 00:01:22,730
+x2 - that's true only
+为真当且仅当
+
+44
+00:01:23,210 --> 00:01:24,820
+if exactly one of
+这两个值
+
+45
+00:01:25,190 --> 00:01:27,900
+x1 or x2 is equal to 1.
+X1或者X2中有且仅有一个为1
+
+46
+00:01:27,960 --> 00:01:29,160
+It turns out that the specific
+如果我
+
+47
+00:01:29,450 --> 00:01:30,680
+example I'm going to use works out
+用XNOR作为例子
+
+48
+00:01:30,810 --> 00:01:32,840
+a little bit better if we
+比用NOT作为例子
+
+49
+00:01:33,020 --> 00:01:35,000
+use the XNOR example, instead.
+结果会好一些
+
+50
+00:01:35,460 --> 00:01:36,290
+These two are the same, of course.
+但这两个其实是相同的
+
+51
+00:01:36,720 --> 00:01:38,540
+It means not x1 XOR
+这就意味着在x1
+
+52
+00:01:38,780 --> 00:01:40,140
+x2, and so we're going
+异或x2后再取反 即
+
+53
+00:01:40,320 --> 00:01:42,360
+to have positive examples
+当它们同时为真
+
+54
+00:01:42,950 --> 00:01:44,150
+if either both are true or
+或者同时为假的时候
+
+55
+00:01:44,530 --> 00:01:46,470
+both are false and we'll
+我们将获得
+
+56
+00:01:46,620 --> 00:01:49,600
+have that's y equals 1, y equals 1 and
+y等于1
+
+57
+00:01:49,990 --> 00:01:51,480
+we're going to have y equals 0 if
+y为0的结果
+
+58
+00:01:51,860 --> 00:01:52,650
+only one of them is
+如果它们中仅有一个
+
+59
+00:01:52,760 --> 00:01:53,830
+true and we want
+为真 y则为0
+
+60
+00:01:54,000 --> 00:01:54,710
+to figure out if we can
+我们想要知道是否能
+
+61
+00:01:54,860 --> 00:01:57,210
+get a neural network to fit to this sort of training set.
+找到一个神经网络模型来拟合这种训练集
+
+62
+00:01:59,160 --> 00:02:00,200
+In order to build up
+为了建立
+
+63
+00:02:00,450 --> 00:02:01,610
+to a network that fits the
+能拟合XNOR运算
+
+64
+00:02:02,080 --> 00:02:04,900
+XNOR example, we're going
+的神经网络 我们先
+
+65
+00:02:05,350 --> 00:02:06,590
+to start to a slightly simpler one
+讲解一个稍微简单
+
+66
+00:02:07,050 --> 00:02:09,710
+and show a network that fits the AND function.
+的神经网络 它拟合了“且运算”
+
+67
+00:02:10,760 --> 00:02:12,150
+Concretely, lets say we
+假设我们
+
+68
+00:02:12,310 --> 00:02:14,070
+have inputs x1 and
+有输入x1和
+
+69
+00:02:14,240 --> 00:02:17,190
+x2 that are again binary. So, it's either zero or one.
+x2 并且都是二进制 即要么为0要么为1
+
+70
+00:02:17,820 --> 00:02:18,680
+And let's say our target
+我们的目标
+
+71
+00:02:18,760 --> 00:02:20,980
+label y is, you know,
+函数y正如你所知道的
+
+72
+00:02:21,910 --> 00:02:23,470
+equal to x1 and x2.
+等于x1且x2
+
+73
+00:02:23,860 --> 00:02:24,870
+This is a logical AND.
+这是一个逻辑与
+
+74
+00:02:30,740 --> 00:02:31,820
+So can we get a
+那么 我们怎样得到一个
+
+75
+00:02:32,060 --> 00:02:34,330
+one unit network to compute
+具有单个神经元的神经网络来计算
+
+76
+00:02:35,060 --> 00:02:36,120
+this logical AND function?
+这个逻辑与呢
+
+77
+00:02:37,400 --> 00:02:38,530
+In order to do so, I'm
+为了做到这一点
+
+78
+00:02:38,690 --> 00:02:40,000
+going to actually draw in
+我也需要画出偏置单元
+
+79
+00:02:40,580 --> 00:02:42,780
+the bias unit as well, the plus one unit.
+即这个里面有个+1的单元
+
+80
+00:02:45,030 --> 00:02:46,500
+Now, let me just assign some
+现在 让我给这个网络
+
+81
+00:02:46,770 --> 00:02:48,050
+values to the weights or
+分配一些权重
+
+82
+00:02:48,160 --> 00:02:50,130
+the parameters of this network.
+或参数
+
+83
+00:02:50,450 --> 00:02:52,220
+I am going to write down the parameters on this diagram.
+我在图上写出这些参数
+
+84
+00:02:52,820 --> 00:02:54,090
+Write minus 30 here
+这里是-30
+
+85
+00:02:56,360 --> 00:02:57,740
+plus 20 and plus
+正20
+
+86
+00:02:58,710 --> 00:02:59,600
+20 and what this means
+正20 即我给
+
+87
+00:02:59,970 --> 00:03:01,320
+is that I'm assigning a value
+x0前面的
+
+88
+00:03:01,860 --> 00:03:03,790
+of minus thirty to the
+系数赋值
+
+89
+00:03:04,120 --> 00:03:05,770
+value associated with x0.
+为-30.
+
+90
+00:03:06,120 --> 00:03:07,230
+This is plus 1 going
+这个正1会
+
+91
+00:03:07,530 --> 00:03:08,840
+to this unit and a
+作为这个单元的值
+
+92
+00:03:09,420 --> 00:03:10,890
+parameter value of plus 20
+关于20的参数值
+
+93
+00:03:11,250 --> 00:03:12,960
+that multiplies x1, and
+且x1乘以+20
+
+94
+00:03:13,070 --> 00:03:14,300
+a value of plus 20 for
+以及x2乘以+20
+
+95
+00:03:14,680 --> 00:03:15,980
+the parameter that multiplies x2.
+都是这个单元的输入
+
+96
+00:03:17,190 --> 00:03:18,860
+So, concretely, this is saying
+所以
+
+97
+00:03:19,060 --> 00:03:20,340
+that my hypotheses h of
+我的假设h(x)
+
+98
+00:03:20,420 --> 00:03:21,780
+x is equal to
+等于
+
+99
+00:03:22,410 --> 00:03:24,500
+g of -30 + 20x1 + 20x2.
+g(-30 + 20x1 + 20x2)
+
+100
+00:03:25,490 --> 00:03:31,390
+So sometimes it's just
+在图上画出
+
+101
+00:03:31,640 --> 00:03:33,240
+convenient to draw these
+这些参数和
+
+102
+00:03:33,810 --> 00:03:34,880
+weights and draw these parameters
+权重是很方便很直观的
+
+103
+00:03:35,620 --> 00:03:38,250
+up here, you know, in the diagram of the neural network.
+其实 在这幅神经网络图中
+
+104
+00:03:38,790 --> 00:03:40,230
+And of course this minus 30
+这个-30
+
+105
+00:03:40,390 --> 00:03:42,500
+this is actually theta 1
+其实是θ(1)10
+
+106
+00:03:43,670 --> 00:03:44,830
+of 1,0.
+这个是
+
+107
+00:03:45,290 --> 00:03:47,390
+This is theta 1
+θ(1)11
+
+108
+00:03:47,600 --> 00:03:50,550
+of 1,1 and that's theta
+这是
+
+109
+00:03:51,560 --> 00:03:52,990
+1 of 1,2
+θ(1)12
+
+110
+00:03:53,290 --> 00:03:54,320
+but it's just easier think about
+但把它想成
+
+111
+00:03:54,590 --> 00:03:56,660
+it as, you know, associating these
+这些边的
+
+112
+00:03:56,840 --> 00:03:58,430
+parameters with the edges of the network.
+权重会更容易理解
+
+113
+00:04:01,170 --> 00:04:04,170
+Let's look at what this little single neuron network will compute.
+让我们来看看这个小神经元是怎样计算的
+
+114
+00:04:05,050 --> 00:04:06,290
+Just to remind you, the sigmoid
+回忆一下 s型
+
+115
+00:04:06,720 --> 00:04:08,820
+activation function g of z looks like this.
+激励函数g(z)看起来是这样的
+
+116
+00:04:09,110 --> 00:04:10,810
+It starts from 0, rises
+它从0开始 光滑
+
+117
+00:04:11,160 --> 00:04:12,270
+smoothly, crosses 0.5, and
+上升 穿过0.5
+
+118
+00:04:12,750 --> 00:04:14,720
+then it asymptotes at one.
+渐进到1.
+
+119
+00:04:15,730 --> 00:04:16,510
+And to give you some landmarks,
+我们给出一些坐标
+
+120
+00:04:17,350 --> 00:04:18,850
+if the horizontal axis value
+如果横轴值
+
+121
+00:04:19,460 --> 00:04:21,770
+z is equal to 4.6, then
+z等于4.6 则
+
+122
+00:04:23,840 --> 00:04:25,910
+the sigmoid function is equal to 0.99.
+S形函数等于0.99
+
+123
+00:04:26,220 --> 00:04:27,950
+This is very close
+这是非常接近
+
+124
+00:04:28,150 --> 00:04:29,560
+to 1 and kind of symmetrically
+1的 并且由于对称性
+
+125
+00:04:30,350 --> 00:04:32,270
+if it is negative 4.6, then
+如果z为-4.6
+
+126
+00:04:33,090 --> 00:04:34,970
+the sigmoid function there is
+S形函数
+
+127
+00:04:35,080 --> 00:04:37,820
+equal to 0.01 which is very close to 0.
+等于0.01 非常接近0
+
+128
+00:04:39,440 --> 00:04:40,700
+Let's look at the four possible input
+让我们来看看四种可能的输入值
+
+129
+00:04:41,040 --> 00:04:41,680
+values for x1 and x2
+x1和x2的四种可能输入
+
+130
+00:04:41,730 --> 00:04:43,470
+and look at whether the hypothesis will
+看看我们的假设
+
+131
+00:04:43,620 --> 00:04:47,090
+open in that case.
+在各种情况下的输出
+
+132
+00:04:47,220 --> 00:04:47,910
+If x1 and x2 are both
+如果X1和X2均为
+
+133
+00:04:48,150 --> 00:04:49,160
+equal to 0 - if
+0 那么
+
+134
+00:04:49,460 --> 00:04:50,560
+you look at this, if
+你看看这个 如果
+
+135
+00:04:50,710 --> 00:04:51,650
+x1 and x2 are both equal
+x1和x2都等于
+
+136
+00:04:52,010 --> 00:04:54,780
+to 0, then the hypothesis will output g of -30.
+为0 则假设会输出g(-30)
+
+137
+00:04:55,120 --> 00:04:56,790
+So, it's like very
+g(-30)在图的
+
+138
+00:04:57,290 --> 00:04:58,510
+far to the left of this diagram.
+很左边的地方
+
+139
+00:04:58,750 --> 00:05:01,380
+This will be very close to 0.
+非常接近于0
+
+140
+00:05:01,590 --> 00:05:03,160
+If x1 equals 0 and
+如果x1等于0且
+
+141
+00:05:03,330 --> 00:05:05,100
+x2 equals 1 then this
+x2等于1 那么
+
+142
+00:05:05,550 --> 00:05:07,610
+formula here evaluates to
+此公式等于
+
+143
+00:05:07,830 --> 00:05:09,470
+g, thus the sigmoid function applied
+g关于
+
+144
+00:05:09,890 --> 00:05:12,000
+to -10 and again,
+-10取值
+
+145
+00:05:12,450 --> 00:05:13,640
+that's, you know, to the far left
+也在很左边的位置
+
+146
+00:05:13,880 --> 00:05:14,970
+of this plot and so,
+所以
+
+147
+00:05:15,150 --> 00:05:16,540
+that's again very close to 0.
+也是非常接近0
+
+148
+00:05:16,660 --> 00:05:19,180
+This is also g of -10.
+这个也是g(-10)
+
+149
+00:05:19,270 --> 00:05:21,320
+That is if x1
+也就是说 如果x1
+
+150
+00:05:22,000 --> 00:05:24,110
+is equal to 1 and
+等于1并且
+
+151
+00:05:24,560 --> 00:05:26,110
+x2 is 0, this is -30 plus 20, which is -10.
+x2等于0 这就是-30加20等于-10
+
+152
+00:05:26,230 --> 00:05:28,450
+And finally if
+最后 如果
+
+153
+00:05:28,590 --> 00:05:29,940
+x1 equals 1, x2 equals
+x1等于1 x2等于
+
+154
+00:05:30,670 --> 00:05:31,970
+1, then you have g of
+1 那么这等于
+
+155
+00:05:32,770 --> 00:05:34,020
+-30 +20 +20,
+-30 +20 +20
+
+156
+00:05:34,190 --> 00:05:35,370
+so that's g of
+所以这是
+
+157
+00:05:35,440 --> 00:05:36,480
++10, which is
+取+10时
+
+158
+00:05:36,710 --> 00:05:38,140
+therefore very close to 1.
+非常接近1
+
+159
+00:05:39,040 --> 00:05:40,210
+And if you look
+如果你看看
+
+160
+00:05:40,490 --> 00:05:42,700
+in this column, this is
+在这一列 这就是
+
+161
+00:05:43,010 --> 00:05:45,280
+exactly the logical "and" function.
+逻辑“与”的计算结果
+
+162
+00:05:45,820 --> 00:05:47,790
+So, this is computing h of
+所以 这里得到的h
+
+163
+00:05:47,890 --> 00:05:49,870
+x is, you know,
+h关于x取值
+
+164
+00:05:50,260 --> 00:05:54,910
+approximately x1 and x2.
+近似等于x1和x2的与运算的值
+
+165
+00:05:55,200 --> 00:05:56,210
+In other words, it outputs
+换句话说 假设输出
+
+166
+00:05:56,650 --> 00:05:57,820
+1 if and only
+1 当且仅当
+
+167
+00:05:58,270 --> 00:05:59,470
+if x1 and x2 are
+x1 x2
+
+168
+00:06:00,950 --> 00:06:02,410
+both equal to 1.
+都等于1
+
+169
+00:06:03,360 --> 00:06:04,840
+So by writing out our little
+所以 通过写出
+
+170
+00:06:05,320 --> 00:06:07,070
+truth table like this,
+这张真值表
+
+171
+00:06:07,780 --> 00:06:09,060
+we manage to figure out what's
+我们就弄清楚了
+
+172
+00:06:09,350 --> 00:06:11,170
+the logical function that our
+神经网络
+
+173
+00:06:11,650 --> 00:06:12,870
+neural network computes.
+计算出的逻辑函数
+
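+To check the truth table just worked through, here is a minimal Python sketch of this single-neuron AND network, using the weights from the video (-30, +20, +20):
+
+    import math
+
+    def sigmoid(z):
+        return 1.0 / (1.0 + math.exp(-z))
+
+    def h(x1, x2):
+        # bias weight -30 on x0 = 1, then +20 on x1 and +20 on x2
+        return sigmoid(-30 + 20 * x1 + 20 * x2)
+
+    for x1 in (0, 1):
+        for x2 in (0, 1):
+            print(x1, x2, round(h(x1, x2)))  # rounds to 0, 0, 0, 1: the logical AND
+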
+174
+00:06:16,990 --> 00:06:18,350
+This network shown here computes
+这里的神经网络
+
+175
+00:06:18,880 --> 00:06:20,280
+the OR function just to
+实现了或函数的功能
+
+176
+00:06:20,370 --> 00:06:21,810
+show you how I worked that out.
+接下来我告诉你是怎么看出来的
+
+177
+00:06:22,530 --> 00:06:23,230
+If you were to write out
+如果你把
+
+178
+00:06:23,680 --> 00:06:25,240
+the hypotheses you find that
+假设写出来
+
+179
+00:06:25,360 --> 00:06:26,690
+it's computing g of
+会发现它等于
+
+180
+00:06:27,110 --> 00:06:29,980
+-10 +20 x1
+g关于-10 +20x1
+
+181
+00:06:30,170 --> 00:06:32,040
++20 x2. And so
++20x2的取值
+
+182
+00:06:32,270 --> 00:06:33,380
+if you fill in these values you
+如果把这些值都填上
+
+183
+00:06:33,520 --> 00:06:35,110
+find that's g of
+会发现
+
+184
+00:06:35,460 --> 00:06:37,080
+-10 which is approximately 0,
+这是g(-10) 约等于0
+
+185
+00:06:37,820 --> 00:06:38,840
+g of 10 which is
+这是g(10)
+
+186
+00:06:39,040 --> 00:06:40,550
+approximately 1, and so on.
+约等于1
+
+187
+00:06:40,930 --> 00:06:42,650
+These are approximately 1, and approximately
+这个也约等于1
+
+188
+00:06:43,550 --> 00:06:45,410
+1, and these numbers is
+这些数字
+
+189
+00:06:46,160 --> 00:06:47,650
+essentially the logical OR
+本质上就是逻辑或
+
+190
+00:06:47,860 --> 00:06:50,210
+function. So, hopefully
+运算得到的值 所以 我希望
+
+191
+00:06:50,590 --> 00:06:52,010
+with this, you now understand how
+通过这个例子 你现在明白了
+
+192
+00:06:52,350 --> 00:06:53,930
+single neurons in a
+神经网络里单个的
+
+193
+00:06:54,020 --> 00:06:54,980
+neural network can be used
+神经元在计算
+
+194
+00:06:55,180 --> 00:06:58,390
+to compute logical functions like AND and OR and so on.
+如AND和OR逻辑运算时是怎样发挥作用的
+
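+The same check works for the OR network above: swapping in the weights -10, +20, +20 in the sketch earlier, the four inputs give approximately g(-10) ≈ 0, g(10) ≈ 1, g(10) ≈ 1 and g(30) ≈ 1, which is the logical OR truth table.
+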
+195
+00:06:59,000 --> 00:07:00,280
+In the next video, we'll continue
+在接下来的视频中 我们将继续
+
+196
+00:07:00,790 --> 00:07:03,870
+building on these examples and work through a more complex example.
+讲解一个更复杂的例子
+
+197
+00:07:04,730 --> 00:07:05,610
+We'll get to show you how
+我们将告诉你
+
+198
+00:07:06,170 --> 00:07:07,570
+a neural network, now with
+一个多层的神经网络
+
+199
+00:07:07,820 --> 00:07:09,780
+multiple layers of units can
+怎样被用于
+
+200
+00:07:09,960 --> 00:07:10,960
+be used to compute more complex
+计算更复杂的函数
+
+201
+00:07:11,400 --> 00:07:13,870
+functions like the XOR function or the XNOR function.
+如 XOR 函数或 XNOR 函数
+
diff --git a/srt/8 - 6 - Examples and Intuitions II (10 min).srt b/srt/8 - 6 - Examples and Intuitions II (10 min).srt
new file mode 100644
index 00000000..c039bb8c
--- /dev/null
+++ b/srt/8 - 6 - Examples and Intuitions II (10 min).srt
@@ -0,0 +1,1286 @@
+1
+00:00:00,420 --> 00:00:01,540
+In this video, I'd like you
+在这段视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,700 --> 00:00:02,680
+to work through our example
+我想通过例子来向大家展示
+
+3
+00:00:03,390 --> 00:00:04,480
+to show how a neural
+一个神经网络
+
+4
+00:00:04,730 --> 00:00:07,140
+network can compute complex nonlinear hypotheses.
+是怎样计算非线性的假设函数
+
+5
+00:00:10,110 --> 00:00:11,240
+In the last video, we saw
+在上一段视频中
+
+6
+00:00:11,490 --> 00:00:12,790
+how a neural network can
+我们学习了
+
+7
+00:00:13,020 --> 00:00:13,900
+be used to compute the functions
+怎样运用神经网络
+
+8
+00:00:14,420 --> 00:00:16,120
+x1 and x2 and the
+来计算x1和x2的与运算
+
+9
+00:00:16,230 --> 00:00:18,410
+function x1 or x2 when
+以及x1和x2的或运算
+
+10
+00:00:18,750 --> 00:00:20,250
+x1 and x2 are binary.
+其中x1和x2都是二进制数
+
+11
+00:00:20,870 --> 00:00:23,080
+That is, when they take on values of 0, 1.
+也就是说 它们的值只能为0或1
+
+12
+00:00:23,230 --> 00:00:24,580
+We can also have a
+同时 我们也学习了
+
+13
+00:00:24,620 --> 00:00:27,130
+network to compute negation, that
+怎样进行逻辑非运算
+
+14
+00:00:27,330 --> 00:00:30,040
+is to compute the function "not x1".
+也就是计算 "非x1"
+
+15
+00:00:30,280 --> 00:00:31,670
+Let me just write
+我先写出这个神经网络中
+
+16
+00:00:31,890 --> 00:00:33,670
+down the weights associated with this network.
+相连接的各权值
+
+17
+00:00:33,970 --> 00:00:35,350
+We have only one input feature, x1
+这里我们只有一个输入量x1
+
+18
+00:00:35,450 --> 00:00:36,550
+in this case and the
+在这里我们也加上了
+
+19
+00:00:36,620 --> 00:00:38,210
+bias unit plus 1 and
+表示偏差的单位元 +1
+
+20
+00:00:38,680 --> 00:00:40,130
+if I associate this with
+如果我将输入单元和两个权数相连
+
+21
+00:00:41,070 --> 00:00:42,610
+the weights +10 and
+也就是+10和-20
+
+22
+00:00:43,120 --> 00:00:45,700
+-20 then my hypotheses is computing this.
+则可用以下假设方程来计算
+
+23
+00:00:46,080 --> 00:00:47,740
+H of x equals sigmoid of
+h(x)=g(10-20x1)
+
+24
+00:00:47,880 --> 00:00:49,600
+10 minus 20 times x1
+其中g是一个S型函数
+
+25
+00:00:50,390 --> 00:00:51,710
+so when x1 is
+那么
+
+26
+00:00:51,940 --> 00:00:52,880
+equal to 0, my
+当x1等于0时
+
+27
+00:00:52,960 --> 00:00:54,060
+hypothesis will be computing
+计算出假设函数
+
+28
+00:00:55,160 --> 00:00:57,340
+g of 10 minus
+g(10-20*0)
+
+29
+00:00:57,970 --> 00:00:59,910
+20 times 0 which is just 10.
+也就是g(10)
+
+30
+00:01:00,090 --> 00:01:01,600
+And so that's approximately
+这个值近似的等于1
+
+31
+00:01:02,440 --> 00:01:03,390
+1, and when is x is
+而当x等于1时
+
+32
+00:01:03,500 --> 00:01:04,300
+equal to 1 this will
+计算出的假设函数则变成
+
+33
+00:01:04,380 --> 00:01:05,740
+be g of -10, which
+g(-10)
+
+34
+00:01:06,210 --> 00:01:09,380
+is therefore approximately equal to 0.
+也就是约等于0
+
+35
+00:01:09,550 --> 00:01:10,320
+And if you look at what
+如果你观察这两个值
+
+36
+00:01:10,450 --> 00:01:11,720
+these values are, that's essentially
+你会发现这实际上计算的
+
+37
+00:01:12,230 --> 00:01:13,470
+the "not x1" function.
+就是“非x1”函数
+
+38
+00:01:14,560 --> 00:01:16,410
+So to include negations, the
+所以要计算逻辑非运算
+
+39
+00:01:16,700 --> 00:01:18,640
+general idea is to put
+总体思路是
+
+40
+00:01:19,080 --> 00:01:20,460
+a large negative weight in front
+在你希望取非运算的变量前面
+
+41
+00:01:20,650 --> 00:01:22,870
+of the variable you want to negate.
+放上一个绝对值大的负数作为权值
+
+42
+00:01:23,100 --> 00:01:24,710
+So if it's -20, multiplied by
+因此 如果放一个-20
+
+43
+00:01:25,590 --> 00:01:26,780
+x1 and, you know, that's the general
+那么和x1相乘
+
+44
+00:01:27,230 --> 00:01:28,110
+idea of how you end
+很显然 最终的结果
+
+45
+00:01:28,320 --> 00:01:30,500
+up negating x1.
+就得到了对x1进行非运算的效果
+
+46
+00:01:30,700 --> 00:01:32,210
+And so, in an example that
+另外 我再给出一个例子
+
+47
+00:01:32,580 --> 00:01:33,410
+I hope you will figure out
+计算这样一个函数
+
+48
+00:01:33,480 --> 00:01:35,090
+yourself, if you want
+(非x1)与(非x2)
+
+49
+00:01:35,280 --> 00:01:36,410
+to compute a function like this:
+我希望大家思考一下
+
+50
+00:01:36,580 --> 00:01:38,870
+"not x1 and not x2"
+自己动手算一算
+
+51
+00:01:39,090 --> 00:01:40,100
+you know, while part of that would
+你大概应该知道
+
+52
+00:01:40,390 --> 00:01:41,860
+probably be putting large negative
+至少应该在x1和x2前面
+
+53
+00:01:42,290 --> 00:01:44,150
+weights in front of x1
+放一个绝对值比较大的负数作为权值
+
+54
+00:01:44,500 --> 00:01:45,330
+and x2, but it should
+不过
+
+55
+00:01:45,580 --> 00:01:47,320
+be feasible to get a
+还有一种可行的方法
+
+56
+00:01:47,490 --> 00:01:49,910
+neural network with just
+是建立一个神经网络来计算
+
+57
+00:01:50,420 --> 00:01:52,810
+one output unit to compute this as well.
+用只有一个输出单元的神经网络
+
+58
+00:01:52,990 --> 00:01:53,460
+All right?
+没问题吧?
+
+59
+00:01:53,680 --> 00:01:55,130
+So, this logical
+因此
+
+60
+00:01:55,300 --> 00:01:56,290
+function "not x1 and not
+这个看起来很长的逻辑函数
+
+61
+00:01:56,590 --> 00:01:57,990
+x2" is going to
+“(非x1)与(非x2)”的值
+
+62
+00:01:58,210 --> 00:02:00,450
+be equal to 1
+将等于1
+
+63
+00:02:00,780 --> 00:02:06,960
+if, and only if, x1
+当且仅当
+
+64
+00:02:07,350 --> 00:02:09,860
+equals x2 equals zero, right?
+x1等于x2等于0
+
+65
+00:02:10,420 --> 00:02:11,480
+So this is a logical function, this
+所以 这是个逻辑函数
+
+66
+00:02:11,680 --> 00:02:14,290
+is not x1, that means x1 must be zero and not x2.
+这里是非x1 也就是说x1必为0
+
+67
+00:02:14,530 --> 00:02:17,130
+That means x2 must be equal to zero as well.
+然后是非x2 这表示x2也必为0
+
+68
+00:02:17,800 --> 00:02:19,210
+So this logical function is
+因此这个逻辑函数等于1
+
+69
+00:02:19,450 --> 00:02:20,210
+equal to 1 if, and only
+当且仅当
+
+70
+00:02:20,540 --> 00:02:22,900
+if, both x1 and x2 are equal to zero.
+x1和x2的值都为0时成立
+
+71
+00:02:23,910 --> 00:02:25,600
+And hopefully, you should be
+现在你应该也清楚了
+
+72
+00:02:25,710 --> 00:02:26,630
+able to figure out how to make a
+怎样建立一个小规模的神经网络
+
+73
+00:02:26,950 --> 00:02:28,240
+small neural network to compute
+来计算
+
+74
+00:02:28,640 --> 00:02:29,830
+this logical function as well.
+这个逻辑函数的值
+
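+One network that computes this (it reappears below as the middle, cyan network) uses the weights +10, -20, -20: the four inputs then give approximately g(10) ≈ 1, g(-10) ≈ 0, g(-10) ≈ 0 and g(-30) ≈ 0, which is exactly "(not x1) and (not x2)".
+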
+75
+00:02:33,430 --> 00:02:34,350
+Now, taking the three pieces
+把以上我们介绍的
+
+76
+00:02:34,820 --> 00:02:36,720
+that we have put together, the
+这三个部分内容放在一起
+
+77
+00:02:37,400 --> 00:02:38,710
+network for computing x1 and
+"x1与x2"与运算的网络
+
+78
+00:02:38,910 --> 00:02:40,620
+x2 and the network for
+以及计算
+
+79
+00:02:40,960 --> 00:02:42,070
+computing not x1 and
+"(非x1)与(非x2)"的网络
+
+80
+00:02:42,340 --> 00:02:44,170
+not x2 and one last
+还有最后一个是
+
+81
+00:02:44,620 --> 00:02:45,910
+network for computing x1 or
+"x1或x2"的或运算网络
+
+82
+00:02:46,570 --> 00:02:47,700
+x2, we should be
+把这三个网络放在一起
+
+83
+00:02:47,760 --> 00:02:49,420
+able to put these three pieces together
+我们就应该能计算
+
+84
+00:02:49,840 --> 00:02:51,270
+to compute this x1, XNOR
+"x1 XNOR x2"
+
+85
+00:02:51,470 --> 00:02:52,810
+x2 function.
+也就是同或门运算
+
+86
+00:02:53,860 --> 00:02:54,930
+And just to remind you, if
+提醒一下
+
+87
+00:02:55,100 --> 00:02:57,130
+this was x1, x2, this
+如果这是x1 x2
+
+88
+00:02:58,080 --> 00:02:58,830
+function that we want to
+那么我们想要计算的这个函数
+
+89
+00:02:59,090 --> 00:03:00,900
+compute would have negative examples
+在这里和这里是负样本
+
+90
+00:03:01,520 --> 00:03:02,690
+here and here and we'd
+而在这里和这里
+
+91
+00:03:02,830 --> 00:03:04,370
+have positive examples there and there.
+函数有正样本值
+
+92
+00:03:04,730 --> 00:03:06,270
+And so clearly this, you know, we'll
+那么很显然
+
+93
+00:03:06,570 --> 00:03:08,400
+need a nonlinear decision boundary
+为了分隔开正样本和负样本
+
+94
+00:03:08,940 --> 00:03:10,540
+in order to separate the positive and negative examples.
+我们需要一个非线性的判别边界
+
+95
+00:03:12,950 --> 00:03:13,460
+Let's draw the network.
+这里我们用以下这个网络来解决
+
+96
+00:03:14,260 --> 00:03:15,820
+I'm going to take my input
+取输入单元
+
+97
+00:03:16,570 --> 00:03:18,610
+plus 1, x1, x2, and create
++1 x1和x2
+
+98
+00:03:19,150 --> 00:03:20,390
+my first hidden unit here.
+建立第一个隐藏层单元
+
+99
+00:03:20,660 --> 00:03:22,010
+I'm going to call this a(2)1
+我们称其为a(2)1
+
+100
+00:03:22,770 --> 00:03:24,060
+because that's my first hidden unit.
+因为它是第一个隐藏单元
+
+101
+00:03:24,510 --> 00:03:25,660
+And I'm going to copy
+接下来我要从红色的网络
+
+102
+00:03:25,920 --> 00:03:27,410
+the weights over from the Red
+也就是"x1与x2"这个网络
+
+103
+00:03:27,740 --> 00:03:30,020
+Network, x1 and x2 networks.
+复制出权值
+
+104
+00:03:30,820 --> 00:03:32,410
+So now minus 30, 20, 20.
+也就是-30 20 20
+
+105
+00:03:32,650 --> 00:03:36,060
+Next, let me create
+接下来
+
+106
+00:03:36,420 --> 00:03:37,700
+a second hidden unit, which
+我再建立第二个隐藏单元
+
+107
+00:03:37,930 --> 00:03:39,960
+I'm going to call a(2)2 that
+我们称之为a(2)2
+
+108
+00:03:40,350 --> 00:03:42,610
+is the second hidden unit of layer two.
+它是第二层的第二个隐藏单元
+
+109
+00:03:43,550 --> 00:03:44,590
+And I'm going to copy over the
+然后再从中间的青色网络中
+
+110
+00:03:44,740 --> 00:03:45,940
+Cyan Network in the
+复制出权值
+
+111
+00:03:46,170 --> 00:03:47,080
+middle, so I'm going
+这样我们就有了
+
+112
+00:03:47,130 --> 00:03:49,230
+to have the weights 10, minus 20,
+10 -20 -20
+
+113
+00:03:50,150 --> 00:03:51,060
+minus 20.
+这样三个权值
+
+114
+00:03:52,150 --> 00:03:55,570
+And so, let's pull some of the truth table values.
+因此 我们来看一下真值表中的值
+
+115
+00:03:56,170 --> 00:03:57,350
+For the Red Network, we know
+对于红色的这个网络
+
+116
+00:03:57,590 --> 00:03:59,340
+that was computing the x1 and x2.
+我们知道是x1和x2的与运算
+
+117
+00:03:59,690 --> 00:04:00,940
+And so this is
+所以
+
+118
+00:04:01,040 --> 00:04:02,460
+going to be approximately 0, 0,
+这里的值大概等于0 0 0 1
+
+119
+00:04:02,540 --> 00:04:05,030
+0, 1, depending on the values of x1 and x2.
+这取决于x1和x2的具体取值
+
+120
+00:04:07,040 --> 00:04:09,560
+And for a (2)2, that's the Cyan Network.
+对于a (2)2 也就是青色的网络
+
+121
+00:04:10,590 --> 00:04:11,750
+Well that we know the function not x1
+我们知道这是“(非x1)与(非x2)”的运算
+
+122
+00:04:12,240 --> 00:04:13,640
+and not x2 then outputs 1,
+那么对于x1和x2的四种取值
+
+123
+00:04:13,640 --> 00:04:15,610
+0, 0, 0 for the
+其结果将为
+
+124
+00:04:15,700 --> 00:04:17,830
+4 values of x1 and x2.
+1 0 0 0
+
+125
+00:04:18,480 --> 00:04:19,560
+Finally, I'm going to
+最后
+
+126
+00:04:19,810 --> 00:04:21,300
+create my output note, my
+建立输出节点
+
+127
+00:04:21,490 --> 00:04:23,950
+output unit that is a(3)1.
+也就是输出单元 a(3)1
+
+128
+00:04:24,860 --> 00:04:26,230
+This is one more output h
+这也是等于输出值h(x)
+
+129
+00:04:26,590 --> 00:04:28,270
+of x and I'm
+然后
+
+130
+00:04:28,390 --> 00:04:30,030
+going to copy over the OR
+复制一个或运算网络
+
+131
+00:04:30,320 --> 00:04:32,470
+Network for that and I'm going to
+同时
+
+132
+00:04:32,860 --> 00:04:34,330
+need a plus one bias unit here.
+我需要一个+1作为偏差单元
+
+133
+00:04:34,810 --> 00:04:36,010
+So, draw that in and I'm
+将其添加进来
+
+134
+00:04:36,320 --> 00:04:38,360
+going to copy over the weights from the Green Networks.
+然后从绿色的网络中复制出所有的权值
+
+135
+00:04:38,950 --> 00:04:39,750
+So, it's minus 10, 20, 20
+也就是-10 20 20
+
+136
+00:04:42,370 --> 00:04:44,460
+and we know earlier that this computes the OR function.
+我们之前已经知道这是一个或运算函数
+
+137
+00:04:46,660 --> 00:04:48,200
+So, let's go on the truth table entries.
+那么我们继续看真值表的值
+
+138
+00:04:50,300 --> 00:04:51,660
+For the first entry is 0
+第一行的值是0和1的或运算
+
+139
+00:04:51,720 --> 00:04:53,930
+or 1, which is gonna be
+其结果为1
+
+140
+00:04:54,140 --> 00:04:55,710
+1 then next 0 or
+然后是0和0的或运算
+
+141
+00:04:55,800 --> 00:04:57,280
+0, which is 0,
+其结果为0
+
+142
+00:04:57,350 --> 00:04:58,920
+0, or 0, which is 0,
+0和0的或运算 结果还是0
+
+143
+00:04:58,960 --> 00:05:00,420
+1 or 0, which is
+1和0的或运算
+
+144
+00:05:00,600 --> 00:05:02,450
+1, and thus, h
+其结果为1
+
+145
+00:05:02,640 --> 00:05:04,820
+of x is equal to 1
+因此 h(x)的值等于1
+
+146
+00:05:04,980 --> 00:05:06,270
+when either both x1 and
+当x1和x2都为0
+
+147
+00:05:06,780 --> 00:05:08,360
+x2 are 0 or when x1 and
+或者x1和x2都为1的时候成立
+
+148
+00:05:08,590 --> 00:05:10,160
+x2 are both 1. And
+具体来说
+
+149
+00:05:10,900 --> 00:05:12,170
+concretely, h of x
+在这两种情况时
+
+150
+00:05:12,680 --> 00:05:15,340
+outputs 1 exactly at these
+h(x)输出1
+
+151
+00:05:15,560 --> 00:05:16,850
+two locations and it outputs
+在另两种情况时
+
+152
+00:05:17,230 --> 00:05:19,270
+0 otherwise and thus,
+h(x)输出0
+
+153
+00:05:19,570 --> 00:05:20,970
+with this neural network, which
+那么对于这样一个神经网络
+
+154
+00:05:21,210 --> 00:05:23,030
+has an input layer, one
+有一个输入层
+
+155
+00:05:23,200 --> 00:05:24,560
+hidden layer and one output
+一个隐藏层
+
+156
+00:05:24,880 --> 00:05:25,920
+layer, we end up
+和一个输出层
+
+157
+00:05:26,100 --> 00:05:28,450
+with a nonlinear decision boundary that
+我们最终得到了
+
+158
+00:05:29,120 --> 00:05:30,520
+computes this XNOR function.
+计算XNOR函数的非线性判别边界
+
+159
+00:05:31,640 --> 00:05:33,390
+And the more general intuition is
+更一般的理解是
+
+160
+00:05:33,710 --> 00:05:34,870
+that in the input
+在输入层中
+
+161
+00:05:34,990 --> 00:05:35,780
+layer, we just had our raw
+我们只有原始输入值
+
+162
+00:05:36,060 --> 00:05:37,400
+inputs then we had
+然后我们建立了一个隐藏层
+
+163
+00:05:37,610 --> 00:05:39,510
+a hidden layer, which computed some
+用来计算稍微复杂一些的
+
+164
+00:05:39,680 --> 00:05:41,140
+slightly more complex functions of
+输入量的函数
+
+165
+00:05:41,250 --> 00:05:42,080
+the inputs that is shown
+如图所示
+
+166
+00:05:42,430 --> 00:05:43,410
+here, these are slightly more
+这些都是稍微复杂一些的函数
+
+167
+00:05:43,550 --> 00:05:44,960
+complex functions, and then by
+然后
+
+168
+00:05:45,250 --> 00:05:46,510
+adding yet another layer, we end
+通过添加另一个层
+
+169
+00:05:46,640 --> 00:05:49,030
+up with an even more complex nonlinear function.
+我们得到了一个更复杂一点的函数
+
+170
+00:05:50,550 --> 00:05:51,340
+And this is the sort of
+这就是关于
+
+171
+00:05:51,450 --> 00:05:53,810
+intuition about why Neural
+神经网络可以计算较复杂函数
+
+172
+00:05:54,100 --> 00:05:55,270
+Networks can compute pretty complicated
+的某种直观解释
+
+173
+00:05:55,840 --> 00:05:57,270
+functions that when you
+我们知道
+
+174
+00:05:57,340 --> 00:05:58,550
+have multiple layers, you have, you know,
+当层数很多的时候
+
+175
+00:05:58,910 --> 00:06:00,300
+relatively simple function of
+你有一个相对简单的输入量的函数
+
+176
+00:06:00,390 --> 00:06:01,500
+the inputs, and the second layer,
+作为第二层
+
+177
+00:06:02,160 --> 00:06:03,110
+but the third layer can build
+而第三层可以建立在此基础上
+
+178
+00:06:03,340 --> 00:06:04,590
+on that to compute even more
+来计算更加复杂一些的函数
+
+179
+00:06:04,820 --> 00:06:06,330
+complex functions and then
+然后再下一层
+
+180
+00:06:06,790 --> 00:06:08,730
+the layer after that can compute even more complex functions.
+又可以计算再复杂一些的函数
+
+181
+00:06:10,340 --> 00:06:11,740
+To wrap up this video, I
+在这段视频的最后
+
+182
+00:06:11,800 --> 00:06:13,330
+want to show you a fun example of
+我想给大家展示一个有趣的例子
+
+183
+00:06:13,480 --> 00:06:14,650
+an application of a neural
+这是一个神经网络
+
+184
+00:06:14,880 --> 00:06:16,400
+network that captures this intuition
+通过运用更深的层数
+
+185
+00:06:17,260 --> 00:06:19,440
+of the deeper layers computing more complex features.
+来计算更加复杂函数的例子
+
+186
+00:06:19,900 --> 00:06:21,040
+I want to show you
+我将要展示的这段视频
+
+187
+00:06:21,200 --> 00:06:22,480
+a video that I got from
+来源于我的一个好朋友
+
+188
+00:06:22,930 --> 00:06:24,170
+a good friend of mine, Yann LeCun.
+杨立昆(Yann LeCun)
+
+189
+00:06:24,850 --> 00:06:26,240
+Yann is a professor at
+Yann是一名教授
+
+190
+00:06:26,610 --> 00:06:27,680
+New York University, at NYU,
+供职于纽约大学
+
+191
+00:06:28,230 --> 00:06:29,400
+and he was one of
+他也是神经网络研究
+
+192
+00:06:29,470 --> 00:06:30,910
+the early pioneers of neural
+早期的奠基者之一
+
+193
+00:06:31,130 --> 00:06:32,590
+network research and sort
+也是这一领域的大牛
+
+194
+00:06:32,930 --> 00:06:34,610
+of a legend in the field
+他的很多理论和想法
+
+195
+00:06:34,930 --> 00:06:36,520
+now and his ideas are
+现在都已经被应用于
+
+196
+00:06:36,560 --> 00:06:38,340
+used in all sorts of products
+各种各样的产品和应用中
+
+197
+00:06:38,980 --> 00:06:40,490
+and applications throughout the world now.
+遍布于全世界
+
+198
+00:06:41,470 --> 00:06:42,230
+So, I want to show you
+所以我想向大家展示一段
+
+199
+00:06:42,380 --> 00:06:43,410
+a video from some of his
+他早期工作中的视频
+
+200
+00:06:43,740 --> 00:06:44,890
+early work in which he
+这段视频中
+
+201
+00:06:44,980 --> 00:06:46,110
+was using a neural network
+他使用神经网络的算法
+
+202
+00:06:47,000 --> 00:06:50,300
+to recognize handwriting - to do handwritten digit recognition.
+进行手写数字的辨识
+
+203
+00:06:51,370 --> 00:06:52,510
+You might remember early in this
+你也许记得
+
+204
+00:06:52,720 --> 00:06:53,630
+class, at the start of this
+在这门课刚开始的时候
+
+205
+00:06:53,730 --> 00:06:55,180
+class, I said that one of
+我说过
+
+206
+00:06:55,460 --> 00:06:56,720
+early successes of neural networks
+关于神经网络的一个早期成就
+
+207
+00:06:57,140 --> 00:06:58,170
+was trying to use it
+就是应用神经网络
+
+208
+00:06:58,320 --> 00:07:00,580
+to read zip codes, to help
+读取邮政编码
+
+209
+00:07:00,850 --> 00:07:02,940
+us, you know, send mail along. So, to read postal codes.
+以帮助我们进行邮递
+
+210
+00:07:03,880 --> 00:07:04,910
+So, this is one of the attempts.
+那么这便是其中一种尝试
+
+211
+00:07:05,250 --> 00:07:06,220
+So, this is one of the
+这就是为了解决这个问题
+
+212
+00:07:06,650 --> 00:07:08,370
+algorithms used to try to address that problem.
+而尝试采用的一种算法
+
+213
+00:07:09,320 --> 00:07:10,420
+In the video I'll show you
+在视频中
+
+214
+00:07:11,060 --> 00:07:12,640
+this area here is the
+这个区域
+
+215
+00:07:12,910 --> 00:07:14,420
+input area that shows a
+是输入区域
+
+216
+00:07:14,980 --> 00:07:16,460
+handwritten character shown to the
+表示的是手写字符
+
+217
+00:07:16,560 --> 00:07:18,610
+network. This column here
+它们将被传递给神经网络
+
+218
+00:07:19,490 --> 00:07:21,350
+shows a visualization of
+这一列数字表示
+
+219
+00:07:21,460 --> 00:07:23,550
+the features computed by the
+通过该网络第一个隐藏层运算后
+
+220
+00:07:23,900 --> 00:07:24,760
+first hidden layer of the
+特征量的可视化结果
+
+221
+00:07:24,830 --> 00:07:26,090
+network and so the first
+因此通过第一个隐藏层
+
+222
+00:07:26,400 --> 00:07:28,420
+hidden layer, you know, this visualization shows
+可视化结果显示的是
+
+223
+00:07:28,720 --> 00:07:31,190
+different features, different edges and lines and so on detected.
+探测出的不同特征 不同边缘和边线
+
+224
+00:07:32,360 --> 00:07:35,260
+This is a visualization of the next hidden layer.
+这是下一个隐藏层的可视化结果
+
+225
+00:07:35,530 --> 00:07:36,390
+It's kind of harder to see
+似乎很难看出
+
+226
+00:07:36,770 --> 00:07:38,170
+how to understand deeper hidden
+怎样理解更深的隐藏层
+
+227
+00:07:38,730 --> 00:07:39,680
+layers and that's the visualization
+以及下一个隐藏层
+
+228
+00:07:40,460 --> 00:07:41,830
+of what the next hidden layer is computing.
+计算的可视化结果
+
+229
+00:07:42,140 --> 00:07:43,530
+You probably have a hard time
+可能你如果要想看出到底在进行怎样的运算
+
+230
+00:07:44,180 --> 00:07:45,550
+seeing what's going on, you know,
+还是比较困难的
+
+231
+00:07:45,700 --> 00:07:46,800
+much beyond the first hidden layer,
+尤其是在第一个隐藏层之后的那些层里
+
+232
+00:07:47,640 --> 00:07:49,160
+but then finally, all of
+但不管怎样
+
+233
+00:07:49,260 --> 00:07:51,110
+these learned features get
+最终这些学习后的特征量
+
+234
+00:07:51,430 --> 00:07:52,590
+fed to the output layer
+将被送到最后一层 也就是输出层
+
+235
+00:07:53,260 --> 00:07:54,830
+and shown over here is
+并且在最后作为结果
+
+236
+00:07:55,030 --> 00:07:56,370
+the final answers, the final
+显示出来
+
+237
+00:07:56,800 --> 00:07:58,850
+predictive value for what
+最终预测到的结果
+
+238
+00:07:59,390 --> 00:08:02,150
+handwritten digit the neural network thinks it is being shown.
+就是这个神经网络辨识出的手写数字的值
+
+239
+00:08:03,130 --> 00:08:04,270
+So, let's take a look at the video.
+下面我们来观看这段视频
+
+240
+00:09:42,060 --> 00:09:44,370
+So, I hope
+我希望你
+
+241
+00:09:50,610 --> 00:09:52,010
+you enjoyed the video and that
+喜欢这段视频
+
+242
+00:09:52,260 --> 00:09:53,480
+this hopefully gave you some
+也希望这段视频能给你一些直观的感受
+
+243
+00:09:53,670 --> 00:09:55,240
+intuition about the sorts
+关于神经网络可以学习的
+
+244
+00:09:55,450 --> 00:09:57,120
+of pretty complicated functions neural
+较为复杂一些的函数
+
+245
+00:09:57,320 --> 00:09:58,420
+networks can learn in which
+在这个过程中
+
+246
+00:09:58,740 --> 00:10:00,250
+it takes this input this image,
+它使用的输入是不同的图像
+
+247
+00:10:00,670 --> 00:10:01,510
+just takes this input the
+或者说
+
+248
+00:10:01,620 --> 00:10:03,140
+raw pixels and the first
+就是一些原始的像素点
+
+249
+00:10:03,310 --> 00:10:04,640
+layer computes some set
+第一层计算出一些特征
+
+250
+00:10:04,770 --> 00:10:05,680
+of features, the next
+然后下一层再计算出
+
+251
+00:10:05,740 --> 00:10:06,900
+layer computes even more complex
+一些稍复杂的特征
+
+252
+00:10:07,330 --> 00:10:08,620
+features and even more complex features
+然后是更复杂的特征
+
+253
+00:10:09,560 --> 00:10:10,640
+and these features can then be
+然后这些特征
+
+254
+00:10:10,780 --> 00:10:12,030
+used by essentially the final
+实际上被最终传递给
+
+255
+00:10:12,940 --> 00:10:14,700
+layer of logistic regression classifiers
+最后一层逻辑回归分类器上
+
+256
+00:10:15,810 --> 00:10:17,550
+to make accurate predictions about what
+使其准确地预测出
+
+257
+00:10:17,880 --> 00:10:19,190
+are the numbers that the network sees.
+神经网络“看”到的数字
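+A quick numerical check of the weights quoted in this lecture (-30, 20, 20 for AND; 10, -20, -20 for (NOT x1) AND (NOT x2); -10, 20, 20 for OR) can be written as a short sketch. The Python/NumPy code below is only an illustration added to these notes, not part of the course materials, and the function names are mine.
+
+import numpy as np
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+def xnor_network(x1, x2):
+    # Hidden layer: the "red" AND unit and the "cyan" (NOT x1) AND (NOT x2) unit.
+    x = np.array([1.0, x1, x2])                    # +1 bias unit, x1, x2
+    a2_1 = sigmoid(np.array([-30, 20, 20]) @ x)    # approx. x1 AND x2
+    a2_2 = sigmoid(np.array([10, -20, -20]) @ x)   # approx. (NOT x1) AND (NOT x2)
+    # Output layer: the "green" OR unit applied to the two hidden activations.
+    a2 = np.array([1.0, a2_1, a2_2])
+    return sigmoid(np.array([-10, 20, 20]) @ a2)   # approx. a2_1 OR a2_2
+
+for x1 in (0, 1):
+    for x2 in (0, 1):
+        print(x1, x2, int(round(xnor_network(x1, x2))))  # prints 1 only when x1 == x2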
+
diff --git a/srt/8 - 7 - Multiclass Classification (4 min).srt b/srt/8 - 7 - Multiclass Classification (4 min).srt
new file mode 100644
index 00000000..bd172235
--- /dev/null
+++ b/srt/8 - 7 - Multiclass Classification (4 min).srt
@@ -0,0 +1,556 @@
+1
+00:00:00,320 --> 00:00:01,410
+In this video, I want to
+在这个视频中,我想
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,490 --> 00:00:02,710
+tell you about how to use neural
+告诉你如何使用神经
+
+3
+00:00:02,900 --> 00:00:04,390
+networks to do multiclass
+网络做多类
+
+4
+00:00:04,830 --> 00:00:06,690
+classification where we may
+分类,我们可以
+
+5
+00:00:06,820 --> 00:00:07,840
+have more than one category
+有一个以上的类别
+
+6
+00:00:07,930 --> 00:00:09,600
+that we're trying to distinguish amongst.
+是我们需要加以区分的
+
+7
+00:00:10,470 --> 00:00:12,280
+In the last part of
+在上一个视频的
+
+8
+00:00:12,600 --> 00:00:13,920
+the last video, where we
+最后一部分中 我们
+
+9
+00:00:14,400 --> 00:00:15,320
+had the handwritten digit recognition
+有手写数字识别
+
+10
+00:00:15,830 --> 00:00:17,030
+problem, that was actually
+的问题,这实际上是
+
+11
+00:00:17,700 --> 00:00:19,000
+a multiclass classification problem because
+一个多类分类问题 因为
+
+12
+00:00:19,440 --> 00:00:20,730
+there were ten possible categories
+有十个可能的类别
+
+13
+00:00:21,550 --> 00:00:22,820
+for recognizing the digits from
+要识别的数字是从
+
+14
+00:00:23,040 --> 00:00:23,980
+0 through 9 and so, if
+0至9,所以,如果
+
+15
+00:00:24,060 --> 00:00:25,430
+you want us to fill you
+你希望我们
+
+16
+00:00:25,830 --> 00:00:27,840
+in on the details of how to do that.
+详细讲讲如何做到这一点。
+
+17
+00:00:30,410 --> 00:00:31,870
+The way we do multiclass classification
+我们做的多类分类的方法
+
+18
+00:00:32,990 --> 00:00:34,380
+in a neural network is essentially
+在一个神经网络本质上是
+
+19
+00:00:35,060 --> 00:00:37,600
+an extension of the one versus all method.
+“一对多”方法的一种延伸。
+
+20
+00:00:38,610 --> 00:00:39,650
+So, let's say that we
+所以,让我们说,我们
+
+21
+00:00:39,790 --> 00:00:41,660
+have a computer vision example,
+有计算机视觉的例子,
+
+22
+00:00:42,630 --> 00:00:43,810
+where instead of just trying
+在这个例子中 我们不只是想
+
+23
+00:00:44,010 --> 00:00:46,170
+to recognize cars as in
+像我一开始
+
+24
+00:00:46,310 --> 00:00:47,290
+the original example that I started off
+举的那个例子那样去识别汽车
+
+25
+00:00:47,470 --> 00:00:48,670
+with, but let's say that
+而是假设
+
+26
+00:00:49,060 --> 00:00:51,380
+we're trying to recognize, you know, four
+我们要识别四种
+
+27
+00:00:51,510 --> 00:00:52,820
+categories of objects and given
+类别的物体 给定
+
+28
+00:00:53,030 --> 00:00:53,900
+an image we want to
+一张图像 我们要
+
+29
+00:00:54,100 --> 00:00:56,360
+decide if it is a pedestrian, a car, a motorcycle or a truck.
+判断它是行人、汽车、摩托车还是卡车。
+
+30
+00:00:57,200 --> 00:00:58,750
+If that's the case, what
+如果是这样的话
+
+31
+00:00:58,920 --> 00:01:00,480
+we would do is we would
+我们会做的是,我们会
+
+32
+00:01:00,970 --> 00:01:02,820
+build a neural network with four
+建立一个神经网络 它有四个
+
+33
+00:01:03,160 --> 00:01:04,500
+output units so that
+输出单元,使得
+
+34
+00:01:04,710 --> 00:01:08,110
+our neural network now outputs a vector of four numbers.
+我们的神经网络现在输出四个数字的向量。
+
+35
+00:01:09,110 --> 00:01:10,450
+So, the output now is actually
+所以,现在的输出实际上
+
+36
+00:01:11,170 --> 00:01:11,840
+needing to be a vector of four
+需要是一个四维向量
+
+37
+00:01:12,070 --> 00:01:13,300
+numbers and what we're
+而我们
+
+38
+00:01:13,540 --> 00:01:14,400
+going to try to do is
+要尝试做的是
+
+39
+00:01:14,780 --> 00:01:16,680
+get the first output unit
+获得第一输出单元
+
+40
+00:01:17,180 --> 00:01:18,840
+to classify: is the
+分类:是
+
+41
+00:01:19,160 --> 00:01:20,650
+image a pedestrian, yes or no.
+图像中的行人,yes或no。
+
+42
+00:01:21,200 --> 00:01:24,530
+The second unit to classify: is the image a car, yes or no.
+第二个单元判断:图像是不是一辆汽车 是或否。
+
+43
+00:01:25,110 --> 00:01:26,880
+This unit to classify: is the
+第三个单元判断:
+
+44
+00:01:27,130 --> 00:01:29,150
+image a motorcycle, yes or
+图像是不是摩托车 是
+
+45
+00:01:29,230 --> 00:01:30,460
+no, and this would classify:
+或否 而最后这个单元判断:
+
+46
+00:01:30,930 --> 00:01:32,930
+is the image a truck, yes or no.
+图像是不是卡车 是或否。
+
+47
+00:01:33,720 --> 00:01:35,730
+And thus, when the image
+因而,当图像
+
+48
+00:01:36,390 --> 00:01:37,630
+is of a pedestrian, we
+是一行人,我们
+
+49
+00:01:37,820 --> 00:01:38,930
+would ideally want the network
+在理想情况下希望网络
+
+50
+00:01:39,410 --> 00:01:40,140
+to output 1, 0, 0, 0,
+到输出1,0,0,0,
+
+51
+00:01:40,250 --> 00:01:41,260
+when it is a
+当它是一个
+
+52
+00:01:41,520 --> 00:01:42,310
+car we want it to output
+汽车时 我们希望它输出
+
+53
+00:01:42,750 --> 00:01:43,530
+0, 1, 0, 0, when this
+0,1,0,0,当该
+
+54
+00:01:43,840 --> 00:01:45,960
+is a motorcycle, we get it to or rather, we want
+是摩托车时 我们希望
+
+55
+00:01:46,390 --> 00:01:47,460
+it to output 0, 0,
+它输出到0,0,
+
+56
+00:01:47,580 --> 00:01:48,970
+1, 0 and so on.
+1,0等。
+
+57
+00:01:50,750 --> 00:01:51,880
+So this is just like
+因此,这就像
+
+58
+00:01:52,270 --> 00:01:53,690
+the "one versus all" method
+“一对所有”的方法
+
+59
+00:01:54,190 --> 00:01:55,520
+that we talked about when we
+就是我们之前在介绍
+
+60
+00:01:55,680 --> 00:01:58,120
+were describing logistic regression, and
+逻辑回归时谈到的方法
+
+61
+00:01:58,320 --> 00:02:00,480
+here we have essentially four logistic
+而在这里 我们本质上有四个逻辑
+
+62
+00:02:01,290 --> 00:02:03,100
+regression classifiers, each of
+回归分类器,每个
+
+63
+00:02:03,260 --> 00:02:04,800
+which is trying to recognize one
+都试图识别
+
+64
+00:02:05,000 --> 00:02:06,780
+of the four classes that
+这四个类别中的一个
+
+65
+00:02:06,940 --> 00:02:08,830
+we want to distinguish amongst.
+也就是我们想要区分的类别。
+
+66
+00:02:09,540 --> 00:02:10,780
+So, rearranging the slide of
+因此,重新排列幻灯片
+
+67
+00:02:10,860 --> 00:02:12,130
+it, here's our neural network
+它,这里是我们的神经网络
+
+68
+00:02:12,540 --> 00:02:14,070
+with four output units and those
+有四个输出单元和那些
+
+69
+00:02:14,330 --> 00:02:15,510
+are what we want h
+就是我们希望 h(x)
+
+70
+00:02:15,670 --> 00:02:16,790
+of x to be when we
+输出的值 当我们
+
+71
+00:02:16,990 --> 00:02:18,930
+have the different images, and
+有不同的图像,并
+
+72
+00:02:19,580 --> 00:02:20,860
+the way we're going to represent the
+我们在这里表示
+
+73
+00:02:21,110 --> 00:02:22,690
+training set in these settings
+训练集的方式
+
+74
+00:02:23,260 --> 00:02:24,670
+is as follows. So, when we have
+如下。所以,当我们有
+
+75
+00:02:24,890 --> 00:02:26,170
+a training set with different images
+训练集与不同的图像
+
+76
+00:02:27,350 --> 00:02:28,990
+of pedestrians, cars, motorcycles and
+的行人,汽车,摩托车及
+
+77
+00:02:29,260 --> 00:02:30,450
+trucks, what we're going
+卡车,我们要去
+
+78
+00:02:30,510 --> 00:02:31,940
+to do in this example is
+做在这个例子中是
+
+79
+00:02:32,190 --> 00:02:34,580
+that whereas previously we had
+而那之前我们有
+
+80
+00:02:34,990 --> 00:02:36,780
+written out the labels as
+写出来的标签作为
+
+81
+00:02:37,040 --> 00:02:38,320
+y being an integer from
+y是整数
+
+82
+00:02:38,710 --> 00:02:42,180
+1, 2, 3 or 4. Instead of
+1,2,3或4。相反的
+
+83
+00:02:42,280 --> 00:02:44,210
+representing y this way,
+不用这种方式表示y
+
+84
+00:02:44,890 --> 00:02:46,340
+we're going to instead represent y
+我们将改为这样表示y
+
+85
+00:02:47,050 --> 00:02:49,400
+as follows: namely Yi
+也就是说 y(i)
+
+86
+00:02:54,850 --> 00:02:55,230
+will be either 1, 0, 0, 0
+将是1,0,0,0
+
+87
+00:02:55,230 --> 00:02:57,040
+or 0, 1, 0, 0 or 0, 0, 1, 0 or 0, 0, 0, 1 depending on what the
+或0,1,0,0或0,0,1,0或0,0,0,1,取决于什么样的
+
+88
+00:02:57,490 --> 00:02:59,100
+corresponding image Xi is.
+相应的图像Xi为。
+
+89
+00:02:59,410 --> 00:03:00,700
+And so one training example
+这样 一个训练样本
+
+90
+00:03:01,230 --> 00:03:03,090
+will be one pair Xi colon Yi
+就是一对 x(i) 和 y(i)
+
+91
+00:03:04,530 --> 00:03:06,340
+where Xi is an image with, you
+其中 x(i) 是一张图像 里面
+
+92
+00:03:06,440 --> 00:03:08,000
+know one of the four objects and
+是四种物体中的某一种
+
+93
+00:03:08,170 --> 00:03:09,640
+Yi will be one of these vectors.
+而 y(i) 则是这些向量中的一个。
+
+94
+00:03:10,970 --> 00:03:12,020
+And hopefully, we can find
+并希望,我们可以找到
+
+95
+00:03:12,420 --> 00:03:13,670
+a way to get our
+一种方式来获得我们
+
+96
+00:03:14,020 --> 00:03:15,100
+Neural Networks to output some
+神经网络的输出部分
+
+97
+00:03:15,290 --> 00:03:16,480
+value. So, the h of x
+某个值 也就是使 h(x)
+
+98
+00:03:17,310 --> 00:03:20,360
+is approximately y and
+近似等于 y 而
+
+99
+00:03:20,550 --> 00:03:22,000
+both h of x and Yi,
+h(x) 和 y(i)
+
+100
+00:03:22,600 --> 00:03:23,770
+both of these are going
+这两个
+
+101
+00:03:24,020 --> 00:03:25,170
+to be in our example,
+要在我们的例子中,
+
+102
+00:03:26,060 --> 00:03:28,700
+four dimensional vectors when we have four classes.
+当我们有四个类四维向量。
+
+103
+00:03:31,810 --> 00:03:33,020
+So, that's how you
+所以,这是你如何
+
+104
+00:03:33,170 --> 00:03:34,830
+get neural network to do multiclass classification.
+得到的神经网络做多类分类。
+
+105
+00:03:36,290 --> 00:03:37,780
+This wraps up our discussion on
+以上就是我们关于
+
+106
+00:03:38,050 --> 00:03:39,620
+how to represent Neural Networks
+如何表示神经网络
+
+107
+00:03:40,120 --> 00:03:41,620
+that is on our hypotheses representation.
+也就是假设函数表示方法的讨论。
+
+108
+00:03:42,780 --> 00:03:44,180
+In the next set of videos, let's
+在下一组的视频,让我们
+
+109
+00:03:44,690 --> 00:03:45,830
+start to talk about how take
+开始谈论如何利用
+
+110
+00:03:45,990 --> 00:03:47,360
+a training set and how to
+训练集,以及如何
+
+111
+00:03:47,570 --> 00:03:49,970
+automatically learn the parameters of the neural network.
+自动学习神经网络的参数。
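+The one-vs-all output encoding described in this video (y written as one of four unit vectors instead of an integer 1 to 4) can be sketched as below. This NumPy snippet is an illustrative addition to these notes; the class list and helper function are hypothetical names of mine, not from the course code.
+
+import numpy as np
+
+CLASSES = ["pedestrian", "car", "motorcycle", "truck"]
+
+def one_hot(class_index, num_classes=4):
+    # Encode a class as the vector form of y used in the lecture,
+    # e.g. "car" (index 1) -> [0, 1, 0, 0].
+    y = np.zeros(num_classes)
+    y[class_index] = 1.0
+    return y
+
+# A training example then becomes a pair (x_i, y_i) with y_i a 4-vector,
+# and the network's four output units h(x) are compared against it.
+print(one_hot(CLASSES.index("motorcycle")))   # [0. 0. 1. 0.]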
+
diff --git a/srt/9 - 1 - Cost Function (7 min).srt b/srt/9 - 1 - Cost Function (7 min).srt
new file mode 100644
index 00000000..ffdc3186
--- /dev/null
+++ b/srt/9 - 1 - Cost Function (7 min).srt
@@ -0,0 +1,971 @@
+1
+00:00:00,270 --> 00:00:01,380
+Neural Networks are one of
+神经网络是当今
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,450 --> 00:00:03,610
+the most powerful learning algorithms that we have today.
+最强大的学习算法之一
+
+3
+00:00:04,310 --> 00:00:05,490
+In this, and in the
+在本节课视频
+
+4
+00:00:05,590 --> 00:00:06,690
+next few videos, I'd like to
+和后面几次课程中
+
+5
+00:00:06,760 --> 00:00:08,030
+start talking about a learning
+我将开始讲述一种
+
+6
+00:00:08,380 --> 00:00:09,920
+algorithm for fitting the parameters
+在给定训练集下
+
+7
+00:00:10,630 --> 00:00:12,530
+of the neural network given the training set.
+为神经网络拟合参数的学习算法
+
+8
+00:00:13,460 --> 00:00:14,840
+As for the discussion of most
+正如我们讨论大多数
+
+9
+00:00:15,020 --> 00:00:16,300
+of the learning algorithms, we're going
+学习算法一样
+
+10
+00:00:16,450 --> 00:00:17,860
+to begin by talking about the
+我们准备从拟合神经网络参数的
+
+11
+00:00:17,960 --> 00:00:20,510
+cost function for fitting the parameters of the network.
+代价函数开始讲起
+
+12
+00:00:21,170 --> 00:00:22,650
+I'm going to focus
+我准备重点讲解
+
+13
+00:00:23,270 --> 00:00:24,790
+on the application of neural
+神经网络在分类问题
+
+14
+00:00:25,060 --> 00:00:26,590
+networks to classification problems.
+中的应用
+
+15
+00:00:27,660 --> 00:00:28,860
+So, suppose we have a
+假设我们有一个如左边所示的
+
+16
+00:00:29,120 --> 00:00:31,300
+network like that shown on the left.
+神经网络结构
+
+17
+00:00:31,530 --> 00:00:32,510
+And suppose we have a training
+然后假设我们有一个
+
+18
+00:00:32,710 --> 00:00:33,850
+set like this of
+像这样的训练集
+
+19
+00:00:33,980 --> 00:00:36,550
+Xi, Yi pairs of m training examples.
+m个训练样本x(i) y(i)
+
+20
+00:00:37,920 --> 00:00:39,040
+I'm going to use upper case
+我用大写字母 L
+
+21
+00:00:39,450 --> 00:00:40,640
+L to denote the
+来表示
+
+22
+00:00:40,790 --> 00:00:42,460
+total number of layers in this network.
+这个神经网络结构的总层数
+
+23
+00:00:43,330 --> 00:00:44,550
+So, for the network shown
+所以 对于左边的网络结构
+
+24
+00:00:44,810 --> 00:00:45,720
+on the left, we would have
+我们得到
+
+25
+00:00:46,370 --> 00:00:47,920
+capital L equals 4.
+L等于4
+
+26
+00:00:48,020 --> 00:00:48,910
+And, I'm going to use
+然后我准备用
+
+27
+00:00:49,180 --> 00:00:50,740
+s subscript L, to denote
+sl表示
+
+28
+00:00:51,260 --> 00:00:52,540
+the number of units, that is
+第L层的单元的数量
+
+29
+00:00:52,730 --> 00:00:54,490
+a number of neurons, not counting
+也就是神经元的数量
+
+30
+00:00:54,770 --> 00:00:57,180
+the bias unit in layer L of the network.
+这其中不包括L层的偏差单元
+
+31
+00:00:57,900 --> 00:00:59,440
+So, for example, we would
+比如说
+
+32
+00:00:59,580 --> 00:01:01,280
+have a S1 which
+我们得到s1 也就是输入层
+
+33
+00:01:01,370 --> 00:01:03,330
+is the input layer, equals 3 units,
+是等于3的单元
+
+34
+00:01:04,140 --> 00:01:05,970
+S2 in my example is five units.
+s2在这个例子里等于5个单位
+
+35
+00:01:06,900 --> 00:01:08,670
+And the output layer S4.
+然后输出层s4
+
+36
+00:01:09,940 --> 00:01:12,820
+Which is also equals SL, because capital L is equal to four.
+也就是sl 因为L本身等于4
+
+37
+00:01:12,990 --> 00:01:14,290
+The output layer in my
+在左边这个例子中输出层
+
+38
+00:01:14,450 --> 00:01:16,230
+example in the left has four units.
+有4个单位
+
+39
+00:01:17,630 --> 00:01:19,880
+We're going to consider two types of classification problems.
+我们将会讨论两种分类问题
+
+40
+00:01:20,430 --> 00:01:21,780
+The first is binary classification,
+第一种是二元分类
+
+41
+00:01:22,970 --> 00:01:25,550
+where the labels y are either zero or one.
+在这里y只能等于0或1
+
+42
+00:01:26,240 --> 00:01:28,540
+In this case, we would have one output unit.
+在这个例子中 我们有一个输出单元
+
+43
+00:01:29,140 --> 00:01:30,260
+So, this neural network on top
+上面这个神经网络的有四个输出单元
+
+44
+00:01:30,510 --> 00:01:32,430
+has four output units, but if
+但是如果我们
+
+45
+00:01:32,570 --> 00:01:33,960
+we had binary classification, we would
+用二元分类的话
+
+46
+00:01:34,120 --> 00:01:35,810
+have only one output unit
+我们就只能有一个输出结果
+
+47
+00:01:36,720 --> 00:01:38,360
+that computes h of x.
+也就是计算出来的h(x)
+
+48
+00:01:40,310 --> 00:01:41,550
+And the outputs of the
+神经网络的输出结果
+
+49
+00:01:41,630 --> 00:01:42,960
+neural network would be h
+h(x)就会是
+
+50
+00:01:43,140 --> 00:01:45,580
+of x is going to be a real number.
+一个实数
+
+51
+00:01:46,900 --> 00:01:47,980
+And in this case the number
+在这类问题里
+
+52
+00:01:48,360 --> 00:01:50,240
+of output units, SL, where
+输出单元的个数 sl
+
+53
+00:01:50,480 --> 00:01:51,880
+L is again the index
+L同样代表最后一层的序号
+
+54
+00:01:52,300 --> 00:01:53,970
+of the final layer because that's
+因为这就是我们
+
+55
+00:01:54,240 --> 00:01:55,630
+the number of layers we have in the network.
+在这个网络结构中的层数
+
+56
+00:01:56,570 --> 00:01:57,960
+So the number of units we
+所以我们在输出层的单元数目
+
+57
+00:01:58,110 --> 00:02:00,060
+have in the output layer is going to be equal to one.
+就将是1
+
+58
+00:02:01,040 --> 00:02:02,430
+In this case, to simplify notation
+在这类问题里 为了简化记法
+
+59
+00:02:02,950 --> 00:02:05,340
+later, I'm also going to set k equals 1.
+我会把K设为1
+
+60
+00:02:05,460 --> 00:02:06,560
+So, you can think of k as
+这样你可以把K看作
+
+61
+00:02:06,770 --> 00:02:08,240
+also denoting the number
+输出层的
+
+62
+00:02:08,700 --> 00:02:10,780
+of units in the output layer.
+单元数目
+
+63
+00:02:11,410 --> 00:02:12,980
+The second type of classification problem
+我们要考虑的第二类分类问题
+
+64
+00:02:13,280 --> 00:02:15,160
+we'll consider will be multiclass classification
+就是多类别的分类问题
+
+65
+00:02:15,780 --> 00:02:18,020
+problem where we may have k distinct classes.
+也就是会有K个不同的类
+
+66
+00:02:19,160 --> 00:02:20,760
+So, our early example, I
+比如说
+
+67
+00:02:21,070 --> 00:02:22,530
+had this representation for y
+如果我们有四类的话
+
+68
+00:02:23,080 --> 00:02:24,900
+if we have four classes and
+我们就用这样的表达形式来代表y
+
+69
+00:02:25,160 --> 00:02:27,050
+in this case, we would have capital K
+在这类问题里 我们就会有K个输出单元
+
+70
+00:02:27,340 --> 00:02:29,530
+output units and our hypotheses
+我们的假设输出
+
+71
+00:02:30,350 --> 00:02:33,720
+will output vectors that are K dimensional.
+就是一个K维向量
+
+72
+00:02:34,980 --> 00:02:36,230
+And the number of output units
+输出单元的个数
+
+73
+00:02:36,760 --> 00:02:38,390
+will be equal to K.
+就等于K
+
+74
+00:02:39,000 --> 00:02:40,020
+And usually we will have
+通常这类问题里
+
+75
+00:02:40,370 --> 00:02:41,620
+K greater than or equal
+我们都有K大于
+
+76
+00:02:41,820 --> 00:02:42,960
+to three in this case, because
+或等于3
+
+77
+00:02:43,980 --> 00:02:45,340
+if we had two classes then,
+因为如果只有两个类别
+
+78
+00:02:45,710 --> 00:02:46,560
+you know, we don't need to
+我们就不需要
+
+79
+00:02:46,690 --> 00:02:48,330
+use the one versus all method.
+使用这种一对多的方法
+
+80
+00:02:48,720 --> 00:02:49,640
+We need to use the one versus
+我们只有在K大于
+
+81
+00:02:49,970 --> 00:02:50,950
+all method only if we
+或者等于3个类的时候
+
+82
+00:02:51,110 --> 00:02:52,460
+have K greater than or
+才会使用这种
+
+83
+00:02:52,740 --> 00:02:54,250
+equal to three classes so we
+一对多的方法
+
+84
+00:02:54,470 --> 00:02:56,100
+only have two classes we will
+因为如果只有两个类别
+
+85
+00:02:56,180 --> 00:02:57,670
+need to use only one output unit.
+我们就只需要一个输出单元就可以了
+
+86
+00:02:58,250 --> 00:03:00,870
+Now, let's define the cost function for our neural network.
+现在我们来为神经网络定义代价函数
+
+87
+00:03:03,880 --> 00:03:05,130
+The cost function we use for
+我们在神经网络里
+
+88
+00:03:05,240 --> 00:03:06,530
+the neural network is going to
+使用的代价函数
+
+89
+00:03:06,680 --> 00:03:08,300
+be a generalization of the
+应该是逻辑回归里
+
+90
+00:03:08,360 --> 00:03:09,340
+one that we use for logistic
+使用的代价函数的一般化形式
+
+91
+00:03:09,510 --> 00:03:11,500
+regression. For logistic regression,
+对于逻辑回归而言
+
+92
+00:03:12,100 --> 00:03:13,440
+we used to minimize the
+我们通常使代价函数 J(θ)
+
+93
+00:03:13,510 --> 00:03:14,490
+cost function j of theta
+最小化
+
+94
+00:03:15,270 --> 00:03:16,550
+that was minus 1 over
+也就是-1/m
+
+95
+00:03:16,770 --> 00:03:17,760
+m of this cost function
+乘以后面这个代价函数
+
+96
+00:03:18,720 --> 00:03:20,570
+and then plus this extra regularization
+然后再加上这个额外正则化项
+
+97
+00:03:21,300 --> 00:03:22,660
+term here, where this was
+这里是一个
+
+98
+00:03:22,850 --> 00:03:24,020
+a sum from j equals
+j从1到n的求和形式
+
+99
+00:03:24,700 --> 00:03:26,190
+1 through n, because we
+因为我们
+
+100
+00:03:26,270 --> 00:03:29,760
+did not regularize the bias term theta zero.
+并没有把偏差项θ0正则化
+
+101
+00:03:31,030 --> 00:03:32,590
+For a neural network our cost
+对于一个神经网络来说
+
+102
+00:03:32,910 --> 00:03:34,490
+function is going to be a generalization of this.
+我们的代价函数是这个式子的一般化形式
+
+103
+00:03:35,650 --> 00:03:37,060
+Where instead of having basically
+这里不再是仅有一个
+
+104
+00:03:37,530 --> 00:03:39,360
+just one logistic regression output
+逻辑回归输出单元
+
+105
+00:03:39,650 --> 00:03:41,650
+unit, we may instead have K of them.
+取而代之的是K个
+
+106
+00:03:42,590 --> 00:03:43,520
+So here's our cost function.
+所以这是我们的代价函数
+
+107
+00:03:44,770 --> 00:03:46,300
+Neural network now outputs vectors
+神经网络现在输出了
+
+108
+00:03:46,720 --> 00:03:47,920
+in RK where K might
+在K维的向量
+
+109
+00:03:48,170 --> 00:03:48,830
+be equal to 1 if we
+这里K可以取到1 也就是
+
+110
+00:03:49,200 --> 00:03:50,350
+have the binary classification problem.
+原来的二元分类问题
+
+111
+00:03:51,380 --> 00:03:52,240
+I'm going to use this notation,
+我准备用这样一个记法
+
+112
+00:03:53,300 --> 00:03:56,470
+h of x subscript i, to denote the ith output.
+h(x)带下标i 来表示第i个输出
+
+113
+00:03:57,440 --> 00:03:59,860
+That is h of x is a K dimensional vector.
+也就是h(x)是一个K维向量
+
+114
+00:04:00,840 --> 00:04:02,590
+And so this subscript i just
+下标 i 表示
+
+115
+00:04:02,960 --> 00:04:04,400
+selects out the ith element
+选择了神经网络输出向量的
+
+116
+00:04:05,200 --> 00:04:07,510
+of the vector that is output by my neural network.
+第i个元素
+
+117
+00:04:08,900 --> 00:04:10,050
+My cost function, j of
+我的代价函数
+
+118
+00:04:10,180 --> 00:04:11,580
+theta is now going
+J(θ) 将成为下面这样的形式
+
+119
+00:04:11,760 --> 00:04:13,790
+to be the following is minus one
+-1/m乘以
+
+120
+00:04:13,940 --> 00:04:14,850
+over m of a sum
+一个类似于我们在
+
+121
+00:04:15,420 --> 00:04:16,780
+of a similar term to what
+逻辑回归里所用的
+
+122
+00:04:16,960 --> 00:04:18,990
+we have in logistic regression. Except that
+求和项
+
+123
+00:04:19,300 --> 00:04:20,360
+we have this sum from K
+除了这里我们求的是
+
+124
+00:04:21,020 --> 00:04:22,490
+equals one through K. The
+k从1到K的所有和
+
+125
+00:04:22,600 --> 00:04:23,650
+summation is basically a
+这个求和项主要是
+
+126
+00:04:23,720 --> 00:04:25,580
+sum over my K output unit.
+K个输出单元的求和
+
+127
+00:04:26,060 --> 00:04:28,290
+So, if I have four output units.
+所以如果我有四个输出单元
+
+128
+00:04:29,400 --> 00:04:30,740
+That is the final layer of my
+也就是我的神经网络最后一层
+
+129
+00:04:30,850 --> 00:04:32,530
+neural network has four output
+有四个输出单元
+
+130
+00:04:32,860 --> 00:04:34,420
+units then this sum
+那么这个求和就是
+
+131
+00:04:34,700 --> 00:04:35,680
+from, this is a sum from
+这个求和项就是
+
+132
+00:04:35,900 --> 00:04:37,140
+K equals one through four
+求k等于从1到4的
+
+133
+00:04:38,050 --> 00:04:40,550
+of basically the logistic regression algorithms
+每一个的逻辑回归算法的代价函数
+
+134
+00:04:42,070 --> 00:04:43,640
+cost function but summing
+然后按四次输出的顺序
+
+135
+00:04:43,750 --> 00:04:45,570
+that cost function over each
+依次把这些代价函数
+
+136
+00:04:45,890 --> 00:04:47,120
+of my four output units in turn.
+加起来
+
+137
+00:04:47,800 --> 00:04:48,970
+And so, you notice
+所以你会特别注意到
+
+138
+00:04:49,380 --> 00:04:50,700
+in particular that this applies
+这个求和符号应用于
+
+139
+00:04:51,400 --> 00:04:53,530
+to YK, HK, because
+yk和hk 因为
+
+140
+00:04:53,740 --> 00:04:55,040
+we're basically taking the K
+我们主要是讨论
+
+141
+00:04:55,500 --> 00:04:57,020
+output unit and comparing that
+K个输出单元
+
+142
+00:04:57,780 --> 00:04:59,590
+to the value of YK, which
+并且把它和yk的值相比
+
+143
+00:04:59,810 --> 00:05:02,020
+is you know, which is
+yk的值就是
+
+144
+00:05:02,210 --> 00:05:03,260
+that one of those vectors
+这些向量里表示
+
+145
+00:05:03,740 --> 00:05:05,110
+to say what class it should be.
+它应当属于哪个类别的量
+
+146
+00:05:06,280 --> 00:05:08,060
+And finally, the second term
+最后 这里的第二项
+
+147
+00:05:08,360 --> 00:05:09,490
+here is the regularization
+这就是类似于我们在逻辑回归里所用的
+
+148
+00:05:10,440 --> 00:05:12,970
+term similar to what we had for logistic regression.
+正则化项
+
+149
+00:05:14,080 --> 00:05:15,640
+This summation term looks really
+这个求和项看起来
+
+150
+00:05:15,850 --> 00:05:17,370
+complicated, and all it's doing
+确实非常复杂
+
+151
+00:05:17,840 --> 00:05:19,460
+is summing over these terms,
+它所做的就是把这些项全部相加
+
+152
+00:05:19,950 --> 00:05:21,670
+theta j i l for
+也就是对所有i j和l
+
+153
+00:05:21,860 --> 00:05:23,340
+all values of i j
+的θji的值都相加
+
+154
+00:05:23,410 --> 00:05:24,830
+and l. Except that we
+正如我们在逻辑回归里一样
+
+155
+00:05:25,010 --> 00:05:26,340
+don't sum over the terms
+这里要除去那些对应于偏差值的项
+
+156
+00:05:26,710 --> 00:05:28,210
+corresponding to these bias values
+那些项我们是不加进去的
+
+157
+00:05:28,900 --> 00:05:30,000
+like we had for logistic regression.
+那些项我们是不加进去的
+
+158
+00:05:30,900 --> 00:05:32,080
+Concretely, we don't sum
+具体地说 我们不把
+
+159
+00:05:32,240 --> 00:05:33,590
+over the terms corresponding
+那些对于i等于0的项
+
+160
+00:05:34,300 --> 00:05:36,290
+to where i is equal to zero.
+加入其中
+
+161
+00:05:36,780 --> 00:05:38,310
+So, that is because
+这是因为
+
+162
+00:05:38,920 --> 00:05:40,010
+when we are computing the activation
+当我们计算神经元的激励值时
+
+163
+00:05:40,590 --> 00:05:41,930
+of the neuron, we have terms
+我们会有这些项
+
+164
+00:05:42,280 --> 00:05:43,630
+like these, you know theta, i0
+θi0
+
+165
+00:05:43,810 --> 00:05:47,860
+plus theta, i1,
+加上θi1
+
+166
+00:05:48,160 --> 00:05:50,410
+x1 plus, and
+乘以x1 再加上 等等等等
+
+167
+00:05:50,520 --> 00:05:51,780
+so on, where I guess
+这里我认为
+
+168
+00:05:52,020 --> 00:05:53,310
+we could have a 2 there
+我们可以加上2的上标
+
+169
+00:05:53,490 --> 00:05:54,420
+if this is the first hidden layer,
+如果这是第一个隐含层的话
+
+170
+00:05:55,250 --> 00:05:56,800
+and so the values with
+所以这些带0的项
+
+171
+00:05:57,230 --> 00:05:58,730
+the 0 there at that corresponds to
+所以这些带0的项
+
+172
+00:05:58,730 --> 00:06:00,110
+something that multiplies into an
+对应于乘进去了
+
+173
+00:06:00,260 --> 00:06:01,460
+x0 or an a0 and
+x0 或者是a0什么的
+
+174
+00:06:02,210 --> 00:06:02,950
+so, this is kind of like
+这就是一个类似于
+
+175
+00:06:03,120 --> 00:06:04,810
+a bias unit and by
+偏差单元的项
+
+176
+00:06:04,980 --> 00:06:06,020
+analogy to what we were
+类比于我们在做
+
+177
+00:06:06,130 --> 00:06:07,680
+doing for logistic regression, we won't
+逻辑回归的时候
+
+178
+00:06:07,890 --> 00:06:09,090
+sum over those terms in
+我们就不应该把这些项
+
+179
+00:06:09,160 --> 00:06:11,050
+our regularization term because we
+加入到正规化项里去
+
+180
+00:06:11,160 --> 00:06:13,470
+don't want to regularize them and
+因为我们并不想正规化这些项
+
+181
+00:06:13,670 --> 00:06:15,140
+shrink their values to 0.
+并把这些项设定为0
+
+182
+00:06:15,360 --> 00:06:16,530
+But this is just one possible convention
+但这只是一个合理的规定
+
+183
+00:06:17,670 --> 00:06:18,670
+and even if you were
+即使我们真的把他们加进去了
+
+184
+00:06:18,840 --> 00:06:20,960
+to sum over, you know, i equals 0 up
+也就是i从0加到sL
+
+185
+00:06:21,200 --> 00:06:22,810
+to SL, it will work
+这依然成立
+
+186
+00:06:23,160 --> 00:06:24,720
+about the same and it doesn't make a big difference.
+并且不会有大的差异
+
+187
+00:06:25,530 --> 00:06:26,760
+But maybe this convention
+但是这个"不把偏差项正规化"
+
+188
+00:06:27,500 --> 00:06:28,790
+of not regularizing the bias
+的规定可能只是会
+
+189
+00:06:29,070 --> 00:06:30,320
+term is just slightly more common.
+更常见一些
+
+190
+00:06:32,960 --> 00:06:34,200
+So, that's the cost function
+好了 这就是我们准备
+
+191
+00:06:34,690 --> 00:06:36,270
+we're going to use to fit our neural network.
+应用于神经网络的代价函数
+
+192
+00:06:36,790 --> 00:06:38,130
+In the next video, we'll start
+在下一个视频中
+
+193
+00:06:38,480 --> 00:06:40,270
+to talk about an algorithm for
+我会开始讲解一个算法
+
+194
+00:06:40,570 --> 00:06:42,530
+trying to optimize the cost function.
+来最优化这个代价函数
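+As a sketch only (the variable names and matrix shapes below are assumptions made for these notes, not the course's own code), the cost function described in this video, the logistic-regression cost summed over the K output units plus a regularization term that skips the bias weights, could be written like this in NumPy:
+
+import numpy as np
+
+def nn_cost(H, Y, thetas, lam):
+    # H      : (m, K) outputs h(x^(i)) for each of the m training examples
+    # Y      : (m, K) labels y^(i) in one-hot (vector) form
+    # thetas : list of weight matrices Theta^(l), column 0 multiplying the bias unit
+    # lam    : regularization parameter lambda
+    m = Y.shape[0]
+    # Cross-entropy term, summed over the K output units and averaged over m examples.
+    cost = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
+    # Regularization: squared weights summed over all layers, bias column excluded.
+    reg = sum(np.sum(T[:, 1:] ** 2) for T in thetas)
+    return cost + (lam / (2.0 * m)) * reg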
+
diff --git a/srt/9 - 2 - Backpropagation Algorithm (12 min).srt b/srt/9 - 2 - Backpropagation Algorithm (12 min).srt
new file mode 100644
index 00000000..afed126f
--- /dev/null
+++ b/srt/9 - 2 - Backpropagation Algorithm (12 min).srt
@@ -0,0 +1,1690 @@
+1
+00:00:00,090 --> 00:00:01,798
+In the previous video, we talked about
+在上一个视频里 我们讲解了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,857 --> 00:00:03,868
+a cost function for the neural network.
+神经网络的代价函数
+
+3
+00:00:04,139 --> 00:00:07,079
+In this video, let's start to talk about an algorithm,
+在这个视频里 让我们来说说
+
+4
+00:00:07,200 --> 00:00:09,062
+for trying to minimize the cost function.
+让代价函数最小化的算法
+
+5
+00:00:09,240 --> 00:00:12,735
+In particular, we'll talk about the back propagation algorithm.
+具体来说 我们将主要讲解反向传播算法
+
+6
+00:00:13,834 --> 00:00:15,380
+Here's the cost function that
+这个就是我们上一个视频里写好的
+
+7
+00:00:15,520 --> 00:00:17,905
+we wrote down in the previous video.
+代价函数
+
+8
+00:00:17,972 --> 00:00:19,438
+What we'd like to do is
+我们要做的就是
+
+9
+00:00:19,484 --> 00:00:21,161
+try to find parameters theta
+设法找到参数
+
+10
+00:00:21,246 --> 00:00:23,440
+to try to minimize j of theta.
+使得J(θ)取到最小值
+
+11
+00:00:23,530 --> 00:00:25,782
+In order to use either gradient descent
+为了使用梯度下降法或者
+
+12
+00:00:25,832 --> 00:00:28,625
+or one of the advance optimization algorithms.
+其他某种高级优化算法
+
+13
+00:00:28,675 --> 00:00:30,206
+What we need to do therefore is
+我们需要做的就是
+
+14
+00:00:30,249 --> 00:00:31,598
+to write code that takes
+写好一个可以通过输入 参数 θ
+
+15
+00:00:31,645 --> 00:00:33,487
+this input the parameters theta
+然后计算 J(θ)
+
+16
+00:00:33,540 --> 00:00:34,965
+and computes j of theta
+和这些
+
+17
+00:00:35,014 --> 00:00:37,364
+and these partial derivative terms.
+偏导数项的代码
+
+18
+00:00:37,425 --> 00:00:38,763
+Remember, that the parameters
+记住 这些神经网络里
+
+19
+00:00:38,790 --> 00:00:40,710
+in the the neural network of these things,
+对应的参数
+
+20
+00:00:40,760 --> 00:00:43,435
+theta superscript l subscript ij,
+也就是 θ 上标 (l) 下标 ij 的参数
+
+21
+00:00:43,492 --> 00:00:44,868
+that's the real number
+这些都是实数
+
+22
+00:00:44,930 --> 00:00:47,185
+and so, these are the partial derivative terms
+所以这些都是我们需要计算的
+
+23
+00:00:47,249 --> 00:00:48,869
+we need to compute.
+偏导数项
+
+24
+00:00:48,900 --> 00:00:50,077
+In order to compute the
+为了计算代价函数
+
+25
+00:00:50,115 --> 00:00:51,840
+cost function j of theta,
+J(θ)
+
+26
+00:00:51,883 --> 00:00:53,986
+we just use this formula up here
+我们就是用上面这个公式
+
+27
+00:00:54,042 --> 00:00:55,617
+and so, what I want to do
+所以我们在本节视频里大部分时间
+
+28
+00:00:55,655 --> 00:00:56,850
+for the most of this video is
+想要做的都是
+
+29
+00:00:56,897 --> 00:00:58,595
+focus on talking about
+重点关注
+
+30
+00:00:58,636 --> 00:00:59,952
+how we can compute these
+如何计算这些
+
+31
+00:00:59,989 --> 00:01:01,994
+partial derivative terms.
+偏导数项
+
+32
+00:01:02,031 --> 00:01:03,812
+Let's start by talking about
+我们从只有一个
+
+33
+00:01:03,858 --> 00:01:05,512
+the case of when we have only
+训练样本的情况
+
+34
+00:01:05,556 --> 00:01:06,839
+one training example,
+开始说起
+
+35
+00:01:06,872 --> 00:01:09,385
+so imagine, if you will that our entire
+假设 我们整个训练集
+
+36
+00:01:09,432 --> 00:01:11,301
+training set comprises only one
+只包含一个训练样本
+
+37
+00:01:11,351 --> 00:01:14,006
+training example which is a pair xy.
+也就是一对 (x, y)
+
+38
+00:01:14,049 --> 00:01:15,591
+I'm not going to write x1y1
+我这里不写成x(1) y(1)
+
+39
+00:01:15,629 --> 00:01:16,375
+just write this.
+就写成这样
+
+40
+00:01:16,410 --> 00:01:17,665
+Write a one training example
+把这一个训练样本记为 (x, y)
+
+41
+00:01:17,718 --> 00:01:19,980
+as xy and let's tap through
+让我们粗看一遍
+
+42
+00:01:20,031 --> 00:01:21,423
+the sequence of calculations
+使用这一个训练样本
+
+43
+00:01:21,462 --> 00:01:24,332
+we would do with this one training example.
+来计算的顺序
+
+44
+00:01:25,754 --> 00:01:27,129
+The first thing we do is
+首先我们
+
+45
+00:01:27,167 --> 00:01:29,175
+we apply forward propagation in
+应用前向传播方法来
+
+46
+00:01:29,212 --> 00:01:31,773
+order to compute what the hypothesis
+计算一下在给定输入的时候
+
+47
+00:01:31,813 --> 00:01:34,238
+actually outputs given the input.
+假设函数实际输出的结果
+
+48
+00:01:34,272 --> 00:01:36,734
+Concretely, recall that
+具体地说 这里的
+
+49
+00:01:36,769 --> 00:01:39,025
+a(1) is the activation values
+a(1) 就是第一层的激励值
+
+50
+00:01:39,071 --> 00:01:41,541
+of this first layer that was the input there.
+也就是输入层在的地方
+
+51
+00:01:41,600 --> 00:01:43,452
+So, I'm going to set that to x
+所以我准备把它设为 x
+
+52
+00:01:43,505 --> 00:01:45,389
+and then we're going to compute
+然后我们来计算
+
+53
+00:01:45,435 --> 00:01:47,506
+z(2) equals theta(1) a(1)
+z(2) 等于 θ(1) 乘以 a(1)
+
+54
+00:01:47,552 --> 00:01:49,919
+and a(2) equals g, the sigmoid
+然后 a(2) 就等于 g(z(2)) 函数
+
+55
+00:01:49,980 --> 00:01:52,250
+activation function applied to z(2)
+其中g是一个S型激励函数
+
+56
+00:01:52,310 --> 00:01:53,753
+and this would give us our
+这就会计算出第一个
+
+57
+00:01:53,800 --> 00:01:56,115
+activations for the first middle layer.
+隐藏层的激励值
+
+58
+00:01:56,162 --> 00:01:58,208
+That is for layer two of the network
+也就是神经网络的第二层
+
+59
+00:01:58,241 --> 00:02:00,649
+and we also add those bias terms.
+我们还增加这个偏差项
+
+60
+00:02:01,315 --> 00:02:03,132
+Next we apply 2 more steps
+接下来我们再用2次
+
+61
+00:02:03,176 --> 00:02:04,966
+of this forward propagation
+前向传播
+
+62
+00:02:05,013 --> 00:02:08,328
+to compute a(3) and a(4)
+来计算出 a(3) 和 最后的 a(4)
+
+63
+00:02:08,360 --> 00:02:11,458
+which is also the output
+同样也就是假设函数
+
+64
+00:02:11,505 --> 00:02:14,089
+of our hypothesis h of x.
+h(x) 的输出
+
+65
+00:02:14,711 --> 00:02:18,103
+So this is our vectorized implementation of
+所以这里我们实现了把前向传播
+
+66
+00:02:18,145 --> 00:02:19,228
+forward propagation
+向量化
+
+67
+00:02:19,276 --> 00:02:20,888
+and it allows us to compute
+这使得我们可以计算
+
+68
+00:02:20,938 --> 00:02:22,280
+the activation values
+神经网络结构里的
+
+69
+00:02:22,345 --> 00:02:24,056
+for all of the neurons
+每一个神经元的
+
+70
+00:02:24,110 --> 00:02:25,948
+in our neural network.
+激励值
+
+71
+00:02:27,934 --> 00:02:29,608
+Next, in order to compute
+接下来
+
+72
+00:02:29,650 --> 00:02:30,967
+the derivatives, we're going to use
+为了计算导数项 我们将
+
+73
+00:02:31,026 --> 00:02:33,589
+an algorithm called back propagation.
+采用一种叫做反向传播(Backpropagation)的算法
+
+74
+00:02:34,904 --> 00:02:37,765
+The intuition of the back propagation algorithm
+反向传播算法从直观上说
+
+75
+00:02:37,807 --> 00:02:38,430
+is that for each node
+就是对每一个结点
+
+76
+00:02:38,430 --> 00:02:41,065
+we're going to compute the term
+我们计算这样一项
+
+77
+00:02:41,126 --> 00:02:43,642
+delta superscript l subscript j
+δ下标 j 上标(l)
+
+78
+00:02:43,676 --> 00:02:45,130
+that's going to somehow
+这就用某种形式
+
+79
+00:02:45,171 --> 00:02:46,310
+represent the error
+代表了第 l 层的第 j 个结点的
+
+80
+00:02:46,361 --> 00:02:48,511
+of node j in layer l.
+误差
+
+81
+00:02:48,552 --> 00:02:49,682
+So, recall that
+我们还记得
+
+82
+00:02:49,716 --> 00:02:52,313
+a superscript l subscript j
+a 上标 (l) 下标 j
+
+83
+00:02:52,355 --> 00:02:54,138
+that does the activation of
+表示的是第 l 层第 j 个单元的
+
+84
+00:02:54,185 --> 00:02:56,182
+the j of unit in layer l
+激励值
+
+85
+00:02:56,224 --> 00:02:58,001
+and so, this delta term
+所以这个 δ 项
+
+86
+00:02:58,045 --> 00:02:59,037
+is in some sense
+在某种程度上
+
+87
+00:02:59,082 --> 00:03:00,978
+going to capture our error
+就捕捉到了我们
+
+88
+00:03:01,012 --> 00:03:03,618
+in the activation of that neural node.
+在这个神经节点的激励值的误差
+
+89
+00:03:03,650 --> 00:03:05,798
+So, how we might wish the activation
+所以我们可能希望这个节点的
+
+90
+00:03:05,823 --> 00:03:07,975
+of that node is slightly different.
+激励值稍微不一样
+
+91
+00:03:08,047 --> 00:03:09,670
+Concretely, taking the example
+具体地讲 我们用
+
+92
+00:03:10,270 --> 00:03:11,100
+neural network that we have
+右边这个有四层
+
+93
+00:03:11,360 --> 00:03:12,700
+on the right which has four layers.
+的神经网络结构做例子
+
+94
+00:03:13,440 --> 00:03:15,710
+And so capital L is equal to 4.
+所以这里大写 L 等于4
+
+95
+00:03:16,060 --> 00:03:17,120
+For each output unit, we're going to compute this delta term.
+对于每一个输出单元 我们准备计算δ项
+
+96
+00:03:17,400 --> 00:03:19,130
+So, delta for the j of unit in the fourth layer is equal to
+所以第四层的第j个单元的δ就等于
+
+97
+00:03:23,380 --> 00:03:24,490
+just the activation of that
+这个单元的激励值
+
+98
+00:03:24,720 --> 00:03:26,350
+unit minus what was
+减去训练样本里的
+
+99
+00:03:26,490 --> 00:03:28,650
+the actual value observed in our training example.
+真实值
+
+100
+00:03:29,900 --> 00:03:32,420
+So, this term here can
+所以这一项可以
+
+101
+00:03:32,580 --> 00:03:34,510
+also be written h of
+同样可以写成
+
+102
+00:03:34,710 --> 00:03:38,040
+x subscript j, right.
+h(x) 下标 j
+
+103
+00:03:38,330 --> 00:03:39,640
+So this delta term is just
+所以 δ 这一项就是
+
+104
+00:03:39,930 --> 00:03:40,900
+the difference between when a
+假设输出
+
+105
+00:03:41,290 --> 00:03:43,200
+hypotheses output and what
+和训练集y值
+
+106
+00:03:43,370 --> 00:03:44,870
+was the value of y
+之间的差
+
+107
+00:03:45,570 --> 00:03:46,900
+in our training set whereas
+这里
+
+108
+00:03:47,060 --> 00:03:48,610
+y subscript j is
+y 下标 j 就是
+
+109
+00:03:48,750 --> 00:03:49,910
+the j of element of the
+我们标记训练集里向量
+
+110
+00:03:50,090 --> 00:03:53,340
+vector value y in our labeled training set.
+的第j个元素的值
+
+111
+00:03:56,200 --> 00:03:57,790
+And by the way, if you
+顺便说一句
+
+112
+00:03:57,970 --> 00:04:00,460
+think of delta a and
+如果你把 δ a 和 y 这三个
+
+113
+00:04:01,000 --> 00:04:02,350
+y as vectors then you can
+都看做向量
+
+114
+00:04:02,520 --> 00:04:03,760
+also take those and come
+那么你可以同样这样写
+
+115
+00:04:04,030 --> 00:04:05,890
+up with a vectorized implementation of
+并且得出一个向量化的表达式
+
+116
+00:04:06,010 --> 00:04:07,310
+it, which is just
+也就是
+
+117
+00:04:07,690 --> 00:04:09,840
+delta 4 gets set as
+δ(4)等于
+
+118
+00:04:10,700 --> 00:04:14,330
+a4 minus y. Where
+a(4) 减去 y 这里
+
+119
+00:04:14,560 --> 00:04:15,820
+here, each of these delta
+每一个变量
+
+120
+00:04:16,540 --> 00:04:18,080
+4 a4 and y, each of
+也就是 δ(4) a(4) 和 y
+
+121
+00:04:18,180 --> 00:04:19,860
+these is a vector whose
+都是一个向量
+
+122
+00:04:20,640 --> 00:04:22,040
+dimension is equal to
+并且向量维数等于
+
+123
+00:04:22,250 --> 00:04:24,150
+the number of output units in our network.
+输出单元的数目
+
+124
+00:04:25,210 --> 00:04:26,880
+So we've now computed the
+所以现在我们计算出
+
+125
+00:04:27,320 --> 00:04:28,670
+error term delta
+网络结构的
+
+126
+00:04:29,020 --> 00:04:30,170
+4 for our network.
+误差项 δ(4)
+
+127
+00:04:31,440 --> 00:04:32,950
+What we do next is compute
+我们下一步就是计算
+
+128
+00:04:33,620 --> 00:04:36,280
+the delta terms for the earlier layers in our network.
+网络中前面几层的误差项 δ
+
+129
+00:04:37,210 --> 00:04:38,690
+Here's a formula for computing delta
+这个就是计算 δ(3) 的公式
+
+130
+00:04:39,010 --> 00:04:39,830
+3 is delta 3 is equal
+δ(3) 等于
+
+131
+00:04:40,310 --> 00:04:42,050
+to theta 3 transpose times delta 4.
+θ(3) 的转置乘以 δ(4)
+
+132
+00:04:42,560 --> 00:04:44,190
+And this dot times, this
+然后这里的点乘
+
+133
+00:04:44,390 --> 00:04:46,390
+is the element-wise multiplication operation
+这是我们从 MATLAB 里知道的
+
+134
+00:04:47,580 --> 00:04:48,380
+that we know from MATLAB.
+对 y 元素的乘法操作
+
+135
+00:04:49,160 --> 00:04:50,760
+So theta 3 transpose times delta
+所以 θ(3) 转置乘以
+
+136
+00:04:51,020 --> 00:04:52,860
+4, that's a vector; g prime
+δ(4) 这是一个向量
+
+137
+00:04:53,480 --> 00:04:55,080
+z3 that's also a vector
+g'(z(3)) 同样也是一个向量
+
+138
+00:04:55,800 --> 00:04:57,370
+and so dot times is
+所以点乘就是
+
+139
+00:04:57,530 --> 00:04:59,670
+an element-wise multiplication between these two vectors.
+两个向量的元素间对应相乘
+
+140
+00:05:01,460 --> 00:05:02,650
+This term g prime of
+其中这一项 g'(z(3))
+
+141
+00:05:02,740 --> 00:05:04,560
+z3, that formally is actually
+其实是对激励函数 g
+
+142
+00:05:04,950 --> 00:05:06,420
+the derivative of the activation
+在输入值为 z(3) 的时候
+
+143
+00:05:06,720 --> 00:05:08,740
+function g evaluated at
+所求的
+
+144
+00:05:08,890 --> 00:05:10,620
+the input values given by z3.
+导数
+
+145
+00:05:10,760 --> 00:05:12,620
+If you know calculus, you
+如果你掌握微积分的话
+
+146
+00:05:12,710 --> 00:05:13,470
+can try to work it out yourself
+你可以试着自己解出来
+
+147
+00:05:13,850 --> 00:05:16,100
+and see that you can simplify it to the same answer that I get.
+然后可以简化得到我这里的结果
+
+148
+00:05:16,860 --> 00:05:19,690
+But I'll just tell you pragmatically what that means.
+但是我只是从实际角度告诉你这是什么意思
+
+149
+00:05:20,000 --> 00:05:21,260
+What you do to compute this g
+你计算这个 g'
+
+150
+00:05:21,460 --> 00:05:23,310
+prime, these derivative terms is
+这个导数项其实是
+
+151
+00:05:23,510 --> 00:05:25,660
+just a3 dot times 1
+a(3) 点乘 (1-a(3))
+
+152
+00:05:26,010 --> 00:05:27,900
+minus A3 where A3
+这里a(3)是
+
+153
+00:05:28,160 --> 00:05:29,420
+is the vector of activations.
+激励向量
+
+154
+00:05:30,150 --> 00:05:31,440
+1 is the vector of
+1是以1为元素的向量
+
+155
+00:05:31,600 --> 00:05:33,240
+ones and A3 is
+a(3) 又是
+
+156
+00:05:34,020 --> 00:05:35,970
+again the activation
+一个对那一层的
+
+157
+00:05:36,290 --> 00:05:38,850
+the vector of activation values for that layer.
+激励向量
+
+158
+00:05:39,170 --> 00:05:40,210
+Next you apply a similar
+接下来你应用一个相似的公式
+
+159
+00:05:40,540 --> 00:05:42,850
+formula to compute delta 2
+来计算 δ(2)
+
+160
+00:05:43,220 --> 00:05:45,230
+where again that can be
+同样这里可以利用一个
+
+161
+00:05:45,670 --> 00:05:47,410
+computed using a similar formula.
+相似的公式
+
+162
+00:05:48,450 --> 00:05:49,950
+Only now it is a2
+只是在这里
+
+163
+00:05:50,120 --> 00:05:53,850
+like so. I won't
+是 a(2)
+
+164
+00:05:53,960 --> 00:05:55,020
+prove it here, but you
+这里我并没有证明
+
+165
+00:05:55,110 --> 00:05:56,400
+can actually, it's possible to
+但是如果你懂微积分的话
+
+166
+00:05:56,490 --> 00:05:57,520
+prove it if you know calculus
+证明是完全可以做到的
+
+167
+00:05:58,240 --> 00:05:59,520
+that this expression is equal
+那么这个表达式从数学上讲
+
+168
+00:05:59,860 --> 00:06:02,010
+to mathematically, the derivative of
+就等于激励函数
+
+169
+00:06:02,190 --> 00:06:03,570
+the g function of the activation
+g函数的偏导数
+
+170
+00:06:04,040 --> 00:06:05,460
+function, which I'm denoting
+这里我用
+
+171
+00:06:05,910 --> 00:06:08,540
+by g prime. And finally,
+g‘来表示
+
+172
+00:06:09,270 --> 00:06:10,690
+that's it and there is
+最后 就到这儿结束了
+
+173
+00:06:10,860 --> 00:06:13,650
+no delta1 term, because the
+这里没有 δ(1) 项 因为
+
+174
+00:06:13,720 --> 00:06:15,590
+first layer corresponds to the
+第一层对应的是输入层
+
+175
+00:06:15,630 --> 00:06:16,940
+input layer and that's just the
+那只是表示
+
+176
+00:06:17,000 --> 00:06:18,200
+feature we observed in our
+我们在训练集观察到的
+
+177
+00:06:18,300 --> 00:06:20,380
+training sets, so that doesn't have any error associated with that.
+所以不会存在误差
+
+178
+00:06:20,600 --> 00:06:22,080
+It's not like, you know,
+这就是说
+
+179
+00:06:22,120 --> 00:06:23,680
+we don't really want to try to change those values.
+我们是不想改变这些值的
+
+180
+00:06:24,320 --> 00:06:25,240
+And so we have delta
+所以这个例子中我们的 δ 项就只有
+
+181
+00:06:25,510 --> 00:06:28,090
+terms only for layers 2, 3 and 4 for this example.
+第2层 第3层和第4层
+
+182
+00:06:30,170 --> 00:06:32,120
+The name back propagation comes from
+反向传播法这个名字
+
+183
+00:06:32,170 --> 00:06:33,260
+the fact that we start by
+源于我们从
+
+184
+00:06:33,350 --> 00:06:34,720
+computing the delta term for
+输出层开始计算
+
+185
+00:06:34,740 --> 00:06:36,190
+the output layer and then
+δ项
+
+186
+00:06:36,370 --> 00:06:37,480
+we go back a layer and
+然后我们返回到上一层
+
+187
+00:06:37,880 --> 00:06:39,670
+compute the delta terms for the
+计算第三隐藏层的
+
+188
+00:06:39,850 --> 00:06:41,050
+third hidden layer and then we
+δ项 接着我们
+
+189
+00:06:41,180 --> 00:06:42,540
+go back another step to compute
+再往前一步来计算
+
+190
+00:06:42,770 --> 00:06:44,070
+delta 2 and so, we're sort of
+δ(2) 所以说
+
+191
+00:06:44,660 --> 00:06:46,060
+back propagating the errors from
+我们是类似于把输出层的误差
+
+192
+00:06:46,280 --> 00:06:47,270
+the output layer to layer 3
+反向传播给了第3层
+
+193
+00:06:47,650 --> 00:06:50,180
+to layer 2, hence the name back propagation.
+然后是再传到第二层 这就是反向传播的意思
+
+194
+00:06:51,270 --> 00:06:53,120
+Finally, the derivation is
+最后 这个推导过程是出奇的麻烦的
+
+195
+00:06:53,340 --> 00:06:56,510
+surprisingly complicated, surprisingly involved but
+出奇的复杂
+
+196
+00:06:56,820 --> 00:06:58,100
+if you just do this few steps
+但是如果你按照
+
+197
+00:06:58,280 --> 00:07:00,130
+steps of computation it is possible
+这样几个步骤计算
+
+198
+00:07:00,680 --> 00:07:02,540
+to prove, via a frankly somewhat
+就有可能通过一个坦白说比较
+
+199
+00:07:02,810 --> 00:07:04,440
+complicated mathematical proof.
+复杂的数学证明
+
+200
+00:07:05,200 --> 00:07:07,410
+It's possible to prove that if
+如果你忽略正则化所产生的项
+
+201
+00:07:07,560 --> 00:07:09,690
+you ignore regularization then the
+我们可以证明
+
+202
+00:07:09,800 --> 00:07:11,080
+partial derivative terms you want
+我们要求的偏导数项
+
+203
+00:07:12,220 --> 00:07:14,650
+are exactly given by the
+恰好就等于
+
+204
+00:07:14,780 --> 00:07:17,690
+activations and these delta terms.
+激励函数和这些 δ 项
+
+205
+00:07:17,870 --> 00:07:20,630
+This is ignoring lambda or
+这里我们忽略了 λ
+
+206
+00:07:20,780 --> 00:07:22,730
+alternatively the regularization
+或者说 正则化项
+
+207
+00:07:23,770 --> 00:07:24,630
+term lambda will
+λ 是等于
+
+208
+00:07:25,000 --> 00:07:25,170
+equal to 0.
+
+209
+00:07:25,680 --> 00:07:27,130
+We'll fix this detail later
+我们将在之后完善这一个
+
+210
+00:07:27,470 --> 00:07:29,430
+about the regularization term, but
+关于正则化项
+
+211
+00:07:29,620 --> 00:07:30,740
+so by performing back propagation
+所以到现在 我们通过
+
+212
+00:07:31,610 --> 00:07:32,820
+and computing these delta terms,
+反向传播 计算这些δ项
+
+213
+00:07:33,180 --> 00:07:34,240
+you can, you know, pretty
+可以非常快速的计算出
+
+214
+00:07:34,530 --> 00:07:36,320
+quickly compute these partial
+所有参数的
+
+215
+00:07:36,380 --> 00:07:38,150
+derivative terms for all of your parameters.
+偏导数项
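+To make the single-example computation above concrete, here is a small sketch of the forward pass and the delta terms for the four-layer example: delta4 = a4 - y, then delta3 and delta2 via Theta-transpose times delta, element-wise multiplied by g'(z) = a .* (1 - a). This NumPy code is an illustration added to these notes rather than a quote of the lecture; the names are mine, and the handling of the bias entries is one common convention.
+
+import numpy as np
+
+def sigmoid(z):
+    return 1.0 / (1.0 + np.exp(-z))
+
+def forward_and_deltas(thetas, x, y):
+    # thetas = [Theta1, Theta2, Theta3]; each matrix's first column acts on the bias unit.
+    # x, y   = one training example, with y already in vector (one-hot) form.
+    a1 = np.concatenate(([1.0], x))                       # input layer with bias unit
+    a2 = np.concatenate(([1.0], sigmoid(thetas[0] @ a1))) # hidden layer 2
+    a3 = np.concatenate(([1.0], sigmoid(thetas[1] @ a2))) # hidden layer 3
+    a4 = sigmoid(thetas[2] @ a3)                          # output layer, h(x)
+
+    d4 = a4 - y                                           # delta for the output layer
+    d3 = (thetas[2].T @ d4) * (a3 * (1 - a3))             # includes a (zero) bias entry
+    d2 = (thetas[1].T @ d3[1:]) * (a2 * (1 - a2))         # drop d3's bias entry first
+    return (a1, a2, a3, a4), (d2[1:], d3[1:], d4)         # no delta for the input layer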
+
+216
+00:07:38,920 --> 00:07:40,020
+So this is a lot of detail.
+好了 现在讲了很多细节了
+
+217
+00:07:40,570 --> 00:07:41,900
+Let's take everything and put
+现在让我们把所有内容整合在一起
+
+218
+00:07:42,320 --> 00:07:43,660
+it all together to talk about
+然后说说
+
+219
+00:07:44,120 --> 00:07:45,490
+how to implement back propagation
+如何实现反向传播算法
+
+220
+00:07:46,560 --> 00:07:48,590
+to compute derivatives with respect to your parameters.
+来计算关于这些参数的偏导数
+
+221
+00:07:49,790 --> 00:07:50,770
+And for the case of when
+当我们有
+
+222
+00:07:51,000 --> 00:07:52,460
+we have a large training
+一个非常大的训练样本时
+
+223
+00:07:52,830 --> 00:07:53,850
+set, not just a training
+而不是像我们例子里这样的一个训练样本
+
+224
+00:07:54,100 --> 00:07:56,320
+set of one example, here's what we do.
+我们是这样做的
+
+225
+00:07:57,290 --> 00:07:58,140
+Suppose we have a training
+假设我们有
+
+226
+00:07:58,270 --> 00:07:59,750
+set of m examples like
+m 个样本的训练集
+
+227
+00:07:59,900 --> 00:08:01,610
+that shown here.
+正如此处所写
+
+228
+00:08:01,850 --> 00:08:02,600
+The first thing we're going to do is
+我要做的第一件事就是
+
+229
+00:08:03,220 --> 00:08:04,560
+we're going to set these delta
+固定这些
+
+230
+00:08:05,100 --> 00:08:07,270
+l subscript i j. So this triangular symbol?
+带下标 i j 的 Δ
+
+231
+00:08:08,090 --> 00:08:09,990
+That's actually the capital Greek
+这其实是
+
+232
+00:08:10,310 --> 00:08:11,980
+alphabet delta.
+大写的希腊字母 δ
+
+233
+00:08:12,050 --> 00:08:14,080
+The symbol we had on the previous slide was the lower case delta.
+我们之前写的那个是小写
+
+234
+00:08:14,390 --> 00:08:16,810
+So the triangle is capital delta.
+这个三角形是大写的 Δ
+
+235
+00:08:17,430 --> 00:08:18,490
+We're gonna set this equal to zero
+我们将对每一个i
+
+236
+00:08:18,680 --> 00:08:21,930
+for all values of l i j.
+和 j 对应的 Δ 等于0
+
+237
+00:08:22,110 --> 00:08:23,850
+Eventually, this capital delta
+实际上 这些大写 Δij
+
+238
+00:08:24,530 --> 00:08:25,830
+l i j will be used
+会被用来计算
+
+239
+00:08:26,860 --> 00:08:29,920
+to compute the partial
+偏导数项
+
+240
+00:08:30,290 --> 00:08:31,570
+derivative term, partial derivative
+就是 J(θ)
+
+241
+00:08:32,380 --> 00:08:35,240
+respect to theta l i j of
+关于 θ 上标(l) 下标 i j 的
+
+242
+00:08:35,430 --> 00:08:37,190
+J of theta.
+偏导数
+
+243
+00:08:39,040 --> 00:08:40,210
+So as we'll see in
+所以 正如我们接下来看到的
+
+244
+00:08:40,480 --> 00:08:41,550
+a second, these deltas are going
+这些 δ
+
+245
+00:08:41,670 --> 00:08:43,700
+to be used as accumulators that
+会被作为累加项
+
+246
+00:08:43,950 --> 00:08:45,360
+will slowly add things in
+慢慢地增加
+
+247
+00:08:45,700 --> 00:08:47,130
+order to compute these partial derivatives.
+以算出这些偏导数
+
+248
+00:08:49,570 --> 00:08:51,920
+Next, we're going to loop through our training set.
+接下来我们将遍历我们的训练集
+
+249
+00:08:52,150 --> 00:08:53,270
+So, we'll say for i equals
+我们这样写
+
+250
+00:08:53,610 --> 00:08:55,400
+1 through m and so
+写成 For i = 1 to m
+
+251
+00:08:55,620 --> 00:08:57,270
+for the ith iteration, we're
+对于第 i 个循环而言
+
+252
+00:08:57,410 --> 00:08:59,180
+going to be working with the training example xi, yi.
+我们将取训练样本 (x(i), y(i))
+
+253
+00:09:00,480 --> 00:09:03,220
+So
+所以
+
+254
+00:09:03,720 --> 00:09:04,590
+the first thing we're going to do
+我们要做的第一件事是
+
+255
+00:09:04,690 --> 00:09:06,120
+is set a1 which is the
+设定a(1) 也就是
+
+256
+00:09:06,570 --> 00:09:07,830
+activations of the input layer,
+输入层的激励函数
+
+257
+00:09:08,190 --> 00:09:09,030
+set that to be equal to
+设定它等于 x(i)
+
+258
+00:09:09,950 --> 00:09:11,800
+xi is the inputs for our
+x(i) 是我们第 i 个训练样本的
+
+259
+00:09:12,670 --> 00:09:15,070
+ith training example, and then
+输入值
+
+260
+00:09:15,340 --> 00:09:17,590
+we're going to perform forward propagation to
+接下来我们运用正向传播
+
+261
+00:09:17,730 --> 00:09:19,400
+compute the activations for
+来计算第二层的激励值
+
+262
+00:09:19,790 --> 00:09:20,900
+layer two, layer three and so
+然后是第三层 第四层
+
+263
+00:09:21,170 --> 00:09:22,050
+on up to the final
+一直这样
+
+264
+00:09:22,500 --> 00:09:25,190
+layer, layer capital L. Next,
+到最后一层 L层
+
+265
+00:09:25,570 --> 00:09:26,970
+we're going to use the output
+接下来 我们将用
+
+266
+00:09:27,280 --> 00:09:28,530
+label yi from this
+我们这个样本的
+
+267
+00:09:28,680 --> 00:09:29,870
+specific example we're looking
+输出值 y(i)
+
+268
+00:09:30,340 --> 00:09:31,650
+at to compute the error
+来计算这个输出值
+
+269
+00:09:31,950 --> 00:09:34,140
+term for delta L for the output layer.
+所对应的误差项 δ(L)
+
+270
+00:09:34,480 --> 00:09:35,730
+So delta L is what
+所以 δ(L) 就是
+
+271
+00:09:35,880 --> 00:09:38,190
+the hypothesis output minus what
+假设输出减去
+
+272
+00:09:38,660 --> 00:09:39,870
+the target label was.
+目标输出
+
+273
+00:09:41,840 --> 00:09:42,560
+And then we're going to use
+接下来 我们将
+
+274
+00:09:42,850 --> 00:09:44,550
+the back propagation algorithm to
+运用反向传播算法
+
+275
+00:09:44,740 --> 00:09:46,020
+compute delta L minus 1,
+来计算 δ(L-1)
+
+276
+00:09:46,220 --> 00:09:47,250
+delta L minus 2, and
+δ(L-2)
+
+277
+00:09:47,350 --> 00:09:49,880
+so on down to delta 2 and once again
+一直这样直到 δ(2)
+
+278
+00:09:50,270 --> 00:09:51,380
+there is no delta 1 because
+再强调一下 这里没有 δ(1)
+
+279
+00:09:51,460 --> 00:09:54,380
+we don't associate an error term with the input layer.
+因为我们不需要对输入层考虑误差项
+
+280
+00:09:57,000 --> 00:09:58,160
+And finally, we're going to
+最后我们将用
+
+281
+00:09:58,340 --> 00:10:00,650
+use these capital delta terms
+这些大写的 Δ
+
+282
+00:10:01,190 --> 00:10:02,800
+to accumulate these partial derivative
+来累积我们在前面写好的
+
+283
+00:10:03,400 --> 00:10:05,670
+terms that we wrote down on the previous line.
+偏导数项
+
+284
+00:10:06,870 --> 00:10:07,870
+And by the way, if you
+顺便说一下
+
+285
+00:10:07,960 --> 00:10:11,340
+look at this expression, it's possible to vectorize this too.
+如果你再看下这个表达式 你可以把它写成向量形式
+
+286
+00:10:12,020 --> 00:10:13,040
+Concretely, if you think
+具体地说
+
+287
+00:10:13,310 --> 00:10:14,860
+of delta ij as
+如果你把
+
+288
+00:10:15,000 --> 00:10:18,090
+a matrix, indexed by subscript ij.
+δij 看作一个矩阵 i j代表矩阵中的位置
+
+289
+00:10:19,220 --> 00:10:20,590
+Then, if delta L is
+那么 如果 δ(L) 是一个矩阵
+
+290
+00:10:20,780 --> 00:10:22,040
+a matrix we can rewrite
+我们就可以写成
+
+291
+00:10:22,130 --> 00:10:24,100
+this as delta L, gets
+Δ(l) 等于
+
+292
+00:10:24,350 --> 00:10:26,710
+updated as delta L plus
+Δ(l) 加上
+
+293
+00:10:27,830 --> 00:10:29,370
+lower case delta L plus
+小写的 δ(l+1)
+
+294
+00:10:29,640 --> 00:10:32,780
+one times a(l) transpose.
+乘以 a(l) 的转置
+
+295
+00:10:33,570 --> 00:10:35,380
+So that's a vectorized implementation of
+这就是用向量化的形式
+
+296
+00:10:35,520 --> 00:10:37,150
+this that automatically does
+实现了对所有 i 和 j
+
+297
+00:10:37,590 --> 00:10:38,850
+an update for all values of
+的自动更新值
+
+298
+00:10:39,010 --> 00:10:41,250
+i and j. Finally, after
+最后
+
+299
+00:10:41,500 --> 00:10:43,480
+executing the body of
+执行这个 for 循环体之后
+
+300
+00:10:43,580 --> 00:10:45,350
+the for-loop we then go outside the for-loop
+我们跳出这个 for 循环
+
+301
+00:10:46,330 --> 00:10:47,000
+and we compute the following.
+然后计算下面这些式子
+
+302
+00:10:47,440 --> 00:10:49,690
+We compute capital D as
+我们按照如下公式计算
+
+303
+00:10:50,020 --> 00:10:51,400
+follows and we have
+大写
+
+304
+00:10:51,510 --> 00:10:52,750
+two separate cases for j
+我们对于 j=0 和 j≠0
+
+305
+00:10:52,980 --> 00:10:54,890
+equals zero and j not equals zero.
+分两种情况讨论
+
+306
+00:10:56,080 --> 00:10:57,250
+The case of j equals zero
+在 j=0 的情况下
+
+307
+00:10:57,680 --> 00:10:58,730
+corresponds to the bias
+对应偏差项
+
+308
+00:10:59,150 --> 00:11:00,030
+term so when j equals
+所以当 j=0 的时候
+
+309
+00:11:00,390 --> 00:11:01,320
+zero that's why we're missing
+这就是为什么
+
+310
+00:11:01,800 --> 00:11:03,320
+the extra regularization term.
+我们没有写额外的标准化项
+
+311
+00:11:05,470 --> 00:11:06,850
+Finally, while the formal proof
+最后 尽管严格的证明对于
+
+312
+00:11:07,180 --> 00:11:08,970
+is pretty complicated what you
+你来说太复杂
+
+313
+00:11:09,030 --> 00:11:10,410
+can show is that once
+你现在可以说明的是
+
+314
+00:11:10,640 --> 00:11:12,530
+you've computed these D terms,
+一旦你计算出来了这些
+
+315
+00:11:13,510 --> 00:11:15,230
+that is exactly the partial
+这就正好是
+
+316
+00:11:15,640 --> 00:11:17,610
+derivative of the cost
+代价函数关于
+
+317
+00:11:17,920 --> 00:11:19,230
+function with respect to each
+每一个参数的偏导数
+
+318
+00:11:19,470 --> 00:11:20,890
+of your parameters and so you
+所以你可以把他们用在
+
+319
+00:11:21,040 --> 00:11:22,470
+can use those in either gradient
+梯度下降法
+
+320
+00:11:22,610 --> 00:11:23,530
+descent or in one of the advanced optimization
+或者其他一种更高级的
+
+321
+00:11:25,450 --> 00:11:25,450
+algorithms.
+优化算法上
+
+322
+00:11:28,310 --> 00:11:29,360
+So that's the back propagation
+这就是反向传播算法
+
+323
+00:11:29,990 --> 00:11:31,110
+algorithm and how you compute
+以及你如何计算
+
+324
+00:11:31,470 --> 00:11:33,080
+derivatives of your cost
+神经网络代价函数的
+
+325
+00:11:33,340 --> 00:11:34,710
+function for a neural network.
+偏导数
+
+326
+00:11:35,470 --> 00:11:36,330
+I know this looks like this
+我知道这个里面
+
+327
+00:11:36,470 --> 00:11:38,810
+was a lot of details and this was a lot of steps strung together.
+细节琐碎 步骤繁多
+
+328
+00:11:39,460 --> 00:11:40,770
+But both in the programming
+但是在后面的编程作业
+
+329
+00:11:41,100 --> 00:11:43,010
+assignment write-ups and later
+和后续的视频里
+
+330
+00:11:43,110 --> 00:11:44,580
+in this video, we'll give
+我都会给你一个
+
+331
+00:11:44,720 --> 00:11:45,900
+you a summary of this so
+清晰的总结
+
+332
+00:11:46,050 --> 00:11:46,830
+we can have all the pieces
+这样我们就可以把算法的所有细节
+
+333
+00:11:47,260 --> 00:11:48,780
+of the algorithm together so that
+拼合到一起
+
+334
+00:11:48,920 --> 00:11:50,550
+you know exactly what you need
+这样 当你想运用反向传播算法
+
+335
+00:11:50,610 --> 00:11:51,760
+to implement if you want
+来计算你的神经网络的代价函数
+
+336
+00:11:51,940 --> 00:11:53,460
+to implement back propagation to compute
+关于这些参数的偏导数的时候
+
+337
+00:11:53,890 --> 00:11:56,432
+the derivatives of your neural network's
+你就会清晰地知道
+
+338
+00:11:56,574 --> 00:11:59,348
+cost function with respect to those parameters.
+你要的是什么
+
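To gather the algorithm described above into one place, here is a minimal Octave sketch of the accumulation loop for a 3-layer network, written from this transcript. The variable names (Theta1, Theta2, X, y, m, lambda) and the inline sigmoid helper are illustrative assumptions, not code copied from the course assignments.

    sigmoid = @(z) 1 ./ (1 + exp(-z));      % logistic activation
    Delta1 = zeros(size(Theta1));           % capital-Delta accumulators,
    Delta2 = zeros(size(Theta2));           % one per parameter matrix
    for i = 1:m
      a1 = [1; X(i,:)'];                    % input layer plus bias unit
      z2 = Theta1 * a1;  a2 = [1; sigmoid(z2)];
      z3 = Theta2 * a2;  a3 = sigmoid(z3);  % forward propagation to the output
      d3 = a3 - y(i,:)';                    % delta for the output layer
      d2 = (Theta2' * d3) .* [1; sigmoid(z2) .* (1 - sigmoid(z2))];
      d2 = d2(2:end);                       % no delta term for the bias unit
      Delta2 = Delta2 + d3 * a2';           % Delta(l) := Delta(l) + delta(l+1) * a(l)'
      Delta1 = Delta1 + d2 * a1';
    end
    D1 = Delta1 / m;  D2 = Delta2 / m;      % unregularized partial derivatives
    D1(:,2:end) = D1(:,2:end) + (lambda/m) * Theta1(:,2:end);  % skip the j = 0 column
    D2(:,2:end) = D2(:,2:end) + (lambda/m) * Theta2(:,2:end);

These D matrices are the partial derivatives of J(theta) that get passed to gradient descent or an advanced optimization routine.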
diff --git a/srt/9 - 3 - Backpropagation Intuition (13 min).srt b/srt/9 - 3 - Backpropagation Intuition (13 min).srt
new file mode 100644
index 00000000..b80008aa
--- /dev/null
+++ b/srt/9 - 3 - Backpropagation Intuition (13 min).srt
@@ -0,0 +1,1846 @@
+1
+00:00:00,260 --> 00:00:03,120
+In the previous video, we talked about the back propagation algorithm.
+在上一段视频中 我们介绍了反向传播算法
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:04,230 --> 00:00:05,090
+To a lot of people
+对很多人来说
+
+3
+00:00:05,220 --> 00:00:06,140
+seeing it for the first time,
+当第一次看到这种算法时
+
+4
+00:00:06,460 --> 00:00:07,610
+the first impression is often
+第一印象通常是
+
+5
+00:00:08,070 --> 00:00:09,250
+that wow, this is
+哇哦
+
+6
+00:00:09,380 --> 00:00:11,650
+a very complicated algorithm and there are all
+这个算法需要那么多繁杂的步骤
+
+7
+00:00:11,970 --> 00:00:12,990
+these different steps. And I'm
+简直是太复杂了
+
+8
+00:00:13,130 --> 00:00:13,980
+not quite sure how they fit
+实在不知道这些步骤
+
+9
+00:00:14,180 --> 00:00:15,130
+together and it's like kind
+到底应该如何合在一起使用
+
+10
+00:00:15,400 --> 00:00:17,830
+of a black box with all these complicated steps.
+就好像一个黑箱 里面充满了复杂的步骤
+
+11
+00:00:18,130 --> 00:00:18,830
+In case that's how you are
+如果你对反向传播算法
+
+12
+00:00:18,870 --> 00:00:20,460
+feeling about back propagation, that's
+也有这种感受的话
+
+13
+00:00:20,860 --> 00:00:22,100
+actually okay.
+这其实是正常的
+
+14
+00:00:22,740 --> 00:00:24,100
+Back propagation, maybe unfortunately,
+相比于线性回归算法
+
+15
+00:00:24,970 --> 00:00:26,920
+is a less mathematically clean or
+和逻辑回归算法而言
+
+16
+00:00:27,060 --> 00:00:28,520
+less mathematically simple algorithm
+从数学的角度上讲
+
+17
+00:00:28,860 --> 00:00:30,680
+compared to linear regression or
+反向传播算法
+
+18
+00:00:31,130 --> 00:00:32,850
+logistic regression, and I've
+似乎并不简洁
+
+19
+00:00:33,020 --> 00:00:35,560
+actually used back propagation, you know, pretty
+对于反向传播这种算法
+
+20
+00:00:36,080 --> 00:00:37,310
+successfully for many years and
+其实我已经使用了很多年了
+
+21
+00:00:37,530 --> 00:00:39,130
+even today, I still don't sometimes
+但即便如此
+
+22
+00:00:39,510 --> 00:00:40,320
+feel like I have a very
+即使是现在 我也经常感觉
+
+23
+00:00:40,430 --> 00:00:41,790
+good sense of just what
+自己对反向传播算法的理解并不是十分深入
+
+24
+00:00:42,130 --> 00:00:43,580
+it's doing, or intuition about
+对于反向传播算法究竟是如何执行的
+
+25
+00:00:43,830 --> 00:00:45,980
+what back propagation is doing.
+并没有一个很直观的理解
+
+26
+00:00:46,740 --> 00:00:47,850
+For those of you that are doing
+做过编程练习的同学
+
+27
+00:00:48,250 --> 00:00:49,920
+the programming exercises that will
+应该可以感受到
+
+28
+00:00:50,480 --> 00:00:51,970
+at least mechanically step you
+这些练习或多或少能帮助你
+
+29
+00:00:52,280 --> 00:00:53,710
+through the different steps of
+将这些复杂的步骤梳理了一遍
+
+30
+00:00:53,810 --> 00:00:54,910
+how to implement back prop
+巩固了反向传播算法具体是如何实现的
+
+31
+00:00:55,200 --> 00:00:56,860
+so you will be able to get it to work for yourself.
+这样你才能自己掌握这种算法
+
+32
+00:00:57,910 --> 00:00:58,850
+And what I want to do
+在这段视频中
+
+33
+00:00:58,970 --> 00:01:00,170
+in this video is look a
+我想更加深入地
+
+34
+00:01:00,460 --> 00:01:01,750
+little bit more at the
+讨论一下
+
+35
+00:01:02,190 --> 00:01:03,640
+mechanical steps of back propagation
+反向传播算法的这些复杂的步骤
+
+36
+00:01:04,160 --> 00:01:05,620
+and try to give you a little
+并且希望给你一个
+
+37
+00:01:05,840 --> 00:01:07,450
+more intuition about what the
+更加全面直观的感受
+
+38
+00:01:07,930 --> 00:01:09,080
+mechanical steps of back prop
+理解这些步骤究竟是在做什么
+
+39
+00:01:09,250 --> 00:01:10,590
+is doing to hopefully convince you
+也希望通过这段视频
+
+40
+00:01:10,790 --> 00:01:12,530
+that, you know, it is at least a reasonable algorithm.
+你能理解 它至少还是一个合理的算法
+
+41
+00:01:14,680 --> 00:01:16,240
+In case even after this video, in
+但可能你即使看了这段视频
+
+42
+00:01:16,380 --> 00:01:18,000
+case back propagation still seems
+你还是觉得
+
+43
+00:01:18,760 --> 00:01:19,920
+very black box and kind
+反向传播依然很复杂
+
+44
+00:01:20,160 --> 00:01:21,600
+of like, you know, too many complicated
+依然像一个黑箱 太多复杂的步骤
+
+45
+00:01:22,150 --> 00:01:23,230
+steps, a little bit magical to
+依然感到有点神奇
+
+46
+00:01:23,330 --> 00:01:24,740
+you, that's actually okay.
+这也是没关系的
+
+47
+00:01:24,930 --> 00:01:26,760
+And even though,
+我说了
+
+48
+00:01:27,050 --> 00:01:27,840
+you know, I have used back prop
+即使是我 接触反向传播这么多年了
+
+49
+00:01:28,070 --> 00:01:31,590
+for many years, sometimes it's a difficult algorithm to understand.
+有时候仍然觉得这是一个难以理解的算法
+
+50
+00:01:32,310 --> 00:01:34,140
+But hopefully this video will help a little bit.
+但还是希望这段视频能有些许帮助
+
+51
+00:01:36,410 --> 00:01:37,970
+In order to better understand back
+为了更好地理解
+
+52
+00:01:38,190 --> 00:01:39,660
+propagation, let's take another
+反向传播算法
+
+53
+00:01:40,100 --> 00:01:42,290
+closer look at what forward propagation is doing.
+我们再来仔细研究一下前向传播的原理
+
+54
+00:01:43,170 --> 00:01:44,420
+Here's the neural network with two
+幻灯片所示的神经网络
+
+55
+00:01:44,770 --> 00:01:46,070
+input units that is not
+包含两个输入单元
+
+56
+00:01:46,390 --> 00:01:48,480
+counting the bias unit, and
+这不包括偏差单元
+
+57
+00:01:48,700 --> 00:01:50,300
+two hidden units in this
+在第二层有两个隐藏单元
+
+58
+00:01:50,500 --> 00:01:51,590
+layer and two hidden units
+在下一层也有两个隐藏单元
+
+59
+00:01:52,030 --> 00:01:53,490
+in the next layer, and then
+最后的输出层
+
+60
+00:01:53,640 --> 00:01:55,090
+finally one output unit.
+有一个输出单元
+
+61
+00:01:55,520 --> 00:01:57,800
+And again, these counts 2,
+再提醒一下 这里说的2 2 2
+
+62
+00:01:57,920 --> 00:02:00,240
+2, 2 are not counting these bias units on top.
+都不算顶上附加的偏差单元+1
+
+63
+00:02:01,520 --> 00:02:03,170
+In order to illustrate forward
+为了更清楚地展示前向传播
+
+64
+00:02:03,430 --> 00:02:04,570
+propagation, I'm going to
+我想把这个网络
+
+65
+00:02:04,690 --> 00:02:06,080
+draw this network a little bit differently.
+画得稍微不同一些
+
+66
+00:02:08,040 --> 00:02:09,180
+And in particular, I'm going to
+具体来讲
+
+67
+00:02:09,370 --> 00:02:10,840
+draw this neural network with the
+我把这个神经网络的节点
+
+68
+00:02:10,930 --> 00:02:12,620
+nodes drawn as these very
+都画成椭圆型
+
+69
+00:02:12,920 --> 00:02:15,010
+fat ellipses, so that I can write text in them.
+以便在节点里面写字
+
+70
+00:02:15,840 --> 00:02:16,800
+When performing forward propagation,
+在进行前向传播时
+
+71
+00:02:17,600 --> 00:02:18,900
+we might have some particular example,
+我们可以用一个具体的例子说明
+
+72
+00:02:19,760 --> 00:02:21,190
+say some example x(i) comma
+假如说 训练样本
+
+73
+00:02:21,610 --> 00:02:22,990
+y(i) and it will
+x(i) y(i)
+
+74
+00:02:23,080 --> 00:02:24,550
+be this x(i) that we
+那么这里的 x(i)
+
+75
+00:02:24,740 --> 00:02:26,460
+feed into the input layer, so
+将被传入输入层
+
+76
+00:02:27,080 --> 00:02:28,850
+that this may be,
+因此这里就是
+
+77
+00:02:29,110 --> 00:02:30,290
+x(i)1 and x(i)2 are the
+x(i)1 和 x(i)2
+
+78
+00:02:30,440 --> 00:02:31,360
+values we set the input
+这是我们输入层的值
+
+79
+00:02:31,510 --> 00:02:32,870
+layer to and when we
+那么
+
+80
+00:02:33,010 --> 00:02:34,350
+forward propagate it to the
+当我们进行前向传播
+
+81
+00:02:34,650 --> 00:02:36,210
+first hidden layer here, what
+传播到第一个隐藏层时
+
+82
+00:02:36,360 --> 00:02:38,070
+we do is compute z(2)1 and
+我们的做法是 算出 z(2)1
+
+83
+00:02:39,370 --> 00:02:42,900
+z(2)2, so these are the
+和 z(2)2
+
+84
+00:02:43,770 --> 00:02:45,010
+weighted sum of inputs of the
+因此这两个值
+
+85
+00:02:45,260 --> 00:02:47,000
+input units and then
+是输入单元的加权总和
+
+86
+00:02:47,230 --> 00:02:48,680
+we apply the sigmoid of
+接下来
+
+87
+00:02:48,940 --> 00:02:50,670
+the logistic function and the
+我们将S型的逻辑函数
+
+88
+00:02:51,940 --> 00:02:53,630
+sigmoid activation function applied
+和S型的激励函数
+
+89
+00:02:54,050 --> 00:02:55,670
+to the z value, gives us
+应用到z值上
+
+90
+00:02:55,960 --> 00:02:57,520
+these activation values.
+得出了这样的激励值
+
+91
+00:02:57,880 --> 00:02:59,670
+So that gives us a(2)1
+因此我们得到 a(2)1
+
+92
+00:02:59,870 --> 00:03:01,160
+and a(2)2, and then we
+和 a(2)2 的值
+
+93
+00:03:01,260 --> 00:03:02,500
+forward propagate again to get,
+然后 再做一次前向传播
+
+94
+00:03:03,940 --> 00:03:05,570
+you know, here z(3)1,
+这里的 z(3)1
+
+95
+00:03:06,010 --> 00:03:07,500
+apply the sigmoid of the
+应用S型的逻辑函数
+
+96
+00:03:07,690 --> 00:03:09,500
+logistic function, the activation function
+和激励函数
+
+97
+00:03:10,080 --> 00:03:11,200
+to that, to get
+得到 a(3)1
+
+98
+00:03:11,240 --> 00:03:14,310
+a(3)1 and similarly, like so,
+类似这样进行下去
+
+99
+00:03:15,580 --> 00:03:17,850
+until we get z(4)1, apply the
+最后我们得到 z(4)1
+
+100
+00:03:18,080 --> 00:03:19,450
+activation function this gives
+应用激励函数
+
+101
+00:03:19,630 --> 00:03:20,940
+us a(4)1 which is the
+得到 a(4)1
+
+102
+00:03:21,630 --> 00:03:23,030
+final output value of the network.
+这也是这个网络的输出单元的值
+
+103
+00:03:24,860 --> 00:03:25,920
+Let's erase this arrow to
+我把这个箭头擦掉
+
+104
+00:03:26,040 --> 00:03:28,490
+give myself some space, and if
+这样留点书写空间
+
+105
+00:03:28,620 --> 00:03:30,170
+you look at what this
+那么
+
+106
+00:03:30,610 --> 00:03:32,280
+computation really is doing, focusing
+如果你仔细看这里的计算
+
+107
+00:03:32,780 --> 00:03:33,970
+on this hidden unit
+关注这一层的隐藏单元
+
+108
+00:03:34,400 --> 00:03:35,860
+let's say we have that
+我们知道了这个权值
+
+109
+00:03:36,090 --> 00:03:37,770
+this weight, shown in
+这里用桃红色表示的
+
+110
+00:03:37,870 --> 00:03:39,500
+magenta there, is my
+这是我们的权值
+
+111
+00:03:39,700 --> 00:03:42,820
+weight theta 2(1)0 the
+θ(2)10
+
+112
+00:03:43,090 --> 00:03:45,930
+indexing is not important, and this
+这里的角标不重要
+
+113
+00:03:46,140 --> 00:03:47,440
+way here which I guess
+而这里的权值
+
+114
+00:03:47,570 --> 00:03:49,270
+I am highlighting in red, that
+我用红色来标记的
+
+115
+00:03:49,630 --> 00:03:51,290
+is theta 2(1)1 and
+是θ(2)11
+
+116
+00:03:52,870 --> 00:03:53,970
+this weight here, which I'm
+而这里的权值
+
+117
+00:03:54,050 --> 00:03:55,370
+drawing in green, in a cyan,
+我用青色表示的
+
+118
+00:03:55,720 --> 00:03:59,530
+is theta 2(1)2 so
+是θ(2)12
+
+119
+00:04:00,410 --> 00:04:01,970
+the way we compute the value z(3)1
+因此要计算 z(3)1
+
+120
+00:04:02,540 --> 00:04:05,230
+is z(3)1 is
+z(3)1 的值等于
+
+121
+00:04:05,410 --> 00:04:09,120
+equal to this weight,
+这个桃红色的权值
+
+122
+00:04:10,430 --> 00:04:11,840
+times this value so that's
+乘以这个值
+
+123
+00:04:13,070 --> 00:04:14,970
+theta 2(1)0 times 1,
+也就是θ(2)10 乘上1
+
+124
+00:04:16,240 --> 00:04:19,190
+plus this red
+加上这个红色的权值
+
+125
+00:04:19,410 --> 00:04:21,480
+weight times this value, so
+乘以这个值
+
+126
+00:04:21,670 --> 00:04:23,690
+that's theta 2(1)1 times
+也就是θ(2)11
+
+127
+00:04:25,270 --> 00:04:28,520
+a(2)1, and finally this
+乘上a(2)1
+
+128
+00:04:28,860 --> 00:04:30,140
+cyan weight times this value,
+最后是青色的权值乘上这个值
+
+129
+00:04:30,660 --> 00:04:33,950
+which is therefore, plus theta
+也就是
+
+130
+00:04:35,120 --> 00:04:37,300
+2(1)2 times a(2)2.
+θ(2)12乘以a(2)2
+
+131
+00:04:38,870 --> 00:04:40,170
+And so that's forward propagation.
+那么这就是前向传播
+
+132
+00:04:42,410 --> 00:04:43,680
+And it turns out that, as
+事实上
+
+133
+00:04:43,870 --> 00:04:44,730
+we see later on in this
+正如我们后面将会看到的
+
+134
+00:04:44,790 --> 00:04:46,140
+video, what back propagation
+反向传播的做法
+
+135
+00:04:46,530 --> 00:04:47,730
+is doing, is doing a
+其过程
+
+136
+00:04:47,780 --> 00:04:49,120
+process very similar to
+非常类似于此
+
+137
+00:04:49,300 --> 00:04:50,860
+this, except that instead of
+只有计算的方向不同而已
+
+138
+00:04:50,950 --> 00:04:53,120
+the computations flowing from the
+与这里前向传播的方向从左至右
+
+139
+00:04:53,360 --> 00:04:54,270
+left to the right of this network,
+不同的是
+
+140
+00:04:55,250 --> 00:04:56,510
+the computations instead flow
+反向传播的算法中
+
+141
+00:04:56,940 --> 00:04:58,070
+from the right to the
+计算的方向是
+
+142
+00:04:58,220 --> 00:04:59,720
+left of the network, and using
+从右往左的
+
+143
+00:05:00,050 --> 00:05:02,170
+a very similar computation as this,
+但计算的过程是完全类似的
+
+144
+00:05:02,430 --> 00:05:03,710
+and I'll say in two
+在接下来的两页幻灯片中
+
+145
+00:05:03,920 --> 00:05:05,260
+slides exactly what I mean by that.
+我会详细地讲解
+
+146
+00:05:06,400 --> 00:05:07,880
+To better understand what back
+为了更好地理解
+
+147
+00:05:08,070 --> 00:05:09,710
+propagation is doing, let's look
+反向传播算法的原理
+
+148
+00:05:09,780 --> 00:05:10,920
+at the cost function, it's just the
+我们把目光转向代价函数
+
+149
+00:05:11,070 --> 00:05:12,270
+cost function that we had for
+这个代价函数
+
+150
+00:05:12,670 --> 00:05:14,950
+when we have only one output unit.
+对应的情况是只有一个输出单元
+
+151
+00:05:15,350 --> 00:05:16,300
+If we have more than
+如果我们有不止一个
+
+152
+00:05:16,400 --> 00:05:17,410
+one output unit, we just
+输出单元的话
+
+153
+00:05:17,820 --> 00:05:19,850
+have a summation, you know, over the
+只需要对所有的输出单元
+
+154
+00:05:19,930 --> 00:05:22,170
+output units index, if only
+进行一次求和运算
+
+155
+00:05:22,370 --> 00:05:25,990
+one output unit then this
+但如果只有一个输出单元时
+
+156
+00:05:26,190 --> 00:05:27,490
+is the cost function, and
+代价函数就是这样
+
+157
+00:05:27,610 --> 00:05:30,340
+we do forward propagation and back propagation on one example at a time.
+我们用同一个样本同时来做正向和反向传播
+
+158
+00:05:30,560 --> 00:05:31,440
+So, let's just focus on the
+那么 请注意这组训练样本
+
+159
+00:05:31,770 --> 00:05:34,770
+single example x(i)y(i), and focus
+x(i) y(i)
+
+160
+00:05:35,360 --> 00:05:36,480
+on the case of having one output
+注意这种只有一个输出单元的情况
+
+161
+00:05:36,810 --> 00:05:38,390
+unit so y(i) here
+那么这里的 y(i)
+
+162
+00:05:38,660 --> 00:05:40,390
+is just a real number, and
+就是一个实数
+
+163
+00:05:40,680 --> 00:05:42,790
+let's ignore regularization, so lambda
+如果不考虑正则化
+
+164
+00:05:43,010 --> 00:05:44,300
+equals zero, and this final
+也就是说 λ 等于0
+
+165
+00:05:44,640 --> 00:05:46,480
+term, that regularization term goes away.
+因此最后的正则化项就没有了
+
+166
+00:05:47,320 --> 00:05:48,220
+Now, if you look inside
+好的 那么如果你观察
+
+167
+00:05:48,730 --> 00:05:50,480
+this summation, you find that
+这个求和运算括号里面
+
+168
+00:05:50,780 --> 00:05:53,290
+the cost term associated with
+与第i个训练样本对应的
+
+169
+00:05:53,450 --> 00:05:54,980
+the ith training example, that
+代价项
+
+170
+00:05:55,190 --> 00:05:57,230
+is, the cost associated with training
+也就是说
+
+171
+00:05:58,040 --> 00:06:00,420
+example x(i)y(i), that's
+和训练样本 x(i) y(i) 对应的代价项
+
+172
+00:06:00,540 --> 00:06:01,820
+going to be given by this expression, that the
+将由这个式子确定
+
+173
+00:06:02,030 --> 00:06:03,270
+cost, sort of, of training example
+因此 第 i 个样本的代价值
+
+174
+00:06:03,810 --> 00:06:04,910
+i is written as follows.
+可以写成如下的形式
+
+175
+00:06:06,080 --> 00:06:07,320
+And what this cost
+而这个代价函数
+
+176
+00:06:07,650 --> 00:06:08,650
+function does, is it plays
+所扮演的角色
+
+177
+00:06:09,080 --> 00:06:10,580
+a role similar to the square error.
+可以看作是平方误差
+
+178
+00:06:10,750 --> 00:06:11,530
+So, rather than looking at this
+因此 我们不必关心
+
+179
+00:06:12,190 --> 00:06:14,050
+complicated expression, if you
+这个复杂的表达式
+
+180
+00:06:14,170 --> 00:06:15,380
+want you can think of cost
+当然如果你愿意
+
+181
+00:06:15,620 --> 00:06:17,600
+of i being approximately, you know, the square of
+你可以把 cost(i) 想成是
+
+182
+00:06:18,020 --> 00:06:19,310
+the difference between the
+该神经网络输出值
+
+183
+00:06:19,430 --> 00:06:20,870
+neural network outputs versus what
+与实际值的
+
+184
+00:06:21,170 --> 00:06:22,980
+is the actual value. Just as
+差的平方
+
+185
+00:06:23,150 --> 00:06:24,340
+in logistic regression, we actually
+就像在逻辑回归中
+
+186
+00:06:24,620 --> 00:06:25,510
+prefer to use this slightly
+我们选择稍微复杂的一点的
+
+187
+00:06:25,830 --> 00:06:27,060
+more complicated cost function using
+代价函数
+
+188
+00:06:27,370 --> 00:06:28,580
+the log, but for the
+log函数
+
+189
+00:06:28,640 --> 00:06:30,230
+purpose of intuition, feel free
+但为了容易理解
+
+190
+00:06:30,570 --> 00:06:31,440
+to think of the cost function
+可以把这个代价函数
+
+191
+00:06:32,000 --> 00:06:32,750
+as being sort of the squared
+看作是某种
+
+192
+00:06:33,250 --> 00:06:35,000
+error cost function, and so
+平方误差函数
+
+193
+00:06:35,220 --> 00:06:36,870
+this cost of i measures how
+因此 这里的cos(i)
+
+194
+00:06:37,110 --> 00:06:38,780
+well is the network doing on
+表征了该神经网络
+
+195
+00:06:38,880 --> 00:06:40,600
+correctly predicting example i.
+是否能准确地预测样本i的值
+
+196
+00:06:40,840 --> 00:06:42,000
+How close is the output
+也就是输出值
+
+197
+00:06:42,810 --> 00:06:44,640
+to the actually observed label y(i).
+和实际观测值y(i)的接近程度
+
+198
+00:06:45,590 --> 00:06:47,610
+Now let's look at what back propagation is doing.
+现在我们来看反向传播是怎么做的
+
+199
+00:06:48,420 --> 00:06:50,170
+One useful intuition is that
+一种直观的理解是
+
+200
+00:06:51,190 --> 00:06:52,940
+back propagation is computing these
+反向传播算法就是在计算
+
+201
+00:06:53,610 --> 00:06:54,840
+delta superscript l
+所有这些δ(i)j项
+
+202
+00:06:55,050 --> 00:06:57,440
+subscript j terms, and we
+并且我们可以
+
+203
+00:06:57,730 --> 00:06:58,520
+can think of these as
+把它们看作是
+
+204
+00:06:58,650 --> 00:07:00,070
+the quote error of the
+这些激励值的
+
+205
+00:07:00,300 --> 00:07:02,460
+activation value that we
+"误差"
+
+206
+00:07:02,620 --> 00:07:03,980
+got for unit j in
+注意这些激励值是
+
+207
+00:07:04,440 --> 00:07:05,750
+the layer, in the
+第 l 层中的
+
+208
+00:07:07,130 --> 00:07:07,400
+lth layer.
+第 j 项
+
+209
+00:07:07,660 --> 00:07:09,070
+More formally, and this is
+更正式一点的说法是
+
+210
+00:07:09,340 --> 00:07:10,280
+maybe only for those of
+也许那些比较熟悉微积分的同学
+
+211
+00:07:10,360 --> 00:07:11,480
+you that are familiar with calculus,
+更能理解
+
+212
+00:07:12,690 --> 00:07:14,080
+more formally, what the delta
+更正式地说
+
+213
+00:07:14,260 --> 00:07:15,820
+terms actually are is this:
+δ 项实际上是
+
+214
+00:07:15,950 --> 00:07:17,810
+they're the partial derivative with respect
+关于 z(l)j 的
+
+215
+00:07:18,240 --> 00:07:20,000
+to z(l)j, that is
+偏微分
+
+216
+00:07:20,150 --> 00:07:21,460
+the weighted sum of inputs that
+也就是 cost 函数
+
+217
+00:07:21,650 --> 00:07:22,700
+we're computing the z terms,
+关于我们计算出的 输入项的加权和
+
+218
+00:07:23,410 --> 00:07:25,760
+partial derivative respect of these things of the cost function.
+也就是 z 项的 偏微分
+
+219
+00:07:27,000 --> 00:07:28,650
+So concretely the cost function
+所以 实际上这个代价函数
+
+220
+00:07:28,900 --> 00:07:30,000
+is a function of the label
+是一个关于标签 y
+
+221
+00:07:30,250 --> 00:07:31,350
+y and of the
+和这个 h(x) 的值也就是
+
+222
+00:07:31,470 --> 00:07:32,680
+value, this h of
+神经网络输出值
+
+223
+00:07:32,780 --> 00:07:35,060
+x, the output value of the neural network. And
+的函数
+
+224
+00:07:35,180 --> 00:07:36,430
+if we could go inside the neural network
+如果我们观察该网络内部的话
+
+225
+00:07:37,340 --> 00:07:39,200
+and just change those z(l)j
+把这些 z(l)j 项
+
+226
+00:07:39,860 --> 00:07:41,450
+values a little bit, then
+稍微改一点点
+
+227
+00:07:41,640 --> 00:07:44,250
+that would affect the values that the neural net outputs.
+那就将影响到该神经网络的输出
+
+228
+00:07:44,990 --> 00:07:47,290
+And so that will end up changing the cost function.
+并且最终会改变代价函数的值
+
+229
+00:07:48,340 --> 00:07:50,120
+And again really this
+当然 还是那句话
+
+230
+00:07:50,210 --> 00:07:51,690
+is only for those of you expert in calculus.
+讲这些只是对那些熟悉微积分的同学
+
+231
+00:07:52,960 --> 00:07:55,580
+If you are familiar and comfortable with partial derivatives.
+如果你对偏微分很熟悉的话
+
+232
+00:07:56,540 --> 00:07:57,860
+What these delta terms are,
+你能理解这些δ项是什么
+
+233
+00:07:57,950 --> 00:07:59,270
+is they're, they turn out to
+它们实际上是
+
+234
+00:07:59,370 --> 00:08:00,800
+be the partial derivative of the
+代价函数
+
+235
+00:08:00,870 --> 00:08:04,010
+cost function with respect to these intermediate terms that we're computing.
+关于这些中间项的偏微分
+
+236
+00:08:05,500 --> 00:08:07,250
+And so they're a measure of
+因此 它们度量着
+
+237
+00:08:07,910 --> 00:08:08,940
+how much would we like to
+我们对神经网络的权值
+
+238
+00:08:09,140 --> 00:08:11,090
+change the neural network's weights in
+做多少的改变
+
+239
+00:08:11,250 --> 00:08:13,620
+order to affect these intermediate values
+对中间的计算量
+
+240
+00:08:14,150 --> 00:08:16,110
+of the computation, so as
+影响是多少
+
+241
+00:08:16,240 --> 00:08:17,430
+to affect the final output of the
+进一步地
+
+242
+00:08:17,470 --> 00:08:18,980
+neural network h of x and
+对整个神经网络的输出 h(x) 影响多少
+
+243
+00:08:19,160 --> 00:08:20,770
+therefore affect the overall cost.
+以及对整个的代价值影响多少
+
+244
+00:08:21,510 --> 00:08:22,820
+In case this last part of
+可能刚才讲的
+
+245
+00:08:23,030 --> 00:08:25,290
+this partial derivative intuition, in case
+偏微分的这种理解
+
+246
+00:08:25,530 --> 00:08:26,920
+that didn't make sense, don't worry
+不太容易理解
+
+247
+00:08:27,070 --> 00:08:28,230
+about it, the rest of this
+没关系
+
+248
+00:08:28,390 --> 00:08:29,770
+we can do without really
+不用偏微分的思想
+
+249
+00:08:30,280 --> 00:08:32,400
+talking about partial derivatives, but let's
+我们同样也可以理解
+
+250
+00:08:32,660 --> 00:08:33,780
+look in more detail at what
+我们再深入一点
+
+251
+00:08:34,100 --> 00:08:36,020
+back propagation is doing.
+研究一下反向传播的过程
+
+252
+00:08:36,250 --> 00:08:37,440
+For the output layer, it first
+对于输出层
+
+253
+00:08:37,890 --> 00:08:39,630
+sets this delta term, we say
+如果我们设置δ项
+
+254
+00:08:39,830 --> 00:08:41,400
+delta 4(1), as y(i)
+比如说 δ(4)1 等于 y(i)
+
+255
+00:08:41,700 --> 00:08:44,430
+if we're doing forward propagation
+假设我们进行第i个训练样本的
+
+256
+00:08:44,890 --> 00:08:48,010
+and back propagation on this
+正向传播
+
+257
+00:08:48,210 --> 00:08:50,180
+training example i. That is, it's y(i)
+和反向传播
+
+258
+00:08:51,030 --> 00:08:52,970
+minus a(4)1,
+那么应该等于 y(i) 减去 a(4)1
+
+259
+00:08:53,250 --> 00:08:54,370
+so it's really the error, it's
+因此这实际是两者的偏差
+
+260
+00:08:54,560 --> 00:08:55,680
+the difference between the actual value
+也就是 y 的实际值
+
+261
+00:08:56,000 --> 00:08:57,210
+of y minus what was
+减去预测值
+
+262
+00:08:57,630 --> 00:08:58,020
+the value predicted.
+得到的差值
+
+263
+00:08:58,530 --> 00:09:00,160
+And so we're going to compute delta
+这样 我们就算出了
+
+264
+00:09:00,670 --> 00:09:01,880
+4(1) like so.
+δ(4)1 的值
+
+265
+00:09:03,510 --> 00:09:06,200
+Next we're going to propagate these values backwards.
+接下来我们要对这些值进行反向传播
+
+266
+00:09:06,910 --> 00:09:07,820
+I'll explain this in a second
+我稍后将详细解释
+
+267
+00:09:08,510 --> 00:09:10,810
+and end up computing the delta terms of the previous layer.
+计算出前一层的 δ 项的值
+
+268
+00:09:11,350 --> 00:09:12,450
+We're going to end up
+那么这里我们计算出
+
+269
+00:09:12,560 --> 00:09:13,720
+with delta 3(1); delta 3(2);
+δ(3)1 和 δ(3)2
+
+270
+00:09:13,990 --> 00:09:15,210
+and then we're going to
+然后同样的
+
+271
+00:09:15,600 --> 00:09:17,940
+propagate this further
+再进行下一层的反向传播
+
+272
+00:09:18,380 --> 00:09:19,340
+backward and end up
+这一次计算出
+
+273
+00:09:19,470 --> 00:09:21,960
+computing delta 2(1) and
+δ(2)1
+
+274
+00:09:22,690 --> 00:09:23,800
+delta 2(2).
+以及 δ(2)2
+
+275
+00:09:25,190 --> 00:09:27,290
+Now the back propagation calculation
+反向传播的计算
+
+276
+00:09:28,730 --> 00:09:30,050
+is a lot like running the
+和进行前向传播几乎相同
+
+277
+00:09:30,140 --> 00:09:32,870
+forward propagation algorithm, but doing it backwards.
+唯一的区别就是方向相反
+
+278
+00:09:33,260 --> 00:09:33,890
+So here's what I mean.
+我想表达的是
+
+279
+00:09:34,160 --> 00:09:35,300
+Let's look at how we end
+我们来看我们是怎样得到
+
+280
+00:09:35,460 --> 00:09:37,370
+up with this value of Delta 2(2).
+δ(2)2 的值的
+
+281
+00:09:38,060 --> 00:09:39,280
+So we have Delta
+我们要计算 δ(2)2
+
+282
+00:09:39,480 --> 00:09:42,330
+2(2) and similar to
+与前向传播类似
+
+283
+00:09:42,600 --> 00:09:44,760
+forward propagation, let me label a couple of the weights.
+我要对一些权值进行标记
+
+284
+00:09:45,000 --> 00:09:47,620
+So this weight, shown in cyan--let's say
+那么这条权值 用桃红色表示的
+
+285
+00:09:47,890 --> 00:09:50,680
+that weight is theta 2
+就是 θ(2)12
+
+286
+00:09:51,190 --> 00:09:54,190
+of 1, 2 and this
+然后这根箭头表示的权值
+
+287
+00:09:54,450 --> 00:09:55,970
+weight down here, let me highlight
+我用红色来标记
+
+288
+00:09:56,280 --> 00:09:57,740
+this in red. That's going to be, let's say,
+它代表的是
+
+289
+00:09:58,030 --> 00:09:59,760
+theta 2 of 2, 2.
+θ(2)22
+
+290
+00:10:01,510 --> 00:10:03,410
+So if we
+所以
+
+291
+00:10:03,500 --> 00:10:05,450
+look at how Delta 2(2)
+我们来看
+
+292
+00:10:05,800 --> 00:10:07,540
+is computed. How it's computed for this node. It turns
+δ(2)2是如何得到的
+
+293
+00:10:08,390 --> 00:10:09,690
+out that what we're
+实际上
+
+294
+00:10:09,800 --> 00:10:10,830
+going to do is
+我们要做的是
+
+295
+00:10:10,970 --> 00:10:12,030
+we're going to take this value and
+我们要用这个 δ 值
+
+296
+00:10:12,350 --> 00:10:14,340
+multiply it by this weight and
+和权值相乘
+
+297
+00:10:14,630 --> 00:10:16,770
+add it to this value
+然后加上
+
+298
+00:10:17,580 --> 00:10:18,660
+multiplied by that weight.
+这个 δ 值乘以权值的结果
+
+299
+00:10:18,930 --> 00:10:19,850
+So it's really a weighted sum
+也就是说 它其实是
+
+300
+00:10:20,800 --> 00:10:22,880
+of these delta values,
+这些δ值的加权和
+
+301
+00:10:23,280 --> 00:10:25,570
+weighted by the corresponding edge strength.
+权值是这些对应边的强度
+
+302
+00:10:25,960 --> 00:10:27,270
+So concretely, let me fill this in.
+让我把这些具体的值写出来
+
+303
+00:10:28,430 --> 00:10:29,550
+This delta 2,2 is going to
+δ(2)2 的值
+
+304
+00:10:30,270 --> 00:10:32,610
+be equal to theta 2(1)2,
+等于桃红色的这条权值
+
+305
+00:10:33,110 --> 00:10:34,660
+which is that magenta weight,
+θ(2)12
+
+306
+00:10:34,980 --> 00:10:38,850
+times delta 3(1) plus, and
+乘以δ(3)1
+
+307
+00:10:38,990 --> 00:10:40,080
+then the thing I have in red, that's
+加上 下一个是用红色标记的权值
+
+308
+00:10:41,230 --> 00:10:43,530
+theta 2(2)2
+θ(2)22
+
+309
+00:10:43,860 --> 00:10:46,230
+times Delta 3(2).
+乘上δ(3)2
+
+310
+00:10:46,700 --> 00:10:48,550
+So it is really, literally this red
+所以 简单地说
+
+311
+00:10:48,800 --> 00:10:51,340
+weight times this value, plus this
+就是红色的权值乘以它指向的值
+
+312
+00:10:51,570 --> 00:10:52,690
+magenta weight times its value
+加上 桃红色的权值乘以它指向的值
+
+313
+00:10:53,540 --> 00:10:55,820
+and that's how we wind up with that value of delta.
+这样我们就得到了上一层的 δ 值
+
+314
+00:10:56,880 --> 00:10:59,490
+And just as another example, let's look at this value.
+再举一个例子
+
+315
+00:10:59,870 --> 00:11:00,750
+How did we get that value?
+我们来看这个 δ 值
+
+316
+00:11:01,320 --> 00:11:02,660
+Well, it's a similar
+是怎么得到的呢?
+
+317
+00:11:02,890 --> 00:11:04,490
+process, if this weight,
+仍然是类似的过程
+
+318
+00:11:05,530 --> 00:11:07,000
+which I'm going to highlight in
+如果这个权值
+
+319
+00:11:07,100 --> 00:11:08,310
+green, if this weight is
+用绿色表示的这根箭头
+
+320
+00:11:08,440 --> 00:11:09,860
+equal to, say, theta
+假如这个权值
+
+321
+00:11:10,450 --> 00:11:12,990
+3(1)2, then we have
+是θ(3)12
+
+322
+00:11:13,920 --> 00:11:15,360
+that, delta 3(2) is
+那么 δ(3)2
+
+323
+00:11:15,630 --> 00:11:17,010
+going to be equal to
+将等于这条绿色的权值
+
+324
+00:11:17,910 --> 00:11:19,860
+that green weight, theta 3(1)2
+θ(3)12
+
+325
+00:11:20,800 --> 00:11:22,260
+times delta 4(1).
+乘以 δ(4)1
+
+326
+00:11:22,930 --> 00:11:25,520
+And by the
+另外顺便提一下
+
+327
+00:11:25,610 --> 00:11:26,560
+way, so far I've been
+目前为止
+
+328
+00:11:26,670 --> 00:11:28,310
+writing the delta values only
+我写的 δ 值
+
+329
+00:11:28,660 --> 00:11:30,390
+for the hidden units and
+仅仅是隐藏层中的
+
+330
+00:11:30,560 --> 00:11:32,750
+not for the bias units.
+没有包括偏差单元+1
+
+331
+00:11:33,620 --> 00:11:34,610
+Depending on how you define
+包不包括偏差单元取决于你
+
+332
+00:11:35,030 --> 00:11:37,170
+the back propagation algorithm or depending on
+如何定义这个反向传播算法
+
+333
+00:11:37,330 --> 00:11:38,610
+how you implement it, you know,
+或者取决于你怎样实现这个算法
+
+334
+00:11:38,710 --> 00:11:40,510
+you may end up implementing something
+你也可以
+
+335
+00:11:40,850 --> 00:11:42,390
+to compute delta values for
+对这些偏差单元
+
+336
+00:11:42,900 --> 00:11:43,950
+these bias units as well.
+计算 δ 的值
+
+337
+00:11:44,960 --> 00:11:46,230
+The bias units always output
+这些偏差单元
+
+338
+00:11:46,620 --> 00:11:47,880
+the value plus one and they
+总是取为+1的值
+
+339
+00:11:47,990 --> 00:11:48,980
+are just what they are and there's
+一直都这么取
+
+340
+00:11:49,220 --> 00:11:50,060
+no way for us to change
+我们不能也没有必要
+
+341
+00:11:50,210 --> 00:11:51,960
+the value and so, depending
+更改偏差单元的值
+
+342
+00:11:52,340 --> 00:11:53,440
+on your implementation of back prop,
+所以还是取决于你实现反向传播的方法
+
+343
+00:11:53,770 --> 00:11:54,960
+the way I usually implement it,
+通常说来 我在执行反向传播的时候
+
+344
+00:11:55,090 --> 00:11:56,180
+I do end up computing these
+我是算出了
+
+345
+00:11:56,340 --> 00:11:57,670
+delta values, but we
+这些偏差单元的δ值
+
+346
+00:11:57,760 --> 00:11:58,900
+just discard them and we
+但我通常忽略掉它们
+
+347
+00:11:58,990 --> 00:12:00,560
+don't use them, because they don't
+而不把它们代入计算
+
+348
+00:12:00,800 --> 00:12:02,130
+end up being part of
+因为它们其实
+
+349
+00:12:02,220 --> 00:12:04,130
+the calculation needed to compute the derivatives.
+并不是计算那些微分的必要部分
+
+350
+00:12:04,990 --> 00:12:06,720
+So, hopefully, that gives
+好了 我希望这节课
+
+351
+00:12:06,990 --> 00:12:08,360
+you a little bit of intuition
+能给你一个 有关反向传播算法的实现过程
+
+352
+00:12:08,750 --> 00:12:10,380
+about what back propagation is doing.
+更深刻的印象
+
+353
+00:12:12,480 --> 00:12:13,290
+In case all of this
+我知道可能这些过程
+
+354
+00:12:13,440 --> 00:12:14,670
+still seems so magical and
+还是看起来很神奇
+
+355
+00:12:14,760 --> 00:12:16,090
+so black box, in a
+很“黑箱”
+
+356
+00:12:16,240 --> 00:12:17,560
+later video, in the
+不要紧 在后面的课程中
+
+357
+00:12:17,770 --> 00:12:19,880
+putting it together video, I'll try
+在"putting it together"视频中
+
+358
+00:12:20,150 --> 00:12:22,650
+to give a little more intuition about what that back propagation is doing.
+我还会再介绍一点有关反向传播的内容
+
+359
+00:12:23,250 --> 00:12:24,360
+But, unfortunately, this is, you
+但是很遗憾的是
+
+360
+00:12:24,450 --> 00:12:26,370
+know, a difficult algorithm to
+要想完全看清并且理解这个算法
+
+361
+00:12:26,510 --> 00:12:28,770
+try to visualize and understand what it is really doing.
+的确是很困难的
+
+362
+00:12:29,500 --> 00:12:30,790
+But fortunately, you know,
+但我想
+
+363
+00:12:30,990 --> 00:12:32,280
+often I guess, many people
+幸运的是 多年来
+
+364
+00:12:32,940 --> 00:12:33,930
+have been using it very successfully
+很多人都能顺利地运用
+
+365
+00:12:34,420 --> 00:12:35,640
+for many years and if
+反向传播算法
+
+366
+00:12:35,730 --> 00:12:37,810
+you implement the algorithm, you have
+并且如果你执行一遍整个算法
+
+367
+00:12:37,990 --> 00:12:40,090
+a very effective learning algorithm, even
+你就能掌握这种很强大的机器学习算法
+
+368
+00:12:40,340 --> 00:12:41,400
+though the inner workings of exactly
+尽管它内部的工作原理
+
+369
+00:12:41,900 --> 00:12:43,190
+how it works can be harder to visualize.
+的确显得很难观察
+
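As a concrete check of the weighted-sum picture just described, here is a small Octave sketch for the 2-2-2-1 network in this video. The numeric weights and deltas are made up purely to illustrate the arithmetic, and the sigmoid-gradient factor from the full algorithm is omitted, matching the simplified intuition in this video.

    % Hypothetical 2x3 weight matrix Theta2: row i holds the weights into z(3)i,
    % and column 1 is the bias weight; the numbers are made up for illustration.
    Theta2 = [0.1 0.4 0.7; 0.2 0.5 0.8];
    delta3 = [0.3; -0.2];                   % delta(3)1 and delta(3)2
    % delta(2)2 is the sum of next-layer deltas weighted by its outgoing edges:
    delta2_2  = Theta2(1,3) * delta3(1) + Theta2(2,3) * delta3(2);
    % The same pattern for the whole layer at once (bias entry later discarded):
    delta2_all = Theta2' * delta3;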
diff --git a/srt/9 - 4 - Implementation Note_ Unrolling Parameters (8 min).srt b/srt/9 - 4 - Implementation Note_ Unrolling Parameters (8 min).srt
new file mode 100644
index 00000000..2092be50
--- /dev/null
+++ b/srt/9 - 4 - Implementation Note_ Unrolling Parameters (8 min).srt
@@ -0,0 +1,1101 @@
+1
+00:00:00,250 --> 00:00:01,530
+In the previous video, we talked
+在上一段视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,850 --> 00:00:02,870
+about how to use back propagation
+我们谈到了怎样使用反向传播算法
+
+3
+00:00:03,980 --> 00:00:05,810
+to compute the derivatives of your cost function.
+计算代价函数的导数
+
+4
+00:00:06,780 --> 00:00:07,770
+In this video, I want
+在这段视频中
+
+5
+00:00:08,030 --> 00:00:10,260
+to quickly tell you about one implementational detail of
+我想快速地向你介绍一个细节的实现过程
+
+6
+00:00:11,220 --> 00:00:13,110
+unrolling your parameters from
+怎样把你的参数
+
+7
+00:00:13,670 --> 00:00:15,500
+matrices into vectors, which we
+从矩阵展开成向量
+
+8
+00:00:15,610 --> 00:00:17,870
+need in order to use the advanced optimization routines.
+以便我们在高级最优化步骤中的使用需要
+
+9
+00:00:20,230 --> 00:00:21,470
+Concretely, let's say
+具体来讲
+
+10
+00:00:21,640 --> 00:00:23,120
+you've implemented a cost function
+你执行了代价函数costFunction
+
+11
+00:00:23,660 --> 00:00:24,870
+that takes as input, you know, parameters
+输入参数是theta
+
+12
+00:00:25,420 --> 00:00:28,690
+theta and returns the cost function and returns derivatives.
+函数返回值是代价函数以及导数值
+
+13
+00:00:30,050 --> 00:00:31,260
+Then you can pass this to
+然后你可以将返回值
+
+14
+00:00:31,510 --> 00:00:33,820
+an advanced optimization algorithm like fminunc
+传递给高级最优化算法fminunc
+
+15
+00:00:34,080 --> 00:00:34,790
+and fminunc
+顺便提醒
+
+16
+00:00:34,890 --> 00:00:35,900
+isn't the only one by the way.
+fminunc并不是唯一的算法
+
+17
+00:00:36,060 --> 00:00:38,660
+There are also other advanced optimization algorithms.
+你也可以使用别的优化算法
+
+18
+00:00:39,710 --> 00:00:40,910
+But what all of them
+但它们的功能
+
+19
+00:00:41,030 --> 00:00:41,970
+do is take as input
+都是取出这些输入值
+
+20
+00:00:42,730 --> 00:00:43,560
+a pointer to the cost function,
+@costFunction
+
+21
+00:00:44,490 --> 00:00:45,730
+and some initial value of theta.
+以及theta值的一些初始值
+
+22
+00:00:47,010 --> 00:00:48,490
+And both of these
+并且这些程序
+
+23
+00:00:48,730 --> 00:00:51,600
+routines assume that theta and
+都假设theta
+
+24
+00:00:51,740 --> 00:00:53,360
+the initial value of theta, that
+和这些theta初始值
+
+25
+00:00:53,580 --> 00:00:55,410
+these are parameter vectors, maybe
+都是参数向量
+
+26
+00:00:55,640 --> 00:00:57,040
+Rn or Rn plus 1.
+也许是n或者n+1阶
+
+27
+00:00:57,870 --> 00:01:00,440
+But these are vectors and it
+但它们都是向量
+
+28
+00:01:00,530 --> 00:01:01,880
+also assumes that, you know, your cost
+同时假设这个代价函数
+
+29
+00:01:02,150 --> 00:01:03,770
+function will return as
+第二个返回值
+
+30
+00:01:03,960 --> 00:01:05,640
+a second return value this
+也就是gradient值
+
+31
+00:01:05,830 --> 00:01:07,410
+gradient which is also Rn
+也是n阶或者n+1阶
+
+32
+00:01:07,640 --> 00:01:09,860
+and Rn plus 1. So also a vector.
+所以它也是一个向量
+
+33
+00:01:10,840 --> 00:01:11,890
+This worked fine when we
+这部分在我们使用逻辑回归的时候
+
+34
+00:01:12,040 --> 00:01:14,030
+were using logistic regression but
+运行顺利
+
+35
+00:01:14,220 --> 00:01:15,120
+now that we're using a neural
+但现在 对于神经网络
+
+36
+00:01:15,280 --> 00:01:17,160
+network our parameters are
+我们的参数将不再是
+
+37
+00:01:17,220 --> 00:01:18,370
+no longer vectors, but instead
+向量
+
+38
+00:01:18,980 --> 00:01:21,110
+they are these matrices where for
+而是矩阵了
+
+39
+00:01:21,310 --> 00:01:22,670
+a full neural network we would
+因此对于一个完整的神经网络
+
+40
+00:01:22,830 --> 00:01:26,050
+have parameter matrices theta 1, theta 2, theta 3
+我们的参数矩阵为θ(1) θ(2) θ(3)
+
+41
+00:01:26,700 --> 00:01:28,080
+that we might represent in Octave
+在Octave中我们可以设为
+
+42
+00:01:28,680 --> 00:01:30,660
+as these matrices theta 1, theta 2, theta 3.
+Theta1 Theta2 Theta3
+
+43
+00:01:31,450 --> 00:01:33,160
+And similarly these gradient
+类似的 这些梯度项gradient
+
+44
+00:01:33,760 --> 00:01:35,030
+terms that we're expected to return.
+也是需要得到的返回值
+
+45
+00:01:35,720 --> 00:01:36,890
+Well, in the previous video we
+那么在之前的视频中
+
+46
+00:01:36,980 --> 00:01:38,430
+showed how to compute these
+我们演示了如何计算
+
+47
+00:01:38,840 --> 00:01:40,520
+gradient matrices, which was
+这些梯度矩阵
+
+48
+00:01:40,980 --> 00:01:42,290
+capital D1, capital D2,
+它们是D(1) D(2) D(3)
+
+49
+00:01:42,560 --> 00:01:43,950
+capital D3, which we
+在Octave中
+
+50
+00:01:44,080 --> 00:01:46,130
+might represent in Octave as matrices D1, D2, D3.
+我们用矩阵D1 D2 D3来表示
+
+51
+00:01:48,080 --> 00:01:49,150
+In this video I want
+在这节视频中
+
+52
+00:01:49,480 --> 00:01:50,420
+to quickly tell you about the
+我想很快地向你介绍
+
+53
+00:01:50,510 --> 00:01:51,480
+idea of how to take
+怎样取出这些矩阵
+
+54
+00:01:51,980 --> 00:01:54,060
+these matrices and unroll them into vectors.
+并且将它们展开成向量
+
+55
+00:01:54,590 --> 00:01:55,750
+So that they end up
+以便它们最终
+
+56
+00:01:55,910 --> 00:01:57,790
+being in a format suitable for
+成为恰当的格式
+
+57
+00:01:57,930 --> 00:02:00,090
+passing in as theta here, and for getting
+能够传入这里的Theta
+
+58
+00:02:00,460 --> 00:02:01,850
+out the gradient there.
+并且得到正确的梯度返回值gradient
+
+59
+00:02:03,220 --> 00:02:04,540
+Concretely, let's say we
+具体来说
+
+60
+00:02:04,670 --> 00:02:06,740
+have a neural network with one
+假设我们有这样一个神经网络
+
+61
+00:02:06,950 --> 00:02:08,250
+input layer with ten units,
+其输入层有10个输入单元
+
+62
+00:02:09,010 --> 00:02:10,000
+hidden layer with ten units
+隐藏层有10个单元
+
+63
+00:02:10,540 --> 00:02:11,870
+and one output layer with
+最后的输出层
+
+64
+00:02:12,020 --> 00:02:13,090
+just one unit, so s1
+只有一个输出单元
+
+65
+00:02:13,270 --> 00:02:14,030
+is the number of units in layer one
+因此s1等于第一层的单元数
+
+66
+00:02:14,440 --> 00:02:15,710
+and s2 is the
+s2等于第二层的单元数
+
+67
+00:02:15,860 --> 00:02:18,220
+number of units in layer two, and s3 is a number
+s3等于第三层的
+
+68
+00:02:18,520 --> 00:02:20,700
+of units in layer three.
+单元个数
+
+69
+00:02:21,560 --> 00:02:23,200
+In this case, the dimension of
+在这种情况下
+
+70
+00:02:23,460 --> 00:02:25,240
+your matrices theta and
+矩阵θ的维度
+
+71
+00:02:25,350 --> 00:02:26,380
+D are going to be
+和矩阵D的维度
+
+72
+00:02:26,570 --> 00:02:28,110
+given by these expressions.
+将由这些表达式确定
+
+73
+00:02:28,520 --> 00:02:30,300
+For example, theta one
+比如说
+
+74
+00:02:30,630 --> 00:02:33,220
+is going to be a 10 by 11 matrix and so on.
+θ(1)是一个10x11的矩阵 以此类推
+
+75
+00:02:34,420 --> 00:02:35,740
+So in Octave, if you want
+因此 在Octave中
+
+76
+00:02:35,950 --> 00:02:37,960
+to convert between these matrices and
+如果你想将这些矩阵
+
+77
+00:02:38,580 --> 00:02:38,580
+vectors.
+转化为向量
+
+78
+00:02:39,330 --> 00:02:40,590
+What you can do is take
+那么你要做的
+
+79
+00:02:40,830 --> 00:02:42,130
+your theta 1, theta
+是取出你的Theta1 Theta2
+
+80
+00:02:42,350 --> 00:02:44,220
+2, theta 3, and write this
+Theta3
+
+81
+00:02:44,410 --> 00:02:45,470
+piece of code and this will
+然后使用这段代码
+
+82
+00:02:45,610 --> 00:02:46,820
+take all the elements of
+这段代码将取出
+
+83
+00:02:46,900 --> 00:02:48,540
+your three theta matrices and
+三个θ矩阵中的所有元素
+
+84
+00:02:48,770 --> 00:02:49,400
+take all the elements
+也就是说取出Theta1
+
+85
+00:02:49,860 --> 00:02:51,150
+of theta one, all the
+的所有元素
+
+86
+00:02:51,260 --> 00:02:52,290
+elements of theta 2, all the
+Theta2的所有元素
+
+87
+00:02:52,400 --> 00:02:53,840
+elements of theta 3,
+Theta3的所有元素
+
+88
+00:02:54,130 --> 00:02:55,510
+and unroll them and put
+然后把它们全部展开
+
+89
+00:02:55,770 --> 00:02:57,420
+all the elements into a big long vector.
+成为一个很长的向量
+
+90
+00:02:58,540 --> 00:02:59,880
+Which is thetaVec and similarly
+也就是thetaVec
+
+91
+00:03:00,960 --> 00:03:02,510
+the second command would take
+同样的 第二段代码
+
+92
+00:03:02,830 --> 00:03:04,350
+all of your D matrices and
+将取出D矩阵的所有元素
+
+93
+00:03:04,490 --> 00:03:05,600
+unroll them into a big
+然后展开
+
+94
+00:03:05,930 --> 00:03:07,340
+long vector and call them
+成为一个长向量
+
+95
+00:03:07,510 --> 00:03:08,810
+DVec. And finally
+被叫做DVec
+
+96
+00:03:09,370 --> 00:03:10,330
+if you want to go back from
+最后 如果你想从向量表达
+
+97
+00:03:10,520 --> 00:03:13,380
+the vector representations to the matrix representations.
+返回到矩阵表达式的话
+
+98
+00:03:14,620 --> 00:03:15,630
+What you do to get back
+你要做的是
+
+99
+00:03:15,840 --> 00:03:17,720
+to theta one say is take
+比如想再得到Theta1
+
+100
+00:03:17,940 --> 00:03:19,250
+thetaVec and pull
+那么取thetaVec
+
+101
+00:03:19,530 --> 00:03:20,980
+out the first 110 elements.
+抽出前110个元素
+
+102
+00:03:21,470 --> 00:03:22,930
+So theta 1 has 110
+因此
+
+103
+00:03:23,390 --> 00:03:24,650
+elements because it's a
+Theta1就有110个元素
+
+104
+00:03:24,720 --> 00:03:26,420
+10 by 11 matrix so that
+因为它应该是一个10x11的矩阵
+
+105
+00:03:26,810 --> 00:03:28,200
+pulls out the first 110 elements
+所以 抽出前110个元素
+
+106
+00:03:28,540 --> 00:03:30,200
+and then you can
+然后你就能使用
+
+107
+00:03:30,370 --> 00:03:32,960
+use the reshape command to reshape those back into theta 1.
+reshape矩阵变维命令来重新得到Theta1
+
+108
+00:03:33,010 --> 00:03:34,730
+And similarly, to get
+同样类似的
+
+109
+00:03:34,900 --> 00:03:35,850
+back theta 2 you pull
+要重新得到Theta2矩阵
+
+110
+00:03:36,280 --> 00:03:39,010
+out the next 110 elements and reshape it.
+你需要抽出下一组110个元素并且重新组合
+
+111
+00:03:39,670 --> 00:03:41,410
+And for theta 3, you pull out
+然后对于Theta3
+
+112
+00:03:41,450 --> 00:03:43,320
+the final eleven elements and run
+你需要抽出最后11个元素
+
+113
+00:03:43,500 --> 00:03:45,210
+reshape to get back the theta 3.
+然后执行reshape命令 重新得到Theta3
+
+114
+00:03:48,840 --> 00:03:50,700
+Here's a quick Octave demo of that process.
+以下是这一过程的Octave演示
+
+115
+00:03:51,270 --> 00:03:52,370
+So for this example
+对于这一个例子
+
+116
+00:03:53,010 --> 00:03:54,530
+let's set theta 1 equal
+让我们假设Theta1
+
+117
+00:03:55,340 --> 00:03:57,440
+to be ones of 10 by
+为一个10x11的单位矩阵
+
+118
+00:03:57,670 --> 00:03:59,580
+11, so it's a matrix of all ones. And
+因此它每一项都为1
+
+119
+00:04:00,360 --> 00:04:01,400
+just to make this easier to see,
+为了更易看清
+
+120
+00:04:01,750 --> 00:04:03,060
+let's set that to be 2
+让我们把Theta2设为
+
+121
+00:04:03,280 --> 00:04:05,150
+times ones, 10 by
+一个10行11列矩阵
+
+122
+00:04:05,310 --> 00:04:07,390
+11 and let's
+每个元素都为2
+
+123
+00:04:07,600 --> 00:04:09,570
+set theta 3 equals 3
+然后设Theta3 是一个1x11的矩阵
+
+124
+00:04:10,290 --> 00:04:12,110
+times 1's of 1 by 11.
+每个元素都为3
+
+125
+00:04:12,390 --> 00:04:13,680
+So these are 3
+因此 这样我们得到三个独立的矩阵
+
+126
+00:04:13,980 --> 00:04:17,030
+separate matrices: theta 1, theta 2, theta 3.
+Theta1 Theta2 Theta3
+
+127
+00:04:17,770 --> 00:04:19,010
+We want to put all of these as a vector.
+现在我们想把所有这些矩阵变成一个向量
+
+128
+00:04:19,670 --> 00:04:22,740
+ThetaVec equals theta
+thetaVec =
+
+129
+00:04:23,380 --> 00:04:26,660
+1; theta 2
+[Theta1(:); Theta2(:); Theta3(:)];
+
+130
+00:04:28,540 --> 00:04:28,990
+theta 3.
+好的
+
+131
+00:04:29,260 --> 00:04:32,060
+Right, that's a colon
+注意中间有冒号
+
+132
+00:04:32,540 --> 00:04:34,220
+in the middle and like so
+像这样
+
+133
+00:04:35,350 --> 00:04:37,420
+and now thetaVec is
+现在thetaVec矩阵
+
+134
+00:04:37,590 --> 00:04:40,090
+going to be a very long vector.
+就变成了一个很长的向量
+
+135
+00:04:41,050 --> 00:04:41,910
+That's 231 elements.
+含有231个元素
+
+136
+00:04:42,970 --> 00:04:46,000
+If I display it, I find
+如果把它打出来
+
+137
+00:04:46,290 --> 00:04:47,640
+that this is a very long vector with
+我们就能看出它是一个很长的向量
+
+138
+00:04:47,780 --> 00:04:48,610
+all the elements of the first
+包括第一个矩阵的所有元素
+
+139
+00:04:48,880 --> 00:04:49,630
+matrix, all the elements of
+第二个矩阵的所有元素
+
+140
+00:04:50,090 --> 00:04:52,360
+the second matrix, then all the elements of the third matrix.
+以及第三个矩阵的所有元素
+
+141
+00:04:53,480 --> 00:04:54,450
+And if I want to get back
+如果我想重新得到
+
+142
+00:04:54,930 --> 00:04:56,420
+my original matrices, I can
+我最初的三个矩阵
+
+143
+00:04:56,500 --> 00:05:00,040
+do reshape thetaVec.
+我可以对thetaVec使用reshape命令
+
+144
+00:05:01,400 --> 00:05:02,580
+Let's pull out the first 110
+抽出前110个元素
+
+145
+00:05:03,100 --> 00:05:05,640
+elements and reshape them to a 10 by 11 matrix.
+将它们重组为一个10x11的矩阵
+
+146
+00:05:06,810 --> 00:05:08,240
+This gives me back theta 1.
+这样我又再次得到了Theta1矩阵
+
+147
+00:05:08,690 --> 00:05:09,770
+And if I then pull
+然后我再取出
+
+148
+00:05:10,280 --> 00:05:12,220
+out the next 110 elements.
+接下来的110个元素
+
+149
+00:05:12,720 --> 00:05:14,690
+So that's indices 111 to 220.
+也就是111到220号元素
+
+150
+00:05:14,850 --> 00:05:16,470
+I get back all of my 2's.
+我就又重组还原了第二个矩阵
+
+151
+00:05:18,030 --> 00:05:19,330
+And if I go
+最后
+
+152
+00:05:20,850 --> 00:05:22,110
+from 221 up to
+再抽出221到最后一个元素
+
+153
+00:05:22,280 --> 00:05:24,240
+the last element, which is
+也就是第231个元素
+
+154
+00:05:24,440 --> 00:05:25,970
+element 231, and reshape to
+然后重组为1x11的矩阵
+
+155
+00:05:26,070 --> 00:05:28,130
+1 by 11, I get back theta 3.
+我就又得到了Theta3矩阵
+
+156
+00:05:30,810 --> 00:05:32,110
+To make this process really concrete,
+为了使这个过程更形象
+
+157
+00:05:32,950 --> 00:05:34,750
+here's how we use the unrolling
+下面我们来看怎样将这一方法
+
+158
+00:05:35,320 --> 00:05:36,990
+idea to implement our learning algorithm.
+应用于我们的学习算法
+
+159
+00:05:38,200 --> 00:05:39,180
+Let's say that you have some
+假设说你有一些
+
+160
+00:05:39,490 --> 00:05:40,600
+initial value of the parameters
+初始参数值
+
+161
+00:05:41,170 --> 00:05:42,410
+theta 1, theta 2, theta 3.
+θ(1) θ(2) θ(3)
+
+162
+00:05:42,950 --> 00:05:43,740
+What we're going to do
+我们要做的是
+
+163
+00:05:44,020 --> 00:05:45,880
+is take these and unroll
+取出这些参数并且将它们
+
+164
+00:05:46,290 --> 00:05:47,610
+them into a long vector
+展开为一个长向量
+
+165
+00:05:47,960 --> 00:05:50,380
+we're gonna call initial theta to
+我们称之为initialTheta
+
+166
+00:05:50,600 --> 00:05:52,170
+pass in to fminunc
+然后作为theta参数的初始设置
+
+167
+00:05:52,360 --> 00:05:54,900
+as this initial setting of the parameters theta.
+传入函数fminunc
+
+168
+00:05:56,160 --> 00:05:58,310
+The other thing we need to do is implement the cost function.
+我们要做的另一件事是执行代价函数costFunction
+
+169
+00:05:59,310 --> 00:06:01,510
+Here's my implementation of the cost function.
+实现算法如下
+
+170
+00:06:02,900 --> 00:06:04,070
+The cost function is going to
+代价函数costFunction
+
+171
+00:06:04,160 --> 00:06:05,500
+take as input thetaVec,
+将传入参数thetaVec
+
+172
+00:06:05,980 --> 00:06:07,090
+which is going to be all
+这也是包含
+
+173
+00:06:07,350 --> 00:06:08,770
+of my parameters, in
+我所有参数的向量
+
+174
+00:06:08,870 --> 00:06:10,680
+the form that's been unrolled into a vector.
+是将所有的参数展开成一个向量的形式
+
+175
+00:06:11,960 --> 00:06:12,800
+So the first thing I'm going to
+因此我要做的第一件事是
+
+176
+00:06:13,000 --> 00:06:13,890
+do is I'm going to use
+我要使用
+
+177
+00:06:14,100 --> 00:06:16,580
+thetaVec and I'm going to use the reshape functions.
+thetaVec和重组函数reshape
+
+178
+00:06:17,040 --> 00:06:18,120
+So I'll pull out elements from
+因此我要抽出thetaVec中的元素
+
+179
+00:06:18,320 --> 00:06:19,440
+thetaVec and use reshape
+然后重组
+
+180
+00:06:19,750 --> 00:06:20,950
+to get back my
+以得到我的初始参数矩阵
+
+181
+00:06:21,320 --> 00:06:23,560
+original parameter matrices, theta 1, theta 2, theta 3.
+θ(1) θ(2) θ(3)
+
+182
+00:06:24,120 --> 00:06:26,530
+So these are going to be matrices that I'm going to get.
+所以这些是我需要得到的矩阵
+
+183
+00:06:26,620 --> 00:06:28,000
+So that gives me a
+因此 这样我就有了
+
+184
+00:06:28,060 --> 00:06:29,920
+more convenient form in which
+一个使用这些矩阵的
+
+185
+00:06:30,130 --> 00:06:31,580
+to use these matrices so that I
+更方便的形式
+
+186
+00:06:31,750 --> 00:06:33,590
+can run forward propagation and
+这样我就能执行前向传播
+
+187
+00:06:33,880 --> 00:06:35,400
+back propagation to compute my
+和反向传播
+
+188
+00:06:35,570 --> 00:06:38,140
+derivatives, and to compute my cost function j of theta.
+来计算出导数 以求得代价函数的J(θ)
+
+189
+00:06:39,710 --> 00:06:40,900
+And finally, I can then
+最后
+
+190
+00:06:41,120 --> 00:06:42,620
+take my derivatives and unroll
+我可以取出这些导数值 然后展开它们
+
+191
+00:06:43,030 --> 00:06:44,530
+them, to keeping the elements
+让它们保持和我展开的θ值
+
+192
+00:06:45,140 --> 00:06:47,440
+in the same ordering as I did when I unroll my thetas.
+同样的顺序
+
+193
+00:06:48,390 --> 00:06:49,780
+But I'm gonna unroll D1, D2,
+我要展开D1 D2 D3
+
+194
+00:06:50,030 --> 00:06:51,330
+D3, to get gradientVec
+来得到gradientVec
+
+195
+00:06:52,190 --> 00:06:55,180
+which is now what my cost function can return.
+这个值可由我的代价函数返回
+
+196
+00:06:55,490 --> 00:06:57,420
+It can return a vector of these derivatives.
+它可以以一个向量的形式返回这些导数值
+
+197
+00:06:59,150 --> 00:07:00,310
+So, hopefully, you now have
+好了 希望你现在
+
+198
+00:07:00,490 --> 00:07:01,650
+a good sense of how to
+对怎样进行参数的矩阵表达式
+
+199
+00:07:01,890 --> 00:07:03,200
+convert back and forth between
+和向量表达式
+
+200
+00:07:03,360 --> 00:07:04,970
+the matrix representation of the
+之间的转换
+
+201
+00:07:05,090 --> 00:07:08,220
+parameters versus the vector representation of the parameters.
+有了一个更清晰的认识
+
+202
+00:07:09,360 --> 00:07:10,290
+The advantage of the matrix
+使用矩阵表达式
+
+203
+00:07:10,760 --> 00:07:12,330
+representation is that when
+的好处是
+
+204
+00:07:12,470 --> 00:07:13,530
+your parameters are stored as
+当你的参数以矩阵的形式储存时
+
+205
+00:07:13,670 --> 00:07:15,670
+matrices it's more convenient when
+你在进行正向传播
+
+206
+00:07:15,830 --> 00:07:17,430
+you're doing forward propagation and
+和反向传播时
+
+207
+00:07:17,530 --> 00:07:19,110
+back propagation and it's easier
+你会觉得更加方便
+
+208
+00:07:19,850 --> 00:07:21,160
+when your parameters are stored as
+当你将参数储存为矩阵时
+
+209
+00:07:21,360 --> 00:07:22,770
+matrices to take advantage
+一大好处是
+
+210
+00:07:23,400 --> 00:07:24,780
+of the, sort of, vectorized implementations.
+充分利用了向量化的实现过程
+
+211
+00:07:26,230 --> 00:07:27,900
+Whereas in contrast the advantage of
+相反地
+
+212
+00:07:28,090 --> 00:07:30,250
+the vector representation, when you
+向量表达式的优点是
+
+213
+00:07:30,320 --> 00:07:31,820
+have like thetaVec or DVec is that
+如果你有像thetaVec或者DVec这样的向量
+
+214
+00:07:32,500 --> 00:07:34,540
+when you are using the advanced optimization algorithms.
+当你使用一些高级的优化算法时
+
+215
+00:07:34,770 --> 00:07:36,640
+Those algorithms tend to
+这些算法通常要求
+
+216
+00:07:36,760 --> 00:07:37,730
+assume that you have
+你所有的参数
+
+217
+00:07:38,090 --> 00:07:40,730
+all of your parameters unrolled into a big long vector.
+都要展开成一个长向量的形式
+
+218
+00:07:41,720 --> 00:07:42,930
+And so with what we just
+希望通过我们刚才介绍的内容
+
+219
+00:07:43,140 --> 00:07:44,650
+went through, hopefully you can now quickly
+你能够根据需要 更加轻松地
+
+220
+00:07:45,410 --> 00:07:47,020
+convert between the two as needed.
+在两种形式之间转换
+
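
A minimal Octave sketch of the unroll/reshape pattern described in the subtitles above. The layer sizes (Theta1 and Theta2 as 10x11, Theta3 as 1x11) follow the lecture's example; the function and variable names are otherwise illustrative assumptions, not part of the original subtitles.

    function [jVal, gradientVec] = costFunction(thetaVec)
      % Reshape the unrolled parameter vector back into the weight matrices
      Theta1 = reshape(thetaVec(1:110),   10, 11);   % 10 x 11
      Theta2 = reshape(thetaVec(111:220), 10, 11);   % 10 x 11
      Theta3 = reshape(thetaVec(221:231),  1, 11);   %  1 x 11
      % ... forward propagation and back propagation with Theta1..Theta3
      %     to obtain the cost jVal and the gradient matrices D1, D2, D3 ...
      % Unroll the gradients in the same ordering as the thetas were unrolled
      gradientVec = [D1(:); D2(:); D3(:)];
    end

    % Usage with an advanced optimizer (sketch):
    %   options = optimset('GradObj', 'on', 'MaxIter', 100);
    %   initialThetaVec = [initTheta1(:); initTheta2(:); initTheta3(:)];
    %   optTheta = fminunc(@costFunction, initialThetaVec, options);
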
diff --git a/srt/9 - 5 - Gradient Checking (12 min).srt b/srt/9 - 5 - Gradient Checking (12 min).srt
new file mode 100644
index 00000000..66a256c8
--- /dev/null
+++ b/srt/9 - 5 - Gradient Checking (12 min).srt
@@ -0,0 +1,1655 @@
+1
+00:00:00,290 --> 00:00:01,510
+In the last few videos, we talked
+在之前的视频中 我们讨论了
+
+2
+00:00:01,840 --> 00:00:02,770
+about how to do forward-propagation
+如何使用前向传播
+
+3
+00:00:03,570 --> 00:00:05,200
+and back-propagation in a
+和反向传播
+
+4
+00:00:05,250 --> 00:00:07,560
+neural network in order to compute derivatives.
+计算神经网络中的导数
+
+5
+00:00:08,800 --> 00:00:10,070
+But back prop as an algorithm
+但反向传播作为一个
+
+6
+00:00:10,580 --> 00:00:11,910
+has a lot of details and,
+有很多细节的算法
+
+7
+00:00:12,170 --> 00:00:12,920
+you know, can be a little
+在实现的时候
+
+8
+00:00:13,050 --> 00:00:14,930
+bit tricky to implement.
+会有点复杂
+
+9
+00:00:15,700 --> 00:00:17,480
+And one unfortunate property is
+而且有一个不好的方面是
+
+10
+00:00:17,750 --> 00:00:18,690
+that there are many
+在实现反向传播时
+
+11
+00:00:18,780 --> 00:00:20,080
+ways to have subtle bugs in back
+会遇到很多细小的错误
+
+12
+00:00:20,320 --> 00:00:22,000
+prop so that if
+所以如果你把它和梯度下降法
+
+13
+00:00:22,140 --> 00:00:23,130
+you run it with gradient descent
+或者其他优化算法一起运行时
+
+14
+00:00:23,480 --> 00:00:26,590
+or some other optimization algorithm, it could actually look like it's working.
+可能看起来它运行正常
+
+15
+00:00:27,240 --> 00:00:28,480
+And, you know, your cost function J
+并且 你的代价函数J
+
+16
+00:00:28,700 --> 00:00:29,930
+of theta may end up
+最后可能
+
+17
+00:00:30,090 --> 00:00:31,240
+decreasing on every iteration
+在每次梯度下降法迭代时
+
+18
+00:00:31,830 --> 00:00:33,660
+of gradient descent, but this
+都会减小
+
+19
+00:00:33,830 --> 00:00:35,180
+could pull through even though
+即使在实现反向传播时有一些小错误
+
+20
+00:00:35,440 --> 00:00:37,690
+there might be some bug in your implementation of back prop.
+可能也会检查不出来
+
+21
+00:00:38,400 --> 00:00:39,280
+So it looks like J of
+所以它看起来是
+
+22
+00:00:39,360 --> 00:00:40,830
+theta is decreasing, but you
+J(θ)在减小
+
+23
+00:00:40,920 --> 00:00:42,230
+might just wind up with
+但是可能你最后得到的神经网络
+
+24
+00:00:42,410 --> 00:00:43,760
+a neural network that
+但是可能你最后得到的
+
+25
+00:00:43,880 --> 00:00:44,970
+has a higher level of error
+神经网络误差
+
+26
+00:00:45,490 --> 00:00:46,540
+than you would with a bug-free
+比没有错误的要高
+
+27
+00:00:46,780 --> 00:00:48,130
+implementation and you might
+而且你很可能
+
+28
+00:00:48,330 --> 00:00:49,330
+just not know that there
+就是不知道
+
+29
+00:00:49,460 --> 00:00:50,470
+was this subtle bug that's giving
+你的结果是这些
+
+30
+00:00:50,530 --> 00:00:52,260
+you this performance.
+小错误导致的
+
+31
+00:00:52,950 --> 00:00:53,320
+So what can we do about this?
+那你应该怎么办呢
+
+32
+00:00:54,160 --> 00:00:55,940
+There's an idea called gradient checking
+有一个想法叫梯度检验 (Gradient Checking)
+
+33
+00:00:56,790 --> 00:00:58,720
+that eliminates almost all of these problems.
+可以解决基本所有的问题
+
+34
+00:00:59,250 --> 00:01:00,550
+So today, every time I
+我现在每次实现
+
+35
+00:01:00,770 --> 00:01:02,150
+implement back propagation or a
+神经网络的反向传播
+
+36
+00:01:02,370 --> 00:01:03,320
+similar gradient descent algorithm on
+或者类似的
+
+37
+00:01:03,450 --> 00:01:04,950
+the neural network or any other
+梯度下降算法
+
+38
+00:01:05,640 --> 00:01:07,310
+reasonably complex model, I
+或者其他比较复杂的模型
+
+39
+00:01:07,540 --> 00:01:08,840
+always implement gradient checking.
+我都会使用梯度检验
+
+40
+00:01:09,650 --> 00:01:10,610
+And if you do this it will
+如果你这么做
+
+41
+00:01:10,730 --> 00:01:12,010
+help you make sure and
+它会帮你确定
+
+42
+00:01:12,140 --> 00:01:13,410
+sort of gain high confidence that
+并且能很确信
+
+43
+00:01:13,540 --> 00:01:14,940
+your implementation of forward prop
+你实现的前向传播和反向传播
+
+44
+00:01:15,370 --> 00:01:17,430
+and back prop or whatever, is 100% correct.
+或者其他的什么 是100%正确的
+
+45
+00:01:18,240 --> 00:01:19,090
+And in what I've seen
+我见过很多
+
+46
+00:01:19,330 --> 00:01:20,880
+this pretty much eliminates all the
+这样解决那些
+
+47
+00:01:21,160 --> 00:01:23,090
+problems associated with sort of
+实现时容易有
+
+48
+00:01:23,420 --> 00:01:25,790
+a buggy implementation of back prop.
+有小错误的问题
+
+49
+00:01:26,330 --> 00:01:27,470
+And in the previous videos,
+在之前的视频中
+
+50
+00:01:28,170 --> 00:01:29,120
+I sort of ask you to take on
+我一般是让你相信
+
+51
+00:01:29,390 --> 00:01:30,950
+faith that the formulas I
+我给出的计算
+
+52
+00:01:31,170 --> 00:01:33,000
+gave for computing the deltas, and the
+δ,d项
+
+53
+00:01:33,110 --> 00:01:34,220
+D's, and so on, I ask
+等等之类的公式
+
+54
+00:01:34,260 --> 00:01:35,480
+you to take on faith that those
+我要求你们相信
+
+55
+00:01:36,330 --> 00:01:37,600
+actually do compute the gradients
+他们计算的就是
+
+56
+00:01:38,180 --> 00:01:39,790
+of the cost function, but once
+代价函数的梯度
+
+57
+00:01:40,150 --> 00:01:41,740
+you implement numerical gradient checking,
+但一旦你们实现数值梯度检验
+
+58
+00:01:42,130 --> 00:01:43,210
+which is the topic of this video,
+也就是这节视频的主题
+
+59
+00:01:43,800 --> 00:01:45,250
+you'll be able to verify for
+你就能够自己验证
+
+60
+00:01:45,350 --> 00:01:46,490
+yourself that the code you're
+你写的代码
+
+61
+00:01:46,610 --> 00:01:48,530
+writing is indeed computing
+确实是在计算
+
+62
+00:01:49,600 --> 00:01:50,520
+the derivative of the cost
+代价函数J的导数
+
+63
+00:01:50,820 --> 00:01:53,060
+function J. So here's the idea.
+想法是这样的
+
+64
+00:01:53,550 --> 00:01:54,520
+Consider the following example.
+考虑下面这个例子
+
+65
+00:01:55,450 --> 00:01:56,230
+Suppose I have the function
+假如我有一个
+
+66
+00:01:56,710 --> 00:01:58,140
+J of theta, and I
+函数J(θ)
+
+67
+00:01:58,250 --> 00:02:01,320
+have some value, theta, and
+并且我有个值 θ
+
+68
+00:02:01,610 --> 00:02:04,380
+for this example, I'm going to assume that theta is just a real number.
+在这个例子中 我假定θ只是一个实数
+
+69
+00:02:05,470 --> 00:02:08,210
+And let's say I want to estimate the derivative of this function at this point.
+假如说我想估计这个函数在这一点的导数
+
+70
+00:02:08,710 --> 00:02:10,220
+And so the derivative is, you know, equal
+这个导数等于
+
+71
+00:02:10,750 --> 00:02:13,190
+to the slope of that sort of tangent line.
+这条切线的斜率
+
+72
+00:02:14,270 --> 00:02:15,420
+Here's how I'm going to numerically
+下面我要用数值方法
+
+73
+00:02:16,180 --> 00:02:17,840
+approximate the derivative, or
+来计算近似的导数
+
+74
+00:02:17,970 --> 00:02:19,190
+rather here's a procedure for numerically
+这个是用数值方法
+
+75
+00:02:19,780 --> 00:02:21,480
+approximating the derivative: I'm
+计算近似导数的过程
+
+76
+00:02:21,800 --> 00:02:23,520
+going to compute theta plus epsilon,
+我要计算θ+ε
+
+77
+00:02:24,000 --> 00:02:25,550
+so value a little bit to the right.
+这个值在右边一点
+
+78
+00:02:26,340 --> 00:02:27,900
+And we are going to compute theta minus epsilon.
+然后计算θ-ε
+
+79
+00:02:28,410 --> 00:02:30,800
+And I'm going to look
+然后看这两个点
+
+80
+00:02:30,950 --> 00:02:34,360
+at those two points and connect
+用一条直线
+
+81
+00:02:34,840 --> 00:02:35,860
+them by a straight line.
+把它们连起来
+
+82
+00:02:43,160 --> 00:02:44,280
+And I'm going to connect
+我要把这两个点
+
+83
+00:02:44,480 --> 00:02:45,490
+these two points by a straight
+用一条直线连起来
+
+84
+00:02:45,680 --> 00:02:46,430
+line and I'm going to
+然后用这条
+
+85
+00:02:46,480 --> 00:02:47,740
+use the slope of that
+红色线的斜率
+
+86
+00:02:48,000 --> 00:02:49,200
+little red line as my
+来作为我
+
+87
+00:02:49,390 --> 00:02:50,940
+approximation to the derivative,
+导数的近似值
+
+88
+00:02:51,460 --> 00:02:53,110
+which is the true derivative is
+真正的导数是这边这条
+
+89
+00:02:53,280 --> 00:02:54,740
+the slope of the blue line over there.
+蓝色线的斜率
+
+90
+00:02:55,260 --> 00:02:56,660
+So, you know, it seems like it would be a pretty good approximation.
+这看起来是个不错的近似
+
+91
+00:02:58,220 --> 00:02:59,450
+Mathematically, the slope of this
+在数学上 这条红线的斜率等于
+
+92
+00:02:59,670 --> 00:03:01,340
+red line is this vertical
+这个垂直的高度
+
+93
+00:03:01,890 --> 00:03:03,680
+height, divided by this
+除以这个
+
+94
+00:03:03,890 --> 00:03:05,580
+horizontal width, so this
+这个水平的宽度
+
+95
+00:03:05,840 --> 00:03:07,500
+point on top is J of
+所以上面这点
+
+96
+00:03:08,920 --> 00:03:10,840
+theta plus epsilon. This point
+是J(θ+ε)
+
+97
+00:03:11,140 --> 00:03:13,020
+here is J of theta minus epsilon.
+这点是J(Θ-ε)
+
+98
+00:03:13,830 --> 00:03:15,450
+So this vertical difference is j
+垂直方向上的
+
+99
+00:03:15,670 --> 00:03:17,530
+of theta plus epsilon, minus J
+差是J(θ+ε)-J(θ-ε)
+
+100
+00:03:17,810 --> 00:03:18,810
+of theta, minus epsilon, and
+也就是说
+
+101
+00:03:19,700 --> 00:03:21,730
+this horizontal distance is just 2 epsilon.
+水平的距离就是2ε
+
+102
+00:03:23,620 --> 00:03:25,340
+So, my approximation is going
+那么 我的近似是这样的
+
+103
+00:03:25,410 --> 00:03:27,280
+to be that the derivative,
+J(θ)
+
+104
+00:03:29,110 --> 00:03:30,160
+with respect to theta of J of
+对θ的导数
+
+105
+00:03:30,490 --> 00:03:32,170
+theta--add this value of
+近似等于
+
+106
+00:03:32,320 --> 00:03:34,950
+theta--that that's approximately J
+J(θ+ε)-J(θ-ε)
+
+107
+00:03:35,150 --> 00:03:36,860
+of theta plus epsilon, minus
+除以2ε
+
+108
+00:03:37,460 --> 00:03:40,600
+J of theta, minus epsilon, over 2 epsilon.
+近似于J(θ+ε)-J(θ-ε) 除以2ε
+
+109
+00:03:42,280 --> 00:03:43,330
+Usually, I use a pretty
+通常
+
+110
+00:03:43,600 --> 00:03:44,790
+small value for epsilon and
+我给ε取很小的值
+
+111
+00:03:45,040 --> 00:03:46,270
+set epsilon to be maybe
+比如可能取
+
+112
+00:03:46,530 --> 00:03:48,220
+on the order of 10 to the minus 4.
+10的-4次方
+
+113
+00:03:48,740 --> 00:03:49,890
+There's usually a large range
+ε的取值在一个
+
+114
+00:03:50,190 --> 00:03:52,280
+of different values for epsilon that work just fine.
+很大范围内都是可行的
+
+115
+00:03:53,050 --> 00:03:54,470
+And in fact, if you
+实际上
+
+116
+00:03:55,280 --> 00:03:56,540
+let epsilon become really small
+如果你让ε非常小
+
+117
+00:03:57,010 --> 00:03:58,580
+then, mathematically, this term here
+那么 数学上
+
+118
+00:03:59,210 --> 00:04:00,790
+actually, mathematically, you know,
+这里这项实际上就是导数
+
+119
+00:04:01,000 --> 00:04:02,340
+becomes the derivative, becomes exactly
+就变成了函数
+
+120
+00:04:02,860 --> 00:04:04,310
+the slope of the function at this point.
+在这点上准确的斜率
+
+121
+00:04:05,050 --> 00:04:05,730
+It's just that we don't want
+只是我们不想用
+
+122
+00:04:05,910 --> 00:04:06,980
+to use epsilon that's too, too
+非常非常小的ε
+
+123
+00:04:07,170 --> 00:04:09,630
+small because then you might run into numerical problems.
+因为可能会产生数值问题
+
+124
+00:04:10,130 --> 00:04:11,070
+So, you know, I usually use
+所以我通常让ε
+
+125
+00:04:11,380 --> 00:04:14,200
+epsilon around 10 to the minus 4, say.
+差不多等于10^-4
+
+126
+00:04:14,470 --> 00:04:15,220
+And by the way some of you may
+顺便说一下 可能你们有些学习者
+
+127
+00:04:15,330 --> 00:04:17,590
+have seen it alternative formula for
+见过另外这种
+
+128
+00:04:17,750 --> 00:04:19,710
+estimating the derivative which is this formula.
+估计导数的公式
+
+129
+00:04:21,590 --> 00:04:23,500
+This one on the right is called the one-sided difference.
+右边这个叫做单侧差分
+
+130
+00:04:24,040 --> 00:04:26,580
+Whereas, the formula on the left that's called a two-sided difference.
+左边这个公式叫做双侧差分
+
+131
+00:04:27,120 --> 00:04:28,670
+The two-sided difference gives
+双侧差分给我们了一个
+
+132
+00:04:28,890 --> 00:04:29,750
+us a slightly more accurate estimate,
+稍微精确些的估计
+
+133
+00:04:30,170 --> 00:04:31,410
+so I usually use that rather
+所以我通常用那个
+
+134
+00:04:31,670 --> 00:04:33,540
+than just this one-sided difference estimate.
+而不用这个单侧差分估计
+
+135
+00:04:35,900 --> 00:04:37,280
+So, concretely, what you implement
+具体地说 你在Octave中实现时
+
+136
+00:04:37,750 --> 00:04:39,280
+in Octave is you implement the following.
+要使用下面这个
+
+137
+00:04:40,270 --> 00:04:41,490
+You implement call to compute, gradApprox
+你的程序要调用
+
+138
+00:04:41,600 --> 00:04:43,160
+which is going to
+gradApprox来计算
+
+139
+00:04:43,270 --> 00:04:44,590
+be our approximation to the derivative
+它是对导数的近似
+
+140
+00:04:45,380 --> 00:04:46,820
+as just, you know, this formula: J of
+会通过这个公式
+
+141
+00:04:47,200 --> 00:04:48,550
+theta plus epsilon, minus J of theta,
+J(θ+ε)-J(θ-ε)
+
+142
+00:04:48,730 --> 00:04:50,800
+minus epsilon, divided by two times epsilon.
+除以2ε
+
+143
+00:04:52,060 --> 00:04:52,980
+And this will give you a
+它会给出这点导数的
+
+144
+00:04:53,100 --> 00:04:56,110
+numerical estimate of the gradient at that point.
+数值估计
+
+145
+00:04:56,590 --> 00:04:58,910
+And in this example it seems like it's a pretty good estimate.
+在这个例子中 它看起来是个很好的估计
+
+146
+00:05:01,970 --> 00:05:03,460
+Now, on the previous slide,
+在之前的幻灯片中
+
+147
+00:05:03,710 --> 00:05:05,040
+we consider the case of
+我们考虑了
+
+148
+00:05:05,290 --> 00:05:07,010
+when theta was a real number.
+θ是一个实数的情况
+
+149
+00:05:08,000 --> 00:05:08,670
+Now, let's look at the more
+现在我们看更普遍的情况
+
+150
+00:05:08,900 --> 00:05:11,650
+general case of where theta is a vector parameter.
+θ是一个向量参数
+
+151
+00:05:12,220 --> 00:05:13,270
+So let's say theta is an
+假如说θ是n维向量
+
+152
+00:05:13,520 --> 00:05:14,610
+Rn, and it might be the unrolled
+它可能是我们的
+
+153
+00:05:15,000 --> 00:05:16,510
+version of the parameters of
+神经网络参数的
+
+154
+00:05:16,610 --> 00:05:18,010
+our neural network. So
+展开形式
+
+155
+00:05:18,250 --> 00:05:19,580
+theta is a vector that
+所以θ是一个有
+
+156
+00:05:19,800 --> 00:05:21,230
+has n elements, theta 1
+有n个元素的向量
+
+157
+00:05:21,350 --> 00:05:25,100
+up to theta n. We
+θ1到θn
+
+158
+00:05:25,240 --> 00:05:26,530
+can then use a similar idea
+我们可以用类似的想法
+
+159
+00:05:27,080 --> 00:05:29,300
+to approximate all of the partial derivative terms.
+来估计所有的偏导数项
+
+160
+00:05:30,250 --> 00:05:31,730
+Concretely, the partial derivative
+具体地说
+
+161
+00:05:32,420 --> 00:05:33,840
+of a cost function with respect
+代价函数对
+
+162
+00:05:34,110 --> 00:05:35,710
+to the first parameter theta
+第一个参数θ1取偏导数
+
+163
+00:05:36,110 --> 00:05:37,270
+1, that can be
+它可以用J
+
+164
+00:05:37,410 --> 00:05:40,270
+obtained by taking J and increasing theta 1.
+和增大的θ1得到
+
+165
+00:05:40,380 --> 00:05:43,030
+So you have J of theta 1 plus epsilon, and so on
+所以你有J(θ1+ε) 等等
+
+166
+00:05:43,520 --> 00:05:44,780
+minus J of this theta
+减去J(θ1-ε)
+
+167
+00:05:45,520 --> 00:05:46,820
+1 minus epsilon and divide it by 2 epsilon.
+然后除以2ε
+
+168
+00:05:48,130 --> 00:05:49,660
+The partial derivative respect to
+对第二个参数θ2
+
+169
+00:05:49,740 --> 00:05:51,090
+the second parameter theta 2, is
+取偏导数
+
+170
+00:05:51,620 --> 00:05:53,130
+again this thing, except you're
+还是这样
+
+171
+00:05:53,270 --> 00:05:54,370
+taking J of, here you're
+除了你要对
+
+172
+00:05:54,740 --> 00:05:56,240
+increasing theta 2 by epsilon.
+θ2+ε取J
+
+173
+00:05:56,570 --> 00:05:58,290
+And here you're decreasing theta 2 by epsilon.
+这里还有θ2-ε
+
+174
+00:05:59,100 --> 00:06:00,170
+And so on down to the
+这样计算后面的偏导数
+
+175
+00:06:00,260 --> 00:06:01,680
+derivative with respect to
+直到θn
+
+176
+00:06:01,780 --> 00:06:02,780
+theta n. Would be if you
+它的算法是
+
+177
+00:06:03,030 --> 00:06:04,550
+increase and decrease theta n
+对θn增加
+
+178
+00:06:05,060 --> 00:06:06,140
+by epsilon over there.
+和减少ε
+
+179
+00:06:09,790 --> 00:06:11,550
+So, these equations give
+这些公式
+
+180
+00:06:11,720 --> 00:06:13,580
+you a way to numerically approximate
+给出一个计算J
+
+181
+00:06:14,690 --> 00:06:16,500
+the partial derivative of "J"
+对任意参数求偏导数的
+
+182
+00:06:17,250 --> 00:06:20,100
+with respect to any one of your parameters.
+数值近似的方法
+
+183
+00:06:23,640 --> 00:06:26,030
+Concretely, what you implement is therefore, the following.
+具体地说 你要实现的是下面这个
+
+184
+00:06:27,900 --> 00:06:29,260
+We implement the following in Octave
+我们把这个用在Octave里
+
+185
+00:06:29,820 --> 00:06:31,000
+to numerically compute the derivatives.
+来计算数值导数
+
+186
+00:06:32,220 --> 00:06:33,670
+We say for i equals 1
+假如 i
+
+187
+00:06:33,790 --> 00:06:35,110
+through n where n is
+等于 1 到 n
+
+188
+00:06:35,310 --> 00:06:37,140
+the dimension of our parameter vector theta.
+n是我们的参数向量θ的维度
+
+189
+00:06:37,730 --> 00:06:40,680
+And I usually do this with the unrolled version of the parameters.
+我通常用参数的展开形式来计算
+
+190
+00:06:41,250 --> 00:06:42,210
+So you know theta is just
+你知道θ只是我们
+
+191
+00:06:42,530 --> 00:06:44,770
+a long list of all of my parameters in my neural networks.
+神经网络模型的一长列参数
+
+192
+00:06:46,230 --> 00:06:47,550
+I'm going to set theta plus equals
+我让thetaPlus等于theta
+
+193
+00:06:47,830 --> 00:06:49,270
+theta, then increase theta plus
+然后给thetaPlus的第 i 项
+
+194
+00:06:49,630 --> 00:06:51,170
+the ith element by epsilon.
+加上EPSILON
+
+195
+00:06:51,660 --> 00:06:53,010
+And so this is basically
+这就是基本的
+
+196
+00:06:53,720 --> 00:06:54,830
+theta plus is equal to theta
+thetaPlus等于theta
+
+197
+00:06:55,340 --> 00:06:56,280
+except for theta plus i,
+除了thetaPlus(i)
+
+198
+00:06:56,580 --> 00:06:57,820
+which is now incremented by epsilon.
+它会增加EPSILON
+
+199
+00:06:58,310 --> 00:06:59,400
+So if theta plus
+所以如果thetaPlus
+
+200
+00:07:00,810 --> 00:07:01,880
+is equal to, right, theta
+等于θ1 θ2 等等
+
+201
+00:07:01,970 --> 00:07:03,370
+1, theta 2, and so on and then theta
+那么θi
+
+202
+00:07:04,020 --> 00:07:05,160
+i has epsilon added to
+增加了EPSILON
+
+203
+00:07:05,350 --> 00:07:06,590
+it, and then it go down to
+然后一直到θn
+
+204
+00:07:06,780 --> 00:07:08,440
+theta n. So this is what theta plus is.
+这就是thetaPlus的作用
+
+205
+00:07:08,690 --> 00:07:11,340
+And similarly these two
+类似的 这两行
+
+206
+00:07:11,530 --> 00:07:13,380
+lines set theta minus to
+给thetaMinus
+
+207
+00:07:13,480 --> 00:07:15,090
+something similar except that
+类似地赋值
+
+208
+00:07:15,560 --> 00:07:16,720
+this, instead of theta i
+只是θi不是加EPSILON
+
+209
+00:07:16,930 --> 00:07:19,150
+plus epsilon, this now becomes theta i minus epsilon.
+而是减EPSILON
+
+210
+00:07:20,670 --> 00:07:22,320
+And then finally, you implement
+最后 你运行这个
+
+211
+00:07:22,830 --> 00:07:24,370
+this gradApprox i,
+gradApprox(i)
+
+212
+00:07:25,190 --> 00:07:26,430
+and this will give you
+它会给你近似的
+
+213
+00:07:27,210 --> 00:07:28,420
+your approximation to the partial
+J(θ)对θi的
+
+214
+00:07:28,800 --> 00:07:30,250
+derivative with respect to
+偏导数
+
+215
+00:07:30,430 --> 00:07:32,430
+theta i of J of theta.
+我们实现
+
+216
+00:07:35,330 --> 00:07:36,420
+And the way we use this
+神经网络时
+
+217
+00:07:36,760 --> 00:07:38,530
+in our neural network implementation is
+是这样用的
+
+218
+00:07:38,850 --> 00:07:41,530
+we would implement this, implement this
+我们要实现这个
+
+219
+00:07:41,770 --> 00:07:43,310
+FOR loop to compute, you know, the top partial
+用for循环来计算
+
+220
+00:07:44,080 --> 00:07:45,570
+derivative of the cost
+代价函数对
+
+221
+00:07:45,860 --> 00:07:48,570
+function with respect to every parameter in our network.
+每个网络中的参数的偏导数
+
+222
+00:07:49,450 --> 00:07:51,120
+And we can then take the
+然后我们用从
+
+223
+00:07:51,350 --> 00:07:53,070
+gradient that we got from back prop.
+反向传播得到的梯度
+
+224
+00:07:53,740 --> 00:07:55,110
+So DVec was the derivatives
+DVec是我们从反向传播中
+
+225
+00:07:55,770 --> 00:07:57,150
+we got from back prop.
+得到的导数
+
+226
+00:07:58,380 --> 00:08:00,610
+Right, so back prop, back-propagation was
+所以后向传播是一个
+
+227
+00:08:00,890 --> 00:08:02,030
+a relatively efficient way to
+相对比较有效率的
+
+228
+00:08:02,090 --> 00:08:03,350
+compute the derivatives or the
+计算代价函数
+
+229
+00:08:03,430 --> 00:08:04,970
+partial derivatives of a
+对参数的导数
+
+230
+00:08:05,110 --> 00:08:06,850
+cost function with respect to all of our parameters.
+或偏导数的方法
+
+231
+00:08:07,820 --> 00:08:08,960
+And what I usually do
+接下来
+
+232
+00:08:09,350 --> 00:08:10,820
+is then take my numerically
+我通常做的是
+
+233
+00:08:11,440 --> 00:08:12,830
+computed derivative, that is
+计算数值导数
+
+234
+00:08:12,960 --> 00:08:14,080
+this gradApprox that we
+就是gradApprox
+
+235
+00:08:14,250 --> 00:08:15,830
+just had from up here and
+我们刚从上面这里得到的
+
+236
+00:08:15,920 --> 00:08:17,030
+make sure that that is
+来确定它等于
+
+237
+00:08:17,290 --> 00:08:19,420
+equal or approximately equal
+或者近似于
+
+238
+00:08:19,980 --> 00:08:21,080
+up to, you know, small values
+差距很小
+
+239
+00:08:21,810 --> 00:08:22,770
+of numerical round off that is
+非常接近我们
+
+240
+00:08:22,970 --> 00:08:25,640
+pretty close to the DVec that I got from back prop.
+从反向传播得到的DVec
+
+241
+00:08:26,510 --> 00:08:27,460
+And if these two ways
+如果这两种
+
+242
+00:08:27,930 --> 00:08:29,550
+of computing the derivative give me
+计算导数的方法
+
+243
+00:08:29,650 --> 00:08:31,040
+the same answer or at least give me
+给你相同的结果或者非常接近结果
+
+244
+00:08:31,300 --> 00:08:33,670
+very similar answers, you know, up to a few decimal places.
+最多几位小数的差距
+
+245
+00:08:34,720 --> 00:08:36,560
+Then I'm much more confident that
+那么我就非常确信
+
+246
+00:08:36,710 --> 00:08:38,720
+my implementation of back prop is correct.
+我实现的反向传播是正确的
+
+247
+00:08:40,000 --> 00:08:41,230
+And when I plug these DVec
+然后我把这些DVec向量用在
+
+248
+00:08:41,660 --> 00:08:43,320
+vectors into gradient descent
+梯度下降法或者
+
+249
+00:08:43,760 --> 00:08:45,610
+or some advanced optimization algorithm,
+其他高级优化算法里
+
+250
+00:08:45,760 --> 00:08:46,850
+I can then be much
+然后我就可以比较确信
+
+251
+00:08:47,100 --> 00:08:48,870
+more confident that I'm computing
+我计算的导数
+
+252
+00:08:49,360 --> 00:08:51,010
+the derivatives correctly and therefore,
+是正确的
+
+253
+00:08:51,450 --> 00:08:52,670
+that hopefully my codes will
+那么 我的代码
+
+254
+00:08:52,790 --> 00:08:53,890
+run correctly and do a
+应该也可以正确运行
+
+255
+00:08:53,980 --> 00:08:55,570
+good job optimizing J of theta.
+可以很好地优化J(θ)
+
+256
+00:08:57,700 --> 00:08:58,680
+Finally, I want to put
+最后 我想把
+
+257
+00:08:58,860 --> 00:09:00,050
+everything together and tell you
+所有的东西放在一起
+
+258
+00:09:00,310 --> 00:09:02,950
+how to implement this numerical gradient checking.
+然后告诉你怎么实现这个数值梯度检验
+
+259
+00:09:03,630 --> 00:09:04,370
+Here's what I usually do.
+这是我通常做的
+
+260
+00:09:04,970 --> 00:09:06,020
+First thing I do, is implement
+第一件事
+
+261
+00:09:06,500 --> 00:09:08,180
+back-propagation to compute DVec.
+是实现反向传播来计算DVec
+
+262
+00:09:08,490 --> 00:09:09,560
+So, this is a procedure we talked
+这个步骤是我们
+
+263
+00:09:09,830 --> 00:09:11,250
+about in an earlier video to
+之前的视频中讲过的
+
+264
+00:09:11,490 --> 00:09:13,530
+compute DVec which may be our unrolled version of these matrices.
+计算DVec 它可能是这些矩阵的展开形式
+
+265
+00:09:15,410 --> 00:09:16,550
+Then what I do, is implement
+然后我要做的是
+
+266
+00:09:17,010 --> 00:09:20,130
+numerical gradient checking to compute gradApprox.
+实现数值梯度检验来计算gradApprox
+
+267
+00:09:20,590 --> 00:09:23,550
+So this is what I described earlier in this video, in the previous slide.
+这是我在这节视频前面部分讲的 在之前的幻灯片里
+
+268
+00:09:24,900 --> 00:09:27,680
+Then you should make sure that DVec and gradApprox
+然后你要确定DVec和gradApprox给出接近的结果
+
+269
+00:09:28,170 --> 00:09:30,860
+gives similar values, you know, let's say up to a few decimal places.
+可能最多差几位小数
+
+270
+00:09:32,270 --> 00:09:33,160
+And finally, and this
+最后
+
+271
+00:09:33,240 --> 00:09:35,230
+is the important step: before
+这是最重要的一步
+
+272
+00:09:35,480 --> 00:09:36,690
+you start to use your code
+在使用你的代码去学习
+
+273
+00:09:37,000 --> 00:09:38,220
+for learning, for seriously training
+训练你的网络之前
+
+274
+00:09:38,570 --> 00:09:40,960
+your network, it is important to turn off gradient checking.
+重要的是要关掉梯度检验
+
+275
+00:09:41,490 --> 00:09:42,800
+And to no longer compute
+不再使用
+
+276
+00:09:43,630 --> 00:09:44,940
+this gradApprox thing using
+这节视频前面讲的
+
+277
+00:09:45,250 --> 00:09:47,660
+the numerical derivative formulas that
+这个数值导数公式
+
+278
+00:09:47,980 --> 00:09:48,950
+we talked about earlier in this
+来计算
+
+279
+00:09:50,560 --> 00:09:50,560
+video.
+gradApprox
+
+280
+00:09:50,960 --> 00:09:52,180
+And the reason for that is the
+这样做的原因是
+
+281
+00:09:52,330 --> 00:09:53,800
+numerical gradient checking code,
+我们之前讲的这个
+
+282
+00:09:54,120 --> 00:09:54,930
+the stuff we talked about in
+数值梯度检验代码
+
+283
+00:09:55,010 --> 00:09:56,220
+this video, that's a very
+是一个计算量
+
+284
+00:09:56,650 --> 00:09:58,570
+computationally expensive, that's a
+非常大的程序
+
+285
+00:09:58,600 --> 00:10:00,960
+very slow way to try to approximate the derivative.
+它是一个非常慢的计算近似导数的方法
+
+286
+00:10:02,080 --> 00:10:03,490
+Whereas in contrast, the back-propagation
+而相对地
+
+287
+00:10:03,900 --> 00:10:04,710
+algorithm that we talked about
+我们之前讲的
+
+288
+00:10:04,940 --> 00:10:06,120
+earlier, that is the
+反向传播算法
+
+289
+00:10:06,370 --> 00:10:07,270
+thing that we talked about earlier
+也就是那个
+
+290
+00:10:07,460 --> 00:10:08,900
+for computing, you know, D1, D2,
+DVec的D(1) D(2) D(3)的算法
+
+291
+00:10:09,320 --> 00:10:11,620
+D3, or for DVec. Back prop is
+反向传播是一个在计算导数上
+
+292
+00:10:11,790 --> 00:10:14,930
+a much more computationally efficient way of computing the derivatives.
+效率更高的方法
+
+293
+00:10:17,070 --> 00:10:18,650
+So once you've verified that
+所以当你确认了
+
+294
+00:10:18,770 --> 00:10:20,270
+your implementation of back-propagation is
+你的反向传播算法是正确的
+
+295
+00:10:20,620 --> 00:10:21,840
+correct, you should turn
+你应该关掉梯度检验
+
+296
+00:10:22,160 --> 00:10:24,140
+off gradient checking, and just stop using that.
+就是不使用它
+
+297
+00:10:25,090 --> 00:10:26,380
+So just to reiterate, you
+再重申一下
+
+298
+00:10:26,540 --> 00:10:27,720
+should be sure to disable your
+在为了训练分类器
+
+299
+00:10:27,840 --> 00:10:29,380
+gradient checking code before running
+运行你的算法
+
+300
+00:10:29,690 --> 00:10:30,840
+your algorithm for many
+做很多次梯度下降
+
+301
+00:10:31,140 --> 00:10:32,560
+iterations of gradient descent, or
+或高级优化算法的迭代之前
+
+302
+00:10:32,670 --> 00:10:33,690
+for many iterations of the
+要确定你
+
+303
+00:10:33,890 --> 00:10:34,990
+advanced optimization algorithms in
+不再使用
+
+304
+00:10:35,820 --> 00:10:37,140
+order to train your classifier.
+梯度检验的程序
+
+305
+00:10:37,980 --> 00:10:39,120
+Concretely, if you were
+具体来说
+
+306
+00:10:39,290 --> 00:10:40,830
+to run numerical gradient checking
+如果你在每次的梯度下降法迭代时
+
+307
+00:10:41,340 --> 00:10:43,710
+on every single iteration of gradient
+都运行数值梯度检验
+
+308
+00:10:44,040 --> 00:10:44,650
+descent, or if you were in the
+或者你用在
+
+309
+00:10:44,850 --> 00:10:45,780
+inner loop of your cost function,
+代价函数的内循环里
+
+310
+00:10:46,670 --> 00:10:47,910
+then your code will be very slow.
+你的程序会变得非常慢
+
+311
+00:10:48,240 --> 00:10:49,860
+Because the numerical gradient checking
+因为数值梯度检验程序
+
+312
+00:10:50,180 --> 00:10:51,690
+code is much slower than
+比反向传播算法
+
+313
+00:10:51,900 --> 00:10:53,960
+the back-propagation algorithm, than
+要慢很多
+
+314
+00:10:54,160 --> 00:10:56,160
+a back-propagation method where you
+反向传播算法
+
+315
+00:10:56,340 --> 00:10:57,650
+remember we were computing delta
+就是我们计算
+
+316
+00:10:58,000 --> 00:10:59,820
+4, delta 3, delta 2, and so on.
+δ(4) δ(3) δ(2) 等等的
+
+317
+00:10:59,900 --> 00:11:02,470
+That was the back-propagation algorithm.
+那就是反向传播算法
+
+318
+00:11:02,990 --> 00:11:05,770
+That is a much faster way to compute derivatives than gradient checking.
+那是一个比梯度检验更快的计算导数的方法
+
+319
+00:11:06,620 --> 00:11:08,400
+So when you're ready, once
+所以当你准备好了
+
+320
+00:11:08,620 --> 00:11:10,190
+you verify the implementation of back-propagation
+一旦你验证了
+
+321
+00:11:10,480 --> 00:11:12,140
+is correct, make sure you
+反向传播的实现是正确的
+
+322
+00:11:12,220 --> 00:11:13,050
+turn off, or you disable
+要确定你在训练算法时把它关闭了
+
+323
+00:11:13,640 --> 00:11:15,070
+your gradient checking code while
+或者说不再使用梯度检验程序
+
+324
+00:11:15,270 --> 00:11:17,880
+you train your algorithm, or else your code could run very slowly.
+否则你的程序会运行得非常慢
+
+325
+00:11:20,420 --> 00:11:22,470
+So that's how you take gradients numerically.
+这就是用数值方法计算梯度的过程
+
+326
+00:11:23,110 --> 00:11:24,300
+And that's how you can verify that
+那是你用来确定反向传播实现
+
+327
+00:11:24,420 --> 00:11:26,300
+your implementation of back-propagation is correct.
+是否正确的方法
+
+328
+00:11:27,230 --> 00:11:29,290
+Whenever I implement back-propagation or
+当我实现反向传播
+
+329
+00:11:29,450 --> 00:11:31,130
+similar gradient descent algorithm for
+或者类似的复杂模型的梯度下降算法
+
+330
+00:11:31,250 --> 00:11:33,410
+a complicated model, I always use gradient checking.
+我经常使用梯度检验
+
+331
+00:11:33,730 --> 00:11:36,230
+This really helps me make sure that my code is correct.
+这的确能帮我确定我的代码是正确的
+
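
A minimal Octave sketch of the numerical gradient check described in the subtitles above; theta is assumed to be the unrolled parameter vector, DVec the gradient obtained from back-propagation, and J(...) stands for whatever cost-function call you use, so those names are assumptions for illustration only.

    EPSILON = 1e-4;                 % on the order of 10^-4, as in the video
    n = length(theta);              % theta: all parameters unrolled into a vector
    gradApprox = zeros(n, 1);
    for i = 1:n
      thetaPlus  = theta;  thetaPlus(i)  = thetaPlus(i)  + EPSILON;
      thetaMinus = theta;  thetaMinus(i) = thetaMinus(i) - EPSILON;
      % two-sided difference approximation of the i-th partial derivative
      gradApprox(i) = (J(thetaPlus) - J(thetaMinus)) / (2 * EPSILON);
    end
    % gradApprox should agree with DVec (from back prop) up to a few decimal
    % places; remember to disable this check before actually training the network.
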
diff --git a/srt/9 - 6 - Random Initialization (7 min).srt b/srt/9 - 6 - Random Initialization (7 min).srt
new file mode 100644
index 00000000..543d8103
--- /dev/null
+++ b/srt/9 - 6 - Random Initialization (7 min).srt
@@ -0,0 +1,965 @@
+1
+00:00:00,540 --> 00:00:01,820
+In the previous videos, we put
+在前面的视频中
+
+2
+00:00:01,950 --> 00:00:03,220
+together almost all
+我们总结了
+
+3
+00:00:03,270 --> 00:00:04,620
+the pieces you need in order
+所有
+
+4
+00:00:04,820 --> 00:00:07,170
+to implement and train a neural network.
+实现并训练神经网络所需要的内容
+
+5
+00:00:07,940 --> 00:00:09,060
+There's just one last idea I
+现在是最后一个
+
+6
+00:00:09,120 --> 00:00:09,980
+need to share with you, which
+我需要分享给你们的想法
+
+7
+00:00:10,200 --> 00:00:11,570
+is the idea of random initialization.
+这就是随机初始化的思想
+
+8
+00:00:13,220 --> 00:00:14,360
+When you're running an algorithm like
+当你运行一个算法
+
+9
+00:00:14,510 --> 00:00:15,990
+gradient descent or also the
+例如梯度下降算法
+
+10
+00:00:16,280 --> 00:00:17,810
+advanced optimization algorithms, we
+或者是先进的优化算法时
+
+11
+00:00:17,940 --> 00:00:20,770
+need to pick some initial value for the parameters theta.
+我们需要给变量θ一些初始值
+
+12
+00:00:21,610 --> 00:00:22,990
+So for the advanced optimization algorithm, you know,
+所以对于那些先进的优化算法
+
+13
+00:00:23,570 --> 00:00:24,620
+it assumes that you will
+假设
+
+14
+00:00:24,780 --> 00:00:26,090
+pass it some initial value
+给变量θ
+
+15
+00:00:26,700 --> 00:00:27,640
+for the parameters theta.
+传递一些初始值
+
+16
+00:00:29,010 --> 00:00:30,680
+Now let's consider gradient descent.
+现在让我们考虑梯度下降
+
+17
+00:00:31,320 --> 00:00:34,090
+For that, you know, we also need to initialize theta to something.
+同样 我们需要把θ初始化成一些值
+
+18
+00:00:34,580 --> 00:00:36,030
+And then we can slowly take steps
+接下来使用梯度下降方法
+
+19
+00:00:36,680 --> 00:00:38,830
+go downhill, using gradient descent,
+慢慢地执行这些步骤使其下降
+
+20
+00:00:38,910 --> 00:00:40,920
+to go downhill to minimize the function J of theta.
+使θ的函数J下降到最小
+
+21
+00:00:41,990 --> 00:00:43,960
+So what do we set the initial value of theta to?
+那么θ的初始值该设置为多少呢?
+
+22
+00:00:44,240 --> 00:00:47,000
+Is it possible to set
+是否可以
+
+23
+00:00:47,520 --> 00:00:48,930
+the initial value of theta
+将θ的初始值设为
+
+24
+00:00:49,250 --> 00:00:50,450
+to the vector of all zeroes.
+全部是0的向量
+
+25
+00:00:51,870 --> 00:00:54,800
+Whereas this worked okay when we were using logistic regression.
+当使用逻辑回归
+
+26
+00:00:55,630 --> 00:00:56,690
+Initializing all of your
+初始化所有变量为0
+
+27
+00:00:56,760 --> 00:00:57,970
+parameters to zero actually
+这样做是否可行
+
+28
+00:00:58,310 --> 00:01:00,290
+does not work when you're training a neural network.
+实际上在训练神经网络时这样做是不可行的
+
+29
+00:01:01,410 --> 00:01:03,150
+Consider training the following neural network.
+以训练这个神经网络为例
+
+30
+00:01:03,650 --> 00:01:06,430
+And let's say we initialized all of the parameters in the network to zero.
+照之前所说将网络中的所有变量初始化为0
+
+31
+00:01:07,970 --> 00:01:09,210
+And if you do that then
+如果是这样的话
+
+32
+00:01:09,780 --> 00:01:10,920
+what that means is that
+具体来说就是
+
+33
+00:01:11,160 --> 00:01:13,870
+at the initialization this blue weight, that I'm covering blue
+当初始化这条蓝色权重
+
+34
+00:01:15,390 --> 00:01:16,540
+is going to equal to that weight.
+使这条被涂为蓝色的权重等于那条蓝色的权重
+
+35
+00:01:17,510 --> 00:01:17,510
+So, they're both zero.
+他们都是0
+
+36
+00:01:18,580 --> 00:01:19,880
+And this weight that I'm covering
+这条被涂上红色的权重
+
+37
+00:01:20,330 --> 00:01:21,940
+in in red, is equal to that weight.
+同样等于
+
+38
+00:01:22,550 --> 00:01:23,040
+Which I'm covering it in red.
+被涂上红色的这条权重
+
+39
+00:01:23,790 --> 00:01:25,280
+And also this weight, well
+同样这个权重
+
+40
+00:01:25,620 --> 00:01:26,500
+which I'm covering it in green
+这个被涂成绿色的权重也一样
+
+41
+00:01:26,680 --> 00:01:28,940
+is going to be equal to the value of that weight.
+等于那条绿色的权重
+
+42
+00:01:30,030 --> 00:01:32,820
+And what that means is that both of your hidden units: a1 and a2
+那样是否就意味着这两个隐藏单元a1 a2
+
+43
+00:01:32,950 --> 00:01:35,940
+are going to be computing the same function
+以同一个输入函数
+
+44
+00:01:36,660 --> 00:01:36,810
+of your inputs.
+计算
+
+45
+00:01:37,810 --> 00:01:38,900
+And thus, you end up with
+这样
+
+46
+00:01:39,500 --> 00:01:40,870
+for everyone of your training your examples.
+对每个例子进行训练
+
+47
+00:01:41,430 --> 00:01:43,640
+You end up with a(2)1 equals a(2)2.
+最后a(2)1与a(2)2结果必然相等
+
+48
+00:01:46,950 --> 00:01:48,700
+and moreover because, I'm not
+更多的原因
+
+49
+00:01:48,960 --> 00:01:50,050
+going to show this too much
+我就不详细讲述了
+
+50
+00:01:50,310 --> 00:01:51,420
+detail, but because these out
+而由于
+
+51
+00:01:51,580 --> 00:01:52,990
+going weights are the same you
+这些权重相同
+
+52
+00:01:53,080 --> 00:01:54,630
+can also show that the
+同样可以得出
+
+53
+00:01:54,710 --> 00:01:56,560
+delta values are also going to be the same.
+这些δ值也相同
+
+54
+00:01:56,790 --> 00:01:57,790
+So concretely, you end up
+具体地说
+
+55
+00:01:57,970 --> 00:02:00,070
+with delta 1 1,
+δ(2)1=δ(2)2
+
+56
+00:02:00,760 --> 00:02:02,900
+delta 2 1, equals delta 2 2.
+δ(2)1=δ(2)2
+
+57
+00:02:06,120 --> 00:02:07,150
+And if you work through the
+同时
+
+58
+00:02:07,230 --> 00:02:08,480
+map further, what you can
+从图中进一步可以得出
+
+59
+00:02:08,760 --> 00:02:09,990
+show is that the partial derivatives
+这些变量的偏导数
+
+60
+00:02:11,560 --> 00:02:14,080
+with respect to your parameters will satisfy the following.
+满足以下条件
+
+61
+00:02:15,120 --> 00:02:16,710
+That the partial derivative
+成本函数的
+
+62
+00:02:17,550 --> 00:02:19,260
+of the cost
+偏导数
+
+63
+00:02:19,580 --> 00:02:21,020
+function with respect to
+求出的导数
+
+64
+00:02:21,800 --> 00:02:23,680
+writing out the derivatives respect to
+代表了
+
+65
+00:02:23,900 --> 00:02:25,320
+these two blue weights neural network.
+神经网络中两条蓝色的权重
+
+66
+00:02:26,190 --> 00:02:27,290
+You'll find that these two partial
+可以注意到
+
+67
+00:02:27,680 --> 00:02:30,340
+derivatives are going to be equal to each other.
+这两个偏导数互为相等
+
+68
+00:02:31,970 --> 00:02:33,180
+And so, what this means, is
+这也就意味着
+
+69
+00:02:33,320 --> 00:02:35,820
+that even after say, one gradient descent update.
+即使在比如说一次梯度下降更新之后
+
+70
+00:02:36,690 --> 00:02:38,200
+You're going to update, say this
+第一个蓝色权重也会更新
+
+71
+00:02:38,470 --> 00:02:40,800
+first blue weight with, you know, learning rate times this.
+等于学习率乘以这个式子
+
+72
+00:02:41,580 --> 00:02:42,500
+And you're going to update the second
+第二条蓝色权重更新为
+
+73
+00:02:42,920 --> 00:02:44,620
+blue weight to a sum learning rate times this.
+学习率乘上这个式子
+
+74
+00:02:44,820 --> 00:02:45,870
+But what this means is
+但是 这就意味着
+
+75
+00:02:45,980 --> 00:02:47,090
+that even after one gradient
+一旦更新梯度下降
+
+76
+00:02:47,420 --> 00:02:49,330
+descent update, those two
+这两条
+
+77
+00:02:49,680 --> 00:02:50,710
+blue weights, those two blue
+蓝色权重的值
+
+78
+00:02:51,430 --> 00:02:53,050
+color parameters will end
+最后
+
+79
+00:02:53,240 --> 00:02:54,960
+up the same as each other.
+将互为相等
+
+80
+00:02:55,190 --> 00:02:56,210
+So they'll be some non-zero
+因此 即使权重现在不都为0
+
+81
+00:02:56,750 --> 00:02:57,720
+value now, but this value
+但参数的值
+
+82
+00:02:58,550 --> 00:02:59,520
+will be equal to that value.
+最后也互为相等
+
+83
+00:03:00,360 --> 00:03:02,790
+And similarly, even after one gradient descent update.
+同样地 即使更新一个梯度下降
+
+84
+00:03:03,690 --> 00:03:05,740
+This value will equal to that value.
+这条红色的权重也会等于这条红色的权重
+
+85
+00:03:06,170 --> 00:03:07,200
+There will be some non-zero values.
+也许会有些非0的值
+
+86
+00:03:07,640 --> 00:03:09,450
+Just that the two red values will be equal to each other.
+仅仅是两条红色的值会互为相等
+
+87
+00:03:10,240 --> 00:03:11,760
+And similarly the two green
+同样两条绿色的权重
+
+88
+00:03:12,060 --> 00:03:13,720
+weights, they'll both change values
+开始它们有不同的值
+
+89
+00:03:13,860 --> 00:03:16,350
+but they'll both end up the same value as each other.
+最后这两个权重也会互为相等
+
+90
+00:03:17,590 --> 00:03:19,020
+So after each update, the parameters corresponding
+所以每次更新后
+
+91
+00:03:19,740 --> 00:03:20,890
+to the inputs going to each
+两个隐藏单元的输入参数
+
+92
+00:03:21,060 --> 00:03:22,870
+of the two hidden units identical.
+相同
+
+93
+00:03:23,700 --> 00:03:24,490
+That's just saying that the two
+这只是说
+
+94
+00:03:24,710 --> 00:03:25,590
+green weights must be sustained,
+两条绿色的权重必须持续
+
+95
+00:03:25,640 --> 00:03:26,310
+the two red weights must be
+两条红色的权重必须持续
+
+96
+00:03:26,550 --> 00:03:27,750
+sustained, the two blue weights
+两条蓝色的权重
+
+97
+00:03:28,010 --> 00:03:30,000
+are still the same and what
+仍然相同
+
+98
+00:03:30,160 --> 00:03:31,590
+that means is that even after
+这就意味着
+
+99
+00:03:31,770 --> 00:03:33,070
+one iteration of say, gradient
+比如说 即使经过一次
+
+100
+00:03:33,460 --> 00:03:34,860
+descent, you find that
+梯度下降
+
+101
+00:03:35,600 --> 00:03:37,250
+your two hidden units are still
+你们会发现两个隐藏单元
+
+102
+00:03:37,800 --> 00:03:40,380
+computing exactly the same function of the input.
+仍然使用完全相同的输入函数计算
+
+103
+00:03:40,830 --> 00:03:43,040
+So you still have this a(2)1 equals a(2)2.
+因此a(2)1仍然等于a(2)2
+
+104
+00:03:43,510 --> 00:03:45,200
+And so you're back to this case.
+回到这个例子
+
+105
+00:03:45,930 --> 00:03:47,380
+And as keep running gradient descent.
+保持梯度下降
+
+106
+00:03:48,390 --> 00:03:50,940
+The blue weights, the two blue weights will stay the same as each other.
+这两条蓝色的权重仍然相同
+
+107
+00:03:51,190 --> 00:03:52,920
+The two red weights will stay the same as each other.
+两条红色的权重 两条绿色的权重
+
+108
+00:03:53,060 --> 00:03:54,990
+The two green weights will stay the same as each other.
+也是同样的情况
+
+109
+00:03:55,160 --> 00:03:56,860
+And what this means
+这也就意味着
+
+110
+00:03:57,130 --> 00:03:58,260
+is that your neural network really
+这个神经网络
+
+111
+00:03:58,470 --> 00:03:59,980
+can't compute very interesting functions.
+确实不能计算什么有意思的函数
+
+112
+00:04:00,700 --> 00:04:01,910
+Imagine that you had
+想象一下
+
+113
+00:04:02,240 --> 00:04:03,670
+not only two hidden
+不止有两个隐藏单元
+
+114
+00:04:04,010 --> 00:04:05,470
+units but imagine
+而是
+
+115
+00:04:05,640 --> 00:04:07,100
+that you had many many hidden units.
+有很多很多的隐藏单元
+
+116
+00:04:08,080 --> 00:04:09,160
+Then what this is saying is that
+这就是说
+
+117
+00:04:09,430 --> 00:04:10,680
+all of your hidden units are
+所有的隐藏单元
+
+118
+00:04:10,740 --> 00:04:12,320
+computing the exact same
+都在计算完全相同的特征
+
+119
+00:04:12,540 --> 00:04:16,300
+feature, all of your hidden units are computing all of the exact same function of the input.
+所有的隐藏单元都在对输入计算完全相同的函数
+
+120
+00:04:17,030 --> 00:04:18,980
+And this is a highly redundant representation.
+这代表了高度冗余
+
+121
+00:04:20,140 --> 00:04:21,010
+Because that means that your
+因为 这意味着
+
+122
+00:04:21,110 --> 00:04:24,160
+final logistic regression unit, you know, really only gets to see one feature.
+最后的逻辑回归单元其实只能得到一个特征
+
+123
+00:04:24,730 --> 00:04:25,460
+Because all of these are the same
+因为所有的隐藏单元都一样
+
+124
+00:04:26,330 --> 00:04:28,690
+and this prevents your neural network from learning something interesting.
+这样阻止了神经网络学习一些有趣的事
+
+125
+00:04:31,600 --> 00:04:32,830
+In order to get around this
+为了解决这个问题
+
+126
+00:04:32,960 --> 00:04:34,050
+problem, the way we initialize
+神经网络变量
+
+127
+00:04:34,590 --> 00:04:35,680
+the parameters of a neural network
+初始化的方式
+
+128
+00:04:36,050 --> 00:04:37,660
+therefore, is with random initialization.
+采用随机初始化
+
+129
+00:04:41,820 --> 00:04:43,130
+Concretely, the problem we
+具体地
+
+130
+00:04:43,250 --> 00:04:44,470
+saw on the previous slide
+在上一张幻灯片中看到的
+
+131
+00:04:44,760 --> 00:04:46,240
+is sometimes called the problem
+有时被我们称为对称权重的问题是
+
+132
+00:04:46,640 --> 00:04:49,040
+of symmetric weights, that is if the weights all being the same.
+所有的权重相同
+
+133
+00:04:49,810 --> 00:04:51,470
+And so this random initialization
+所以这种随机初始化
+
+134
+00:04:52,590 --> 00:04:54,240
+is how we perform symmetry breaking.
+解决的是如何打破这种对称性
+
+135
+00:04:55,520 --> 00:04:56,480
+So what we do is we
+所以 我们需要做的是
+
+136
+00:04:56,680 --> 00:04:58,200
+initialize each value of
+对θ的每个值
+
+137
+00:04:58,310 --> 00:04:59,460
+theta to a random
+进行初始化
+
+138
+00:04:59,830 --> 00:05:01,300
+number between minus epsilon and epsilon.
+范围在负ε到正ε之间
+
+139
+00:05:02,080 --> 00:05:03,200
+So this is a notation to
+这个符号意味着
+
+140
+00:05:03,310 --> 00:05:05,350
+mean numbers between minus epsilon and plus epsilon.
+负ε到正ε之间
+
+141
+00:05:06,330 --> 00:05:07,430
+So my weights on my
+因此
+
+142
+00:05:07,540 --> 00:05:08,660
+parameters are all going
+变量的权重通常初始化为
+
+143
+00:05:08,710 --> 00:05:11,470
+to be randomly initialized between minus epsilon and plus epsilon.
+负ε到正ε之间的任意一个数
+
+144
+00:05:12,300 --> 00:05:13,330
+The way I write code to do
+我在octave里编写了这样的代码
+
+145
+00:05:13,420 --> 00:05:16,770
+this in Octave is I set Theta1 to be equal to this.
+就是把Theta1设为等于这个式子
+
+146
+00:05:17,550 --> 00:05:19,620
+So this rand 10 by 11.
+所以 这个10*11的矩阵
+
+147
+00:05:19,910 --> 00:05:21,060
+That's how you compute
+该怎样计算
+
+148
+00:05:21,640 --> 00:05:23,620
+a random 10 by 11
+一个任意的10*11维矩阵
+
+149
+00:05:24,670 --> 00:05:26,640
+dimensional matrix, and all
+矩阵中的所有值
+
+150
+00:05:27,070 --> 00:05:30,380
+of the values are between 0 and 1.
+都介于0到1之间
+
+151
+00:05:30,580 --> 00:05:31,350
+So these are going to
+所以
+
+152
+00:05:31,520 --> 00:05:32,700
+be real numbers that take on
+这些实数
+
+153
+00:05:32,870 --> 00:05:34,860
+any continuous values between 0 and 1.
+取0到1之间的连续值
+
+154
+00:05:35,450 --> 00:05:36,290
+And so, if you take a
+因此
+
+155
+00:05:36,320 --> 00:05:37,440
+number between 0 and
+如果取0到1之间的一个数
+
+156
+00:05:37,550 --> 00:05:38,310
+1, multiply it by 2
+和
+
+157
+00:05:38,590 --> 00:05:39,550
+times an epsilon, and
+2ε相乘
+
+158
+00:05:39,600 --> 00:05:41,050
+minus an epsilon, then you
+再减去ε
+
+159
+00:05:41,160 --> 00:05:42,270
+end up with a number that's
+然后得到
+
+160
+00:05:42,690 --> 00:05:44,160
+between minus epsilon and plus epsilon.
+一个在负ε到正ε间的数
+
+161
+00:05:45,640 --> 00:05:46,970
+And incidentally, this epsilon here
+顺便说一句
+
+162
+00:05:47,230 --> 00:05:48,410
+has nothing to do
+这里的ε与我们
+
+163
+00:05:48,730 --> 00:05:49,860
+with the epsilon that we were
+在做梯度检验时
+
+164
+00:05:50,070 --> 00:05:51,710
+using when we were doing gradient checking.
+所使用的ε没有关系
+
+165
+00:05:52,590 --> 00:05:54,070
+So when we were doing numerical gradient checking,
+因此在进行数字化梯度检查时
+
+166
+00:05:54,850 --> 00:05:57,060
+there we were adding some values of epsilon to theta.
+会加一些ε值给θ
+
+167
+00:05:57,430 --> 00:05:59,560
+This is, you know, an unrelated value of epsilon.
+这是一个与之无关的ε值
+
+168
+00:05:59,780 --> 00:06:00,590
+Which is why I am denoting
+这就是为什么我要在这里用ε表示
+
+169
+00:06:00,990 --> 00:06:02,200
+in it epsilon, just to distinguish
+仅仅是为了区分
+
+170
+00:06:02,480 --> 00:06:04,970
+it from the value of epsilon we were using in gradient checking.
+在梯度检查中使用的ε值
+
+171
+00:06:06,490 --> 00:06:07,590
+Absolutely, if you want to
+当然
+
+172
+00:06:07,690 --> 00:06:09,620
+initialize theta 2
+如果想要初始化θ2
+
+173
+00:06:09,640 --> 00:06:10,820
+to a random 1 by
+为任意一个1乘11的矩阵
+
+174
+00:06:10,920 --> 00:06:13,430
+11 matrix, you can do so using this piece of code here.
+可以使用这里的这段代码
+
+175
+00:06:15,910 --> 00:06:17,460
+So, to summarize, to
+总结来说
+
+176
+00:06:17,660 --> 00:06:18,910
+train a neural network, what you
+为了训练神经网络
+
+177
+00:06:19,060 --> 00:06:20,850
+should do is randomly initialize the
+应该对权重进行随机初始化
+
+178
+00:06:20,930 --> 00:06:21,810
+weights to, you know, small
+初始化为
+
+179
+00:06:22,120 --> 00:06:23,370
+values close to 0, between
+负ε到正ε间
+
+180
+00:06:23,740 --> 00:06:24,740
+minus epsilon and plus epsilon,
+接近于0的小数
+
+181
+00:06:25,160 --> 00:06:27,150
+say, and then implement
+然后进行反向传播
+
+182
+00:06:27,620 --> 00:06:29,330
+back-propagation; do gradient checking;
+执行梯度检查
+
+183
+00:06:30,220 --> 00:06:31,300
+and use either gradient
+使用梯度下降
+
+184
+00:06:31,660 --> 00:06:32,620
+descent or one of the
+或者
+
+185
+00:06:32,880 --> 00:06:34,860
+advanced optimization algorithms to try
+使用先进的优化算法
+
+186
+00:06:35,100 --> 00:06:36,250
+to minimize J of theta
+试着使J最小
+
+187
+00:06:36,790 --> 00:06:37,860
+as a function of the
+即作为参数θ的函数
+
+188
+00:06:38,050 --> 00:06:39,610
+parameters theta starting from just
+从任意选择变量的初始值
+
+189
+00:06:39,890 --> 00:06:41,900
+randomly chosen initial value for the parameters.
+开始
+
+190
+00:06:42,970 --> 00:06:45,440
+And by doing symmetry breaking, which is this process.
+通过打破对称性这一进程
+
+191
+00:06:46,000 --> 00:06:47,110
+Hopefully, gradient descent or the
+希望梯度下降或是
+
+192
+00:06:47,580 --> 00:06:48,820
+advanced optimization algorithms will be
+先进的优化算法
+
+193
+00:06:48,980 --> 00:06:50,710
+able to find a good value of theta.
+可以找到θ的最优值
+
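
A minimal Octave sketch of the symmetry-breaking random initialization described above. The 10x11 and 1x11 shapes follow the rand(10,11) / rand(1,11) example in the transcript; INIT_EPSILON is an assumed small constant, unrelated to the epsilon used for gradient checking.

    INIT_EPSILON = 0.12;   % assumed small value so each weight starts near zero
    % rand(r, c) gives uniform values in (0, 1); scale and shift them into
    % (-INIT_EPSILON, +INIT_EPSILON) to break symmetry between hidden units
    Theta1 = rand(10, 11) * 2 * INIT_EPSILON - INIT_EPSILON;
    Theta2 = rand(1, 11)  * 2 * INIT_EPSILON - INIT_EPSILON;
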
diff --git a/srt/9 - 7 - Putting It Together (14 min).srt b/srt/9 - 7 - Putting It Together (14 min).srt
new file mode 100644
index 00000000..ff6beb1f
--- /dev/null
+++ b/srt/9 - 7 - Putting It Together (14 min).srt
@@ -0,0 +1,1981 @@
+1
+00:00:00,240 --> 00:00:01,560
+So, it's taken us a
+我们已经用了
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,700 --> 00:00:02,690
+lot of videos to get through
+几节视频的内容
+
+3
+00:00:03,120 --> 00:00:04,480
+the neural network learning algorithm.
+来介绍神经网络算法
+
+4
+00:00:05,620 --> 00:00:06,640
+In this video, what I'd like
+在这段视频中
+
+5
+00:00:06,800 --> 00:00:08,090
+to do is try to
+我想结合我们所讲的
+
+6
+00:00:08,350 --> 00:00:10,040
+put all the pieces together, to
+所有这些内容
+
+7
+00:00:10,370 --> 00:00:12,120
+give an overall summary or
+来做一个总体的回顾
+
+8
+00:00:12,360 --> 00:00:13,410
+a bigger picture view, of how
+看看这些零散的内容
+
+9
+00:00:13,650 --> 00:00:15,290
+all the pieces fit together and
+相互之间有怎样的联系
+
+10
+00:00:15,530 --> 00:00:16,990
+of the overall process of how
+以及神经网络学习算法的
+
+11
+00:00:17,260 --> 00:00:18,830
+to implement a neural network learning algorithm.
+总体实现过程
+
+12
+00:00:21,870 --> 00:00:23,210
+When training a neural network, the
+当我们在训练一个神经网络时
+
+13
+00:00:23,280 --> 00:00:24,290
+first thing you need to do
+我们要做的第一件事
+
+14
+00:00:24,400 --> 00:00:25,920
+is pick some network architecture
+就是搭建网络的大体框架
+
+15
+00:00:26,680 --> 00:00:27,950
+and by architecture I just
+这里我说的框架 意思是
+
+16
+00:00:28,200 --> 00:00:30,510
+mean connectivity pattern between the neurons.
+神经元之间的连接模式
+
+17
+00:00:31,080 --> 00:00:31,840
+So, you know, we might choose
+我们可能会从以下几种结构中选择
+
+18
+00:00:32,700 --> 00:00:33,770
+between say, a neural network
+第一种神经网络的结构是
+
+19
+00:00:34,230 --> 00:00:35,440
+with three input units
+包含三个输入单元
+
+20
+00:00:35,960 --> 00:00:37,400
+and five hidden units and
+五个隐藏单元
+
+21
+00:00:37,500 --> 00:00:39,560
+four output units versus one
+和四个输出单元
+
+22
+00:00:39,800 --> 00:00:41,460
+of 3, 5 hidden, 5
+第二种结构是 三个输入单元作为输入层
+
+23
+00:00:41,700 --> 00:00:43,430
+hidden, 4 output and
+两组五个隐藏单元作为隐藏层 四个输出单元的输出层
+
+24
+00:00:43,910 --> 00:00:45,220
+here are 3, 5,
+然后第三种是3 5 5 5
+
+25
+00:00:45,550 --> 00:00:47,060
+5, 5 units in each
+其中每个隐藏层包含五个单元
+
+26
+00:00:47,320 --> 00:00:48,870
+of three hidden layers and four
+然后是四个输出单元
+
+27
+00:00:49,120 --> 00:00:50,250
+output units, and so these
+这些就是可能选择的结构
+
+28
+00:00:50,430 --> 00:00:52,000
+choices of how many hidden
+每一层可以选择
+
+29
+00:00:52,270 --> 00:00:53,410
+units in each layer
+多少个隐藏单元
+
+30
+00:00:53,810 --> 00:00:55,560
+and how many hidden layers, those
+以及可以选择多少个隐藏层
+
+31
+00:00:55,780 --> 00:00:57,580
+are architecture choices.
+这些都是你构建时的选择
+
+32
+00:00:57,910 --> 00:00:58,680
+So, how do you make these choices?
+那么我们该如何做出选择呢?
+
+33
+00:00:59,710 --> 00:01:01,270
+Well first, the number
+首先 我们知道
+
+34
+00:01:01,530 --> 00:01:03,840
+of input units well that's pretty well defined.
+我们已经定义了输入单元的数量
+
+35
+00:01:04,680 --> 00:01:05,960
+And once you decide on the fixed
+一旦你确定了特征集x
+
+36
+00:01:06,580 --> 00:01:07,870
+set of features x the
+对应的输入单元数目
+
+37
+00:01:08,080 --> 00:01:09,420
+number of input units will just be, you know, the
+也就确定了
+
+38
+00:01:10,140 --> 00:01:12,180
+dimension of your features x(i)
+也就是等于特征x(i)的维度
+
+39
+00:01:12,330 --> 00:01:14,470
+would be determined by that.
+输入单元数目将会由此确定
+
+40
+00:01:14,760 --> 00:01:15,970
+And if you are doing multiclass
+如果你正在进行
+
+41
+00:01:16,210 --> 00:01:17,370
+classifications the number of
+多类别分类
+
+42
+00:01:17,520 --> 00:01:18,320
+output of this will be
+那么输出层的单元数目
+
+43
+00:01:18,420 --> 00:01:19,720
+determined by the number
+将会由你分类问题中
+
+44
+00:01:20,060 --> 00:01:22,860
+of classes in your classification problem.
+所要区分的类别个数确定
+
+45
+00:01:23,260 --> 00:01:24,890
+And just a reminder if you have
+值得提醒的是
+
+46
+00:01:25,160 --> 00:01:27,290
+a multiclass classification where y
+如果你的多元分类问题
+
+47
+00:01:27,570 --> 00:01:28,970
+takes on say values between
+y的取值范围
+
+48
+00:01:30,040 --> 00:01:31,350
+1 and 10, so that
+是在1到10之间
+
+49
+00:01:31,470 --> 00:01:33,560
+you have ten possible classes.
+那么你就有10个可能的分类
+
+50
+00:01:34,690 --> 00:01:37,200
+Then remember to write your
+别忘了把你的y
+
+51
+00:01:37,820 --> 00:01:39,340
+output y as vectors like these.
+重新写成向量的形式
+
+52
+00:01:40,130 --> 00:01:41,560
+So instead of class one, you
+所以现在我们的y不是一个数了
+
+53
+00:01:41,730 --> 00:01:42,840
+recode it as a vector
+我们重新把y写成
+
+54
+00:01:43,150 --> 00:01:44,600
+like that, or for
+这种形式的向量
+
+55
+00:01:44,670 --> 00:01:47,280
+the second class you recode it as a vector like that.
+第二个分类我们可以写成这样的向量
+
+56
+00:01:48,130 --> 00:01:49,080
+So if one of these
+所以 比如说
+
+57
+00:01:49,210 --> 00:01:51,000
+examples takes on
+如果要表达
+
+58
+00:01:51,140 --> 00:01:53,910
+the fifth class, you know, y equals 5, then
+第五个分类 也就是说y等于5
+
+59
+00:01:54,120 --> 00:01:55,130
+what you're showing to your neural
+那么在你的神经网络中
+
+60
+00:01:55,380 --> 00:01:56,840
+network is not actually a value
+就不能直接用
+
+61
+00:01:57,250 --> 00:01:59,520
+of y equals 5, instead here
+数值5来表达
+
+62
+00:02:00,030 --> 00:02:00,950
+at the upper layer which would
+因为这里的输出层
+
+63
+00:02:01,280 --> 00:02:02,650
+have ten output units, you
+有十个输出单元
+
+64
+00:02:02,740 --> 00:02:03,920
+will instead feed to the
+你应该用一个向量
+
+65
+00:02:04,070 --> 00:02:05,710
+vector which you know
+来表示
+
+66
+00:02:07,470 --> 00:02:08,430
+with one in the fifth
+这个向量的第五个位置值是1
+
+67
+00:02:08,770 --> 00:02:11,050
+position and a bunch of zeros down here.
+其它的都是0
+
+68
+00:02:11,420 --> 00:02:12,470
+So the choice of number
+所以对于输入单元
+
+69
+00:02:12,890 --> 00:02:14,330
+of input units and number of output units
+和输出单元数目的选择
+
+70
+00:02:14,970 --> 00:02:16,600
+is maybe somewhat reasonably straightforward.
+还是比较容易理解的
+
+71
+00:02:18,000 --> 00:02:18,950
+And as for the number
+那么对于隐藏层
+
+72
+00:02:19,410 --> 00:02:21,040
+of hidden units and the
+单元的个数
+
+73
+00:02:21,140 --> 00:02:23,110
+number of hidden layers, a
+以及隐藏层的数目
+
+74
+00:02:23,210 --> 00:02:24,350
+reasonable default is to
+我们有一个默认的规则
+
+75
+00:02:24,540 --> 00:02:26,010
+use a single hidden layer
+那就是只使用单个隐藏层
+
+76
+00:02:26,660 --> 00:02:28,040
+and so this type of
+所以最左边所示的
+
+77
+00:02:28,880 --> 00:02:30,400
+neural network shown on the left with
+这种只有一个隐藏层的神经网络
+
+78
+00:02:30,580 --> 00:02:33,270
+just one hidden layer is probably the most common.
+一般来说是最普遍的
+
+79
+00:02:34,490 --> 00:02:35,870
+Or if you use more
+或者如果你使用
+
+80
+00:02:36,140 --> 00:02:38,410
+than one hidden layer, again the
+不止一个隐藏层的话
+
+81
+00:02:38,670 --> 00:02:39,600
+reasonable default will be to
+同样我们也有一个默认规则
+
+82
+00:02:39,760 --> 00:02:40,950
+have the same number of
+那就是每一个隐藏层
+
+83
+00:02:41,130 --> 00:02:42,560
+hidden units in every single layer.
+通常都应有相同的单元数
+
+84
+00:02:42,810 --> 00:02:44,600
+So here we have two
+所以对于这个结构
+
+85
+00:02:45,020 --> 00:02:46,370
+hidden layers and each
+我们有两个隐藏层
+
+86
+00:02:46,610 --> 00:02:47,650
+of these hidden layers have the
+每个隐藏层都有相同的单元数
+
+87
+00:02:47,860 --> 00:02:49,500
+same number five of hidden
+都是5个隐藏单元
+
+88
+00:02:49,790 --> 00:02:50,740
+units and here we have, you know,
+这里也是一样
+
+89
+00:02:51,600 --> 00:02:53,020
+three hidden layers and
+我们有三个隐藏层
+
+90
+00:02:53,170 --> 00:02:54,790
+each of them has the same
+每个隐藏层有相同的单元数
+
+91
+00:02:54,980 --> 00:02:56,400
+number, that is five hidden units.
+都是5个隐藏单元
+
+92
+00:02:57,440 --> 00:02:59,440
+Rather than doing this sort
+但实际上通常来说
+
+93
+00:02:59,740 --> 00:03:02,850
+of network architecture on the left would be a perfectly reasonable default.
+左边这个结构是较为合理的默认结构
+
+94
+00:03:04,020 --> 00:03:04,780
+And as for the number
+而对于隐藏单元的个数
+
+95
+00:03:05,120 --> 00:03:07,040
+of hidden units - usually, the
+通常情况下
+
+96
+00:03:07,120 --> 00:03:08,100
+more hidden units the better;
+隐藏单元越多越好
+
+97
+00:03:08,560 --> 00:03:09,640
+it's just that if you have
+不过 我们需要注意的是
+
+98
+00:03:09,900 --> 00:03:11,110
+a lot of hidden units, it
+如果有大量隐藏单元
+
+99
+00:03:11,330 --> 00:03:13,150
+can become more computationally expensive, but
+计算量一般会比较大
+
+100
+00:03:13,300 --> 00:03:15,850
+very often, having more hidden units is a good thing.
+当然 一般来说隐藏单元还是越多越好
+
+101
+00:03:17,250 --> 00:03:18,560
+And usually the number of hidden
+并且一般来说 每个隐藏层
+
+102
+00:03:18,720 --> 00:03:20,820
+units in each layer will be maybe
+所包含的单元数量
+
+103
+00:03:21,080 --> 00:03:22,130
+comparable to the dimension
+还应该和输入x
+
+104
+00:03:22,490 --> 00:03:23,670
+of x, comparable to the
+的维度相匹配
+
+105
+00:03:23,810 --> 00:03:24,950
+number of features, or it could
+也要和特征的数目匹配
+
+106
+00:03:25,140 --> 00:03:26,880
+be any where from same number
+可能隐藏单元的数目
+
+107
+00:03:27,180 --> 00:03:29,590
+of hidden units of input features to
+和输入特征的数量相同
+
+108
+00:03:29,770 --> 00:03:32,430
+maybe two, three, or four times that.
+或者是它的二倍 或者三倍 四倍
+
+109
+00:03:32,680 --> 00:03:34,770
+So having the number of hidden units is comparable.
+因此 隐藏单元的数目需要和其他参数相匹配
+
+110
+00:03:35,140 --> 00:03:36,350
+You know, several times, or
+一般来说
+
+111
+00:03:36,410 --> 00:03:37,380
+some what bigger than the number
+隐藏单元的数目取为稍大于
+
+112
+00:03:37,430 --> 00:03:38,750
+of input features is often
+输入特征数目
+
+113
+00:03:39,280 --> 00:03:41,320
+a useful thing to do So,
+都是可以接受的
+
+114
+00:03:42,150 --> 00:03:43,490
+hopefully this gives you one
+希望这些能够给你
+
+115
+00:03:43,810 --> 00:03:45,140
+reasonable set of default choices
+在选择神经网络结构时
+
+116
+00:03:45,650 --> 00:03:47,770
+for the neural network architecture, and
+提供一些有用的建议和选择的参考
+
+117
+00:03:48,200 --> 00:03:49,460
+if you follow these guidelines, you
+如果你遵循了这些建议
+
+118
+00:03:49,540 --> 00:03:50,580
+will probably get something that works
+你一般会得到比较好的模型结构
+
+119
+00:03:50,930 --> 00:03:52,180
+well, but in a
+但是
+
+120
+00:03:52,360 --> 00:03:53,770
+later set of videos where
+在以后的一系列视频中
+
+121
+00:03:54,050 --> 00:03:55,270
+I will talk specifically about
+特别是在我谈到
+
+122
+00:03:55,580 --> 00:03:56,900
+advice for how to apply
+学习算法的应用时
+
+123
+00:03:57,410 --> 00:03:58,770
+algorithms, I will actually
+我还会更详细地介绍
+
+124
+00:03:58,840 --> 00:04:01,880
+say a lot more about how to choose a neural network architecture.
+如何选择神经网络的结构
+
+125
+00:04:02,540 --> 00:04:03,920
+Or actually have quite
+后面的视频中
+
+126
+00:04:03,970 --> 00:04:04,960
+a lot I want to
+我还会着重介绍
+
+127
+00:04:04,960 --> 00:04:06,180
+say later to make good choices
+怎样正确地选择隐藏层的个数
+
+128
+00:04:06,710 --> 00:04:08,780
+for the number of hidden units, the number of hidden layers, and so on.
+以及隐藏单元的数目 等等
+
+129
+00:04:10,620 --> 00:04:12,310
+Next, here's what we
+下面我们就来具体介绍
+
+130
+00:04:12,420 --> 00:04:13,740
+need to implement in order to
+如何实现神经网络的
+
+131
+00:04:13,860 --> 00:04:15,360
+train a neural network, there are
+训练过程
+
+132
+00:04:15,510 --> 00:04:16,820
+actually six steps that I
+这里一共有六个步骤
+
+133
+00:04:17,080 --> 00:04:18,030
+have; I have four on this
+这页幻灯片中罗列了前四步
+
+134
+00:04:18,160 --> 00:04:19,100
+slide and two more steps
+剩下的两步
+
+135
+00:04:19,380 --> 00:04:21,480
+on the next slide.
+放在下一张幻灯片中
+
+136
+00:04:21,620 --> 00:04:22,220
+First step is to set up the neural
+首先 第一步是构建一个
+
+137
+00:04:22,430 --> 00:04:23,510
+network and to randomly
+神经网络
+
+138
+00:04:24,080 --> 00:04:25,570
+initialize the values of the weights.
+然后随机初始化权值
+
+139
+00:04:25,790 --> 00:04:27,000
+And we usually initialize the
+通常我们把权值
+
+140
+00:04:27,080 --> 00:04:29,710
+weights to small values near zero.
+初始化为很小的值 接近于零
+
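+A minimal Octave sketch of this first step, assuming one hidden layer (the layer sizes 400, 25 and 10 and the range 0.12 are only illustrative assumptions, not values fixed by the lecture):
+
+  epsilon_init = 0.12;                                         % small range around zero
+  Theta1 = rand(25, 401) * 2 * epsilon_init - epsilon_init;    % layer 1 -> 2 weights
+  Theta2 = rand(10, 26)  * 2 * epsilon_init - epsilon_init;    % layer 2 -> 3 weights
+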
+141
+00:04:31,100 --> 00:04:33,120
+Then we implement forward propagation
+然后我们执行前向传播算法
+
+142
+00:04:34,080 --> 00:04:35,060
+so that we can input
+也就是 对于该神经网络的
+
+143
+00:04:35,480 --> 00:04:37,150
+any x into the neural network and
+任意一个输入x(i)
+
+144
+00:04:37,490 --> 00:04:38,860
+compute h of x which is this
+计算出对应的h(x)值
+
+145
+00:04:39,070 --> 00:04:40,820
+output vector of the y values.
+也就是一个输出值y的向量
+
+146
+00:04:44,260 --> 00:04:45,910
+We then also implement code to
+接下来我们通过代码
+
+147
+00:04:46,010 --> 00:04:47,500
+compute this cost function j of theta.
+计算出代价函数J(θ)
+
+148
+00:04:49,770 --> 00:04:51,160
+And next we implement
+然后我们执行
+
+149
+00:04:52,120 --> 00:04:53,330
+back-prop, or the back-propagation
+反向传播算法
+
+150
+00:04:54,400 --> 00:04:55,680
+algorithm, to compute these
+来算出这些偏导数 或偏微分项
+
+151
+00:04:55,910 --> 00:04:58,000
+partial derivative terms, the partial
+也就是
+
+152
+00:04:58,440 --> 00:04:59,830
+derivatives of j of theta
+J(θ)关于参数θ的偏微分
+
+153
+00:05:00,340 --> 00:05:04,240
+with respect to the parameters. Concretely, to implement back prop.
+具体来说
+
+154
+00:05:04,960 --> 00:05:05,880
+Usually we will do that
+我们要对所有训练集数据
+
+155
+00:05:06,250 --> 00:05:08,460
+with a for loop over the training examples.
+使用一个for循环进行遍历
+
+156
+00:05:09,700 --> 00:05:10,650
+Some of you may have heard of
+可能有部分同学之前听说过
+
+157
+00:05:10,830 --> 00:05:12,640
+advanced, and frankly very
+一些比较先进的分解方法
+
+158
+00:05:12,940 --> 00:05:14,500
+advanced vectorization methods where you
+可能不需要像这里一样使用
+
+159
+00:05:14,670 --> 00:05:15,720
+don't have a for-loop over
+for循环来对所有
+
+160
+00:05:16,570 --> 00:05:18,580
+the m training examples, but the
+m个训练样本进行遍历
+
+161
+00:05:18,660 --> 00:05:19,900
+first time you're implementing back prop
+但是 这是你第一次进行反向传播算法
+
+162
+00:05:20,250 --> 00:05:21,420
+there should almost certainly be a for
+所以我建议你最好还是
+
+163
+00:05:21,420 --> 00:05:22,980
+loop in your code,
+使用一个for循环来完成程序
+
+164
+00:05:23,800 --> 00:05:25,010
+where you're iterating over the examples,
+对每一个训练样本进行迭代
+
+165
+00:05:25,810 --> 00:05:27,760
+you know, x1, y1, then so
+从x(1) y(1)开始
+
+166
+00:05:28,030 --> 00:05:29,510
+you do forward prop and
+我们对第一个样本进行
+
+167
+00:05:29,640 --> 00:05:30,400
+back prop on the first
+前向传播运算和反向传播运算
+
+168
+00:05:30,850 --> 00:05:32,510
+example, and then in
+然后在第二次循环中
+
+169
+00:05:32,710 --> 00:05:33,730
+the second iteration of the
+同样地对第二个样本
+
+170
+00:05:33,780 --> 00:05:35,360
+for-loop, you do forward propagation
+执行前向传播和反向传播算法
+
+171
+00:05:35,980 --> 00:05:38,050
+and back propagation on the second example, and so on.
+以此类推
+
+172
+00:05:38,170 --> 00:05:40,900
+Until you get through the final example.
+直到最后一个样本
+
+173
+00:05:41,680 --> 00:05:43,110
+So there should be
+因此 在你第一次做反向传播的时候
+
+174
+00:05:43,230 --> 00:05:44,250
+a for-loop in your implementation
+你还是应该用这样的for循环
+
+175
+00:05:45,050 --> 00:05:47,180
+of back prop, at least the first time implementing it.
+来实现这个过程
+
+176
+00:05:48,120 --> 00:05:49,160
+And then there are frankly
+其实实际上
+
+177
+00:05:49,390 --> 00:05:50,520
+somewhat complicated ways to do
+有复杂的方法可以实现
+
+178
+00:05:50,890 --> 00:05:52,660
+this without a for-loop, but
+并不一定要使用for循环
+
+179
+00:05:52,810 --> 00:05:53,950
+I definitely do not recommend
+但我非常不推荐
+
+180
+00:05:54,360 --> 00:05:55,340
+trying to do that much more
+在第一次实现反向传播算法的时候
+
+181
+00:05:55,660 --> 00:05:58,420
+complicated version the first time you try to implement back prop.
+使用更复杂更高级的方法
+
+182
+00:05:59,850 --> 00:06:00,920
+So concretely, we have a
+所以具体来讲 我们对所有的
+
+183
+00:06:01,010 --> 00:06:02,200
+for-loop over my m training examples
+m个训练样本上使用了for循环遍历
+
+184
+00:06:03,240 --> 00:06:04,630
+and inside the for-loop we're
+在这个for循环里
+
+185
+00:06:04,770 --> 00:06:06,300
+going to perform forward prop
+我们对每个样本执行
+
+186
+00:06:06,580 --> 00:06:08,090
+and back prop using just this one example.
+前向和反向算法
+
+187
+00:06:09,310 --> 00:06:10,320
+And what that means is that
+具体来说就是
+
+188
+00:06:10,560 --> 00:06:12,470
+we're going to take x(i), and
+我们把x(i)
+
+189
+00:06:12,690 --> 00:06:14,010
+feed that to my input layer,
+传到输入层
+
+190
+00:06:14,770 --> 00:06:16,370
+perform forward-prop, perform back-prop
+然后执行前向传播和反向传播
+
+191
+00:06:17,370 --> 00:06:18,360
+and that will give us all of
+这样我们就能得到
+
+192
+00:06:18,430 --> 00:06:19,840
+these activations and all of
+该神经网络中
+
+193
+00:06:19,930 --> 00:06:22,090
+these delta terms for all
+每一层中每一个单元对应的
+
+194
+00:06:22,300 --> 00:06:23,440
+of the layers of all my
+所有这些激励值a(l)
+
+195
+00:06:23,770 --> 00:06:24,720
+units in the neural
+和delta项
+
+196
+00:06:24,950 --> 00:06:27,170
+network. Then, still
+接下来
+
+197
+00:06:27,610 --> 00:06:28,760
+inside this for-loop, let
+还是在for循环中
+
+198
+00:06:29,180 --> 00:06:30,450
+me draw some curly braces
+让我画一个大括号
+
+199
+00:06:30,940 --> 00:06:31,950
+just to show the scope of
+来标明这个
+
+200
+00:06:32,030 --> 00:06:32,930
+the for-loop, this is in
+for循环的范围
+
+201
+00:06:34,160 --> 00:06:35,480
+Octave code of course, though it's really more like pseudocode,
+当然这些是octave的代码
+
+202
+00:06:36,190 --> 00:06:38,350
+and the for-loop encompasses all this.
+括号里是for循环的循环体
+
+203
+00:06:39,060 --> 00:06:40,060
+We're going to compute those delta
+我们要计算出这些delta值
+
+204
+00:06:40,480 --> 00:06:43,690
+terms, using the formula that we gave earlier:
+也就是用我们之前给出的公式
+
+205
+00:06:45,540 --> 00:06:47,370
+Delta(l) plus, you know, delta(l+1) times
+加上 delta(l+1)
+
+206
+00:06:48,630 --> 00:06:51,150
+a(l) transpose.
+a(l)的转置矩阵
+
+207
+00:06:51,490 --> 00:06:53,540
+And then finally, outside of the for-loop,
+最后 外面的部分
+
+208
+00:06:54,180 --> 00:06:55,630
+having computed these delta
+计算出的这些delta值
+
+209
+00:06:55,970 --> 00:06:57,550
+terms, these accumulation terms, we
+这些累加项
+
+210
+00:06:57,870 --> 00:06:59,050
+would then have some other
+我们将用别的程序
+
+211
+00:06:59,170 --> 00:07:00,430
+code and then that will
+来计算出
+
+212
+00:07:00,720 --> 00:07:03,240
+allow us to compute these partial derivative terms.
+这些偏导数项
+
+213
+00:07:03,860 --> 00:07:05,450
+Right and these partial derivative
+那么这些偏导数项
+
+214
+00:07:05,970 --> 00:07:07,020
+terms have to take
+也应该考虑使用
+
+215
+00:07:07,210 --> 00:07:10,270
+into account the regularization term lambda as well.
+正则化项lambda值
+
+216
+00:07:11,050 --> 00:07:13,240
+And so, those formulas were given in the earlier video.
+这些公式在前面的视频中已经给出
+
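+A minimal Octave sketch of the loop just described, assuming one hidden layer, that m, lambda, X, Theta1 and Theta2 already exist, that Y holds each label as a 0/1 column vector, and that sigmoid() is defined elsewhere (this is only an outline of the idea above, not the official course code):
+
+  Delta1 = zeros(size(Theta1));   % gradient accumulators
+  Delta2 = zeros(size(Theta2));
+  for i = 1:m
+    % forward propagation for training example i
+    a1 = [1; X(i, :)'];                         % input plus bias unit
+    z2 = Theta1 * a1;   a2 = [1; sigmoid(z2)];  % hidden layer plus bias unit
+    z3 = Theta2 * a2;   a3 = sigmoid(z3);       % output layer, h(x) for this example
+    % back propagation for training example i
+    d3 = a3 - Y(:, i);                          % Y(:, i) is the label as a 0/1 vector
+    d2 = (Theta2(:, 2:end)' * d3) .* sigmoid(z2) .* (1 - sigmoid(z2));
+    % accumulate: Delta(l) = Delta(l) + delta(l+1) * a(l)'
+    Delta1 = Delta1 + d2 * a1';
+    Delta2 = Delta2 + d3 * a2';
+  end
+  % divide by m, then add the regularization term (skipping the bias column)
+  Theta1_grad = Delta1 / m;
+  Theta1_grad(:, 2:end) = Theta1_grad(:, 2:end) + (lambda / m) * Theta1(:, 2:end);
+  Theta2_grad = Delta2 / m;
+  Theta2_grad(:, 2:end) = Theta2_grad(:, 2:end) + (lambda / m) * Theta2(:, 2:end);
+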
+217
+00:07:14,830 --> 00:07:15,720
+So, once you've done that,
+那么 搞定所有这些内容
+
+218
+00:07:16,680 --> 00:07:18,080
+you now hopefully have code to
+现在你就应该已经得到了
+
+219
+00:07:18,180 --> 00:07:20,050
+compute these partial derivative terms.
+计算这些偏导数项的程序了
+
+220
+00:07:21,190 --> 00:07:23,030
+Next is step five, what I
+下面就是第五步了
+
+221
+00:07:23,240 --> 00:07:24,420
+do is then use gradient
+我要做的就是使用梯度检查
+
+222
+00:07:24,730 --> 00:07:26,700
+checking to compare these partial
+来比较这些
+
+223
+00:07:27,120 --> 00:07:28,530
+derivative terms that were computed. So, I've
+已经计算得到的偏导数项
+
+224
+00:07:29,420 --> 00:07:30,980
+compared the versions computed using
+把用反向传播算法
+
+225
+00:07:31,270 --> 00:07:33,990
+back propagation versus the
+得到的偏导数值
+
+226
+00:07:34,430 --> 00:07:36,470
+partial derivatives computed using the numerical
+与用数值方法得到的
+
+227
+00:07:37,710 --> 00:07:39,850
+estimates of the derivatives.
+估计值进行比较
+
+228
+00:07:40,350 --> 00:07:41,810
+So, I do gradient checking to make
+因此 通过进行梯度检查来
+
+229
+00:07:41,970 --> 00:07:44,340
+sure that both of these give you very similar values.
+确保两种方法得到基本接近的两个值
+
+230
+00:07:45,830 --> 00:07:47,410
+Having done gradient checking just now reassures
+通过梯度检查我们能确保
+
+231
+00:07:47,910 --> 00:07:49,280
+us that our implementation of back
+我们的反向传播算法
+
+232
+00:07:49,590 --> 00:07:51,470
+propagation is correct, and it is
+得到的结果是正确的
+
+233
+00:07:51,610 --> 00:07:52,850
+then very important that we disable
+但必须要说明的一点是
+
+234
+00:07:53,530 --> 00:07:54,710
+gradient checking, because the gradient
+我们需要去掉梯度检查的代码
+
+235
+00:07:55,080 --> 00:07:57,150
+checking code is computationally very slow.
+因为梯度检查的计算非常慢
+
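+A minimal Octave sketch of that check, assuming costFunction(theta) returns the cost J for an unrolled parameter vector theta and that grad holds the derivatives computed by back propagation (these names are only illustrative):
+
+  EPSILON = 1e-4;
+  numgrad = zeros(size(theta));
+  for i = 1:numel(theta)
+    thetaPlus  = theta;   thetaPlus(i)  = thetaPlus(i)  + EPSILON;
+    thetaMinus = theta;   thetaMinus(i) = thetaMinus(i) - EPSILON;
+    numgrad(i) = (costFunction(thetaPlus) - costFunction(thetaMinus)) / (2 * EPSILON);
+  end
+  % compare with the back propagation gradient; this ratio should be tiny (e.g. < 1e-9)
+  disp(norm(numgrad - grad) / norm(numgrad + grad));
+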
+236
+00:07:59,020 --> 00:08:00,880
+And finally, we then
+最后 我们就可以
+
+237
+00:08:01,120 --> 00:08:03,280
+use an optimization algorithm such
+使用一个最优化算法
+
+238
+00:08:03,510 --> 00:08:04,940
+as gradient descent, or one of
+比如说梯度下降算法
+
+239
+00:08:04,960 --> 00:08:07,520
+the advanced optimization methods such
+或者说是更加高级的优化方法
+
+240
+00:08:07,740 --> 00:08:10,020
+as L-BFGS or conjugate gradient, as
+比如说BFGS算法 共轭梯度法
+
+241
+00:08:10,250 --> 00:08:13,120
+embodied in fminunc, or other optimization methods.
+或者其他一些已经内置到fminunc函数中的方法
+
+242
+00:08:13,940 --> 00:08:15,500
+We use these together with
+将所有这些优化方法
+
+243
+00:08:15,730 --> 00:08:17,380
+back propagation, so back
+和反向传播算法相结合
+
+244
+00:08:17,620 --> 00:08:18,670
+propagation is the thing
+这样我们就能计算出
+
+245
+00:08:18,770 --> 00:08:20,640
+that computes these partial derivatives for us.
+这些偏导数项的值
+
+246
+00:08:21,730 --> 00:08:22,680
+And so, we know how to
+到现在 我们已经知道了
+
+247
+00:08:22,860 --> 00:08:24,020
+compute the cost function, we know
+如何去计算代价函数
+
+248
+00:08:24,100 --> 00:08:25,550
+how to compute the partial derivatives using
+我们知道了如何使用
+
+249
+00:08:25,830 --> 00:08:27,410
+back propagation, so we
+反向传播算法来计算偏导数
+
+250
+00:08:27,480 --> 00:08:28,830
+can use one of these optimization methods
+那么 我们就能使用某个最优化方法
+
+251
+00:08:29,580 --> 00:08:30,850
+to try to minimize j of
+来最小化关于theta的函数值
+
+252
+00:08:31,130 --> 00:08:33,500
+theta as a function of the parameters theta.
+代价函数J(θ)
+
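+A minimal Octave sketch of this last step, assuming nnCostFunction returns both the cost and the unrolled gradient computed by back propagation, and that initial_nn_params holds the randomly initialized, unrolled weights (the names here are illustrative, not prescribed by the lecture):
+
+  options  = optimset('GradObj', 'on', 'MaxIter', 50);
+  costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
+                                 num_labels, X, y, lambda);
+  [nn_params, cost] = fminunc(costFunc, initial_nn_params, options);
+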
+253
+00:08:34,330 --> 00:08:35,410
+And by the way, for
+另外顺便提一下
+
+254
+00:08:35,660 --> 00:08:37,330
+neural networks, this cost function
+对于神经网络 代价函数
+
+255
+00:08:38,300 --> 00:08:39,630
+j of theta is non-convex,
+J(θ)是一个非凸函数
+
+256
+00:08:40,530 --> 00:08:42,490
+or is not convex and so
+就是说不是凸函数
+
+257
+00:08:43,260 --> 00:08:45,600
+it can theoretically be susceptible
+因此理论上是能够停留在
+
+258
+00:08:46,250 --> 00:08:47,480
+to local minima, and in
+局部最小值的位置
+
+259
+00:08:47,650 --> 00:08:49,580
+fact algorithms like gradient descent and
+实际上 梯度下降算法
+
+260
+00:08:49,840 --> 00:08:51,950
+the advance optimization methods can,
+和其他一些高级优化方法
+
+261
+00:08:52,400 --> 00:08:53,660
+in theory, get stuck in local
+理论上都能收敛于局部最小值
+
+262
+00:08:55,190 --> 00:08:56,300
+optima, but it turns out
+但一般来讲
+
+263
+00:08:56,480 --> 00:08:57,680
+that in practice this is
+这个问题其实
+
+264
+00:08:57,870 --> 00:08:59,230
+not usually a huge problem
+并不是什么要紧的事
+
+265
+00:08:59,560 --> 00:09:00,800
+and even though we can't guarantee
+尽管我们不能保证
+
+266
+00:09:01,210 --> 00:09:02,320
+that these algorithms will find a
+这些优化算法一定会得到
+
+267
+00:09:02,510 --> 00:09:04,260
+global optimum, usually algorithms like
+全局最优值 但通常来讲
+
+268
+00:09:04,390 --> 00:09:05,870
+gradient descent will do a
+像梯度下降这类的算法
+
+269
+00:09:05,930 --> 00:09:07,700
+very good job minimizing this
+在最小化代价函数
+
+270
+00:09:07,850 --> 00:09:09,230
+cost function j of
+J(θ)的过程中
+
+271
+00:09:09,280 --> 00:09:10,350
+theta and get a
+还是表现得很不错的
+
+272
+00:09:10,420 --> 00:09:11,820
+very good local minimum, even
+通常能够得到一个很小的局部最小值
+
+273
+00:09:12,060 --> 00:09:13,690
+if it doesn't get to the global optimum.
+尽管这可能不一定是全局最优值
+
+274
+00:09:14,500 --> 00:09:16,950
+Finally, gradient descent for
+最后 梯度下降算法
+
+275
+00:09:17,230 --> 00:09:19,500
+a neural network might still seem a little bit magical.
+似乎对于神经网络来说还是比较神秘
+
+276
+00:09:20,170 --> 00:09:21,680
+So, let me just show one
+希望下面这幅图
+
+277
+00:09:21,890 --> 00:09:22,990
+more figure to try to get
+能让你对梯度下降法在神经网络中的应用
+
+278
+00:09:23,170 --> 00:09:25,660
+that intuition about what gradient descent for a neural network is doing.
+产生一个更直观的理解
+
+279
+00:09:27,020 --> 00:09:28,460
+This was actually similar to the
+这实际上有点类似
+
+280
+00:09:28,590 --> 00:09:31,190
+figure that I was using earlier to explain gradient descent.
+我们早先时候解释梯度下降时的思路
+
+281
+00:09:31,730 --> 00:09:32,750
+So, we have some cost
+我们有某个代价函数
+
+282
+00:09:33,090 --> 00:09:34,480
+function, and we have
+并且在我们的神经网络中
+
+283
+00:09:34,710 --> 00:09:36,590
+a number of parameters in our neural network. Right
+有一系列参数值
+
+284
+00:09:36,810 --> 00:09:39,190
+here I've just written down two of the parameter values.
+这里我只写下了两个参数值
+
+285
+00:09:40,080 --> 00:09:41,250
+In reality, of course, in
+当然实际上
+
+286
+00:09:41,520 --> 00:09:43,570
+the neural network, we can have lots of parameters with these.
+在神经网络里 我们可以有很多的参数值
+
+287
+00:09:44,190 --> 00:09:46,980
+Theta one, theta two--all of these are matrices, right?
+theta1 theta2 等等 所有的这些都是矩阵 是吧
+
+288
+00:09:47,030 --> 00:09:48,130
+So we can have very high dimensional
+因此我们参数的维度就会很高了
+
+289
+00:09:48,580 --> 00:09:49,870
+parameters but because of
+由于绘图所限 我们不能绘出
+
+290
+00:09:49,960 --> 00:09:51,620
+the limitations of what
+更高维度情况的图像
+
+291
+00:09:51,790 --> 00:09:52,970
+we can draw, I'm pretending
+所以这里我们假设
+
+292
+00:09:53,410 --> 00:09:55,840
+that we have only two parameters in this neural network.
+这个神经网络中只有两个参数值
+
+293
+00:09:56,270 --> 00:09:56,890
+Although obviously we have a lot more in practice.
+实际上应该有更多参数
+
+294
+00:09:59,280 --> 00:10:00,700
+Now, this cost function j of
+那么 代价函数J(θ)
+
+295
+00:10:00,800 --> 00:10:02,470
+theta measures how well
+度量的就是这个神经网络
+
+296
+00:10:02,880 --> 00:10:04,730
+the neural network fits the training data.
+对训练数据的拟合情况
+
+297
+00:10:06,000 --> 00:10:06,920
+So, if you take a point
+所以 如果你取某个参数
+
+298
+00:10:07,120 --> 00:10:08,590
+like this one, down here,
+比如说这个 下面这点
+
+299
+00:10:10,270 --> 00:10:11,180
+that's a point where j
+在这个点上 J(θ)
+
+300
+00:10:11,460 --> 00:10:12,580
+of theta is pretty low,
+的值是非常小的
+
+301
+00:10:12,870 --> 00:10:16,170
+and so this corresponds to a setting of the parameters.
+这一点的位置所对应的
+
+302
+00:10:17,020 --> 00:10:17,840
+There's a setting of the parameters
+参数theta的情况是
+
+303
+00:10:18,350 --> 00:10:19,920
+theta, where, you know, for most
+对于大部分
+
+304
+00:10:20,140 --> 00:10:22,450
+of the training examples, the output
+的训练集数据
+
+305
+00:10:24,120 --> 00:10:26,270
+of my hypothesis, that may
+我的假设函数的输出
+
+306
+00:10:26,410 --> 00:10:27,420
+be pretty close to y(i)
+会非常接近于y(i)
+
+307
+00:10:27,650 --> 00:10:28,720
+and if this is
+那么如果是这样的话
+
+308
+00:10:28,840 --> 00:10:31,560
+true then that's what causes my cost function to be pretty low.
+那么我们的代价函数值就会很小
+
+309
+00:10:32,690 --> 00:10:33,770
+Whereas in contrast, if you were
+而反过来 如果我们
+
+310
+00:10:33,820 --> 00:10:35,140
+to take a value like that, a
+取这个值
+
+311
+00:10:35,510 --> 00:10:37,260
+point like that corresponds to,
+也就是这个点对应的值
+
+312
+00:10:38,080 --> 00:10:39,260
+where for many training examples,
+那么对于大部分的训练集样本
+
+313
+00:10:39,890 --> 00:10:40,780
+the output of my neural
+该神经网络的输出
+
+314
+00:10:41,040 --> 00:10:42,860
+network is far from
+应该是远离
+
+315
+00:10:43,110 --> 00:10:44,340
+the actual value y(i)
+y(i)的实际值的
+
+316
+00:10:44,540 --> 00:10:45,850
+that was observed in the training set.
+也就是我们在训练集观测到的输出值
+
+317
+00:10:46,610 --> 00:10:47,480
+So points like this on the
+因此 像这样的点
+
+318
+00:10:47,590 --> 00:10:50,100
+line correspond to where the
+右边的这个点
+
+319
+00:10:50,450 --> 00:10:51,450
+hypothesis, where the neural
+对应的假设就是
+
+320
+00:10:51,740 --> 00:10:53,330
+network is outputting values
+神经网络的输出值
+
+321
+00:10:53,770 --> 00:10:54,810
+on the training set that are
+在这个训练集上的测试值
+
+322
+00:10:55,020 --> 00:10:56,260
+far from y(i). So, it's not
+应该是远离y(i)的
+
+323
+00:10:56,470 --> 00:10:57,970
+fitting the training set well, whereas
+因此这一点对应着对训练集拟合得不好的情况
+
+324
+00:10:58,170 --> 00:10:59,640
+points like this with low
+而像这些点
+
+325
+00:10:59,970 --> 00:11:01,300
+values of the cost function corresponds
+代价函数值很小的点
+
+326
+00:11:02,130 --> 00:11:03,380
+to where j of theta
+对应的J(θ)值
+
+327
+00:11:04,130 --> 00:11:05,270
+is low, and therefore corresponds
+是很小的 因此对应的是
+
+328
+00:11:05,950 --> 00:11:07,590
+to where the neural network happens
+神经网络对训练集数据
+
+329
+00:11:07,850 --> 00:11:09,290
+to be fitting my training set
+拟合得比较好的情况
+
+330
+00:11:09,510 --> 00:11:11,340
+well, because I mean this is what's
+我想表达的是 如果是这种情况的话
+
+331
+00:11:11,550 --> 00:11:14,070
+needed to be true in order for j of theta to be small.
+那么J(θ)的值应该是比较小的
+
+332
+00:11:15,480 --> 00:11:16,810
+So what gradient descent does is
+因此梯度下降算法的原理是
+
+333
+00:11:16,870 --> 00:11:18,330
+we'll start from some random
+我们从某个随机的
+
+334
+00:11:18,730 --> 00:11:20,300
+initial point like that
+初始点开始 比如这一点
+
+335
+00:11:20,430 --> 00:11:22,990
+one over there, and it will repeatedly go downhill.
+它将会不停的往下下降
+
+336
+00:11:24,040 --> 00:11:25,400
+And so what back propagation is
+那么反向传播算法
+
+337
+00:11:25,570 --> 00:11:27,220
+doing is computing the direction
+的目的就是算出
+
+338
+00:11:27,940 --> 00:11:29,370
+of the gradient, and what
+梯度下降的方向
+
+339
+00:11:29,520 --> 00:11:30,740
+gradient descent is doing is
+而梯度下降的过程
+
+340
+00:11:31,040 --> 00:11:32,060
+it's taking little steps downhill
+就是沿着这个方向
+
+341
+00:11:32,880 --> 00:11:34,220
+until hopefully it gets to,
+一点点的下降 一直到我们希望得到的点
+
+342
+00:11:34,610 --> 00:11:36,410
+in this case, a pretty good local optimum.
+在这里我们希望找到的就是局部最优点
+
+343
+00:11:37,880 --> 00:11:39,250
+So, when you implement back
+所以 当你在执行反向传播算法
+
+344
+00:11:39,410 --> 00:11:40,840
+propagation and use gradient
+并且使用梯度下降
+
+345
+00:11:41,200 --> 00:11:42,420
+descent or one of the
+或者其他别的什么
+
+346
+00:11:42,840 --> 00:11:44,750
+advanced optimization methods, this picture
+更高级的优化方法时
+
+347
+00:11:45,330 --> 00:11:47,290
+sort of explains what the algorithm is doing.
+这幅图片很好地帮你解释了基本的原理
+
+348
+00:11:47,450 --> 00:11:48,820
+It's trying to find a value
+也就是 试图找到某个最优的参数值
+
+349
+00:11:49,260 --> 00:11:50,920
+of the parameters where the
+这个值使得
+
+350
+00:11:51,260 --> 00:11:52,180
+output values in the neural
+我们神经网络的输出值
+
+351
+00:11:52,450 --> 00:11:54,300
+network closely matches the
+与y(i)的实际值
+
+352
+00:11:54,410 --> 00:11:55,520
+values of the y(i)'s
+也就是训练集的输出观测值
+
+353
+00:11:55,660 --> 00:11:58,800
+observed in your training set.
+尽可能的接近
+
+354
+00:11:58,910 --> 00:12:00,250
+So, hopefully this gives you
+希望这节课的内容能让你对
+
+355
+00:12:00,400 --> 00:12:01,610
+a better sense of how
+这些零散的神经网络知识
+
+356
+00:12:01,920 --> 00:12:03,930
+the many different pieces of
+如何有机地结合起来
+
+357
+00:12:04,120 --> 00:12:05,760
+neural network learning fit together.
+能有一个更直观的认识
+
+358
+00:12:07,120 --> 00:12:09,010
+In case even after this video, in
+也许看完这段视频
+
+359
+00:12:09,120 --> 00:12:10,130
+case you still feel like there
+你可能还是觉得
+
+360
+00:12:10,360 --> 00:12:11,420
+are, like, a lot of different pieces
+有许多的细节
+
+361
+00:12:12,070 --> 00:12:13,450
+and it's not entirely clear what
+不能完全明白
+
+362
+00:12:13,690 --> 00:12:14,670
+some of them do or how all
+为什么这么做 或者说是这些是如何
+
+363
+00:12:14,860 --> 00:12:17,760
+of these pieces come together, that's actually okay.
+联系在一起的 没关系
+
+364
+00:12:18,790 --> 00:12:21,780
+Neural network learning and back propagation is a complicated algorithm.
+神经网络和反向传播算法本身就是非常复杂的算法
+
+365
+00:12:23,000 --> 00:12:23,960
+And even though I've seen
+尽管我已经完全理解了
+
+366
+00:12:24,290 --> 00:12:25,340
+the math behind back propagation
+反向传播算法背后的数学原理
+
+367
+00:12:25,860 --> 00:12:26,710
+for many years and I've used
+尽管我使用反向传播已经很多年了
+
+368
+00:12:27,030 --> 00:12:28,470
+back propagation, I think very
+我认为 这么多年的使用还算是成功的
+
+369
+00:12:28,680 --> 00:12:30,210
+successfully, for many years, even
+但尽管如此
+
+370
+00:12:30,380 --> 00:12:31,510
+today I still feel like I
+到现在我还是觉得
+
+371
+00:12:31,570 --> 00:12:32,670
+don't always have a great
+我自己也并不是总能
+
+372
+00:12:33,400 --> 00:12:35,610
+grasp of exactly what back propagation is doing sometimes.
+很好地理解反向传播到底在做什么
+
+373
+00:12:36,200 --> 00:12:37,850
+And what the optimization process
+以及最优化过程是如何
+
+374
+00:12:38,520 --> 00:12:41,480
+looks like when minimizing J of theta.
+使J(θ)值达到最小值的
+
+375
+00:12:41,920 --> 00:12:42,830
+But this is a much harder algorithm, where
+因为这本身的确是一个很难的算法
+
+376
+00:12:43,450 --> 00:12:44,680
+I feel like I have a
+很难让你感觉到
+
+377
+00:12:44,830 --> 00:12:46,590
+much less good handle on
+自己已经完全理解
+
+378
+00:12:46,690 --> 00:12:47,690
+exactly what this is doing
+它不像线性回归
+
+379
+00:12:48,240 --> 00:12:49,360
+compared to say, linear regression or logistic regression.
+或者逻辑回归那样
+
+380
+00:12:51,390 --> 00:12:53,180
+Which were mathematically and conceptually
+数学上和概念上都很简单
+
+381
+00:12:53,510 --> 00:12:55,090
+much simpler and much cleaner algorithms.
+反向传播算法不是那样的直观
+
+382
+00:12:56,200 --> 00:12:57,030
+But in case you feel the
+如果你也有同感
+
+383
+00:12:57,070 --> 00:12:58,560
+same way, you know, that's actually perfectly
+那么完全不必担心
+
+384
+00:12:58,970 --> 00:13:01,010
+okay, but if you
+但如果你自己动手
+
+385
+00:13:01,170 --> 00:13:02,790
+do implement back propagation, hopefully
+完成一次反向传播算法
+
+386
+00:13:03,160 --> 00:13:04,260
+what you find is that this
+你一定会发现
+
+387
+00:13:04,460 --> 00:13:05,410
+is one of the most powerful
+这的确是一个很强大的
+
+388
+00:13:05,790 --> 00:13:08,030
+learning algorithms and if you
+学习算法 如果你
+
+389
+00:13:08,130 --> 00:13:09,510
+implement this algorithm, implement back propagation,
+执行一下这个算法 执行反向传播
+
+390
+00:13:10,250 --> 00:13:11,230
+implement one of these optimization
+执行其中的优化方法
+
+391
+00:13:11,340 --> 00:13:13,260
+methods, you find that
+你一定会发现
+
+392
+00:13:13,610 --> 00:13:14,940
+back propagation will be able
+反向传播算法能够很好的
+
+393
+00:13:15,390 --> 00:13:17,330
+to fit very complex, powerful, non-linear
+让更复杂 维度更大的 非线性的
+
+394
+00:13:17,830 --> 00:13:19,370
+functions to your data,
+函数模型跟你的数据很好地拟合
+
+395
+00:13:20,080 --> 00:13:21,060
+and this is one of the
+因此它的确是一种
+
+396
+00:13:21,190 --> 00:13:22,790
+most effective learning algorithms we have today.
+最为高效的学习算法
+
diff --git a/srt/9 - 8 - Autonomous Driving (7 min).srt b/srt/9 - 8 - Autonomous Driving (7 min).srt
new file mode 100644
index 00000000..4b13e10c
--- /dev/null
+++ b/srt/9 - 8 - Autonomous Driving (7 min).srt
@@ -0,0 +1,716 @@
+1
+00:00:00,090 --> 00:00:01,100
+In this video, I'd like to
+在这段视频中
+(字幕整理:中国海洋大学 黄海广,haiguang2000@qq.com )
+
+2
+00:00:01,200 --> 00:00:02,840
+show you a fun and historically
+我想向你介绍一个具有历史意义的
+
+3
+00:00:03,390 --> 00:00:05,820
+important example of Neural Network Learning.
+神经网络学习的重要例子
+
+4
+00:00:06,720 --> 00:00:09,300
+Of using a Neural Network for autonomous driving
+那就是使用神经网络来实现自动驾驶
+
+5
+00:00:09,870 --> 00:00:12,430
+that is getting a car to learn to drive itself.
+也就是说使汽车通过学习来自己驾驶
+
+6
+00:00:13,810 --> 00:00:14,980
+The video that I
+接下来我将演示的
+
+7
+00:00:15,130 --> 00:00:16,450
+will show in a minute was something
+这段视频
+
+8
+00:00:16,820 --> 00:00:18,290
+that I got from Dean Pomerleau,
+是我从 Dean Pomerleau那里拿到的
+
+9
+00:00:18,470 --> 00:00:20,000
+who is a colleague who works
+他是我的同事
+
+10
+00:00:20,260 --> 00:00:22,000
+at Carnegie Mellon University, out
+任职于美国东海岸的
+
+11
+00:00:22,140 --> 00:00:23,440
+on the east coast of the United States,
+卡耐基梅隆大学
+
+12
+00:00:24,460 --> 00:00:25,310
+and in part of the
+在这部分视频中
+
+13
+00:00:25,350 --> 00:00:27,980
+video you see visualizations like
+你就会明白可视化技术到底是什么
+
+14
+00:00:28,230 --> 00:00:29,930
+this, and I should tell you what the visualization
+在看这段视频之前
+
+15
+00:00:30,080 --> 00:00:31,170
+looks like before starting the
+我会告诉你可视化技术是什么
+
+16
+00:00:31,260 --> 00:00:32,830
+video. Down here
+在下面
+
+17
+00:00:33,170 --> 00:00:34,860
+on the lower left is the
+也就是左下方
+
+18
+00:00:35,100 --> 00:00:36,150
+view seen by the car
+就是汽车所看到的
+
+19
+00:00:36,750 --> 00:00:37,580
+of what's in front of it
+前方的路况图像
+
+20
+00:00:37,840 --> 00:00:38,980
+and so here you know, you will kind
+在图中你依稀能看出
+
+21
+00:00:39,130 --> 00:00:40,250
+of see you know, a road that's
+一条道路
+
+22
+00:00:40,450 --> 00:00:41,390
+maybe going a bit to
+朝左延伸了一点
+
+23
+00:00:41,470 --> 00:00:42,100
+the left and going a little bit to
+又向右了一点
+
+24
+00:00:42,670 --> 00:00:45,060
+the right, and up
+然后上面的这幅图
+
+25
+00:00:45,250 --> 00:00:47,230
+here on top, this
+你可以看到一条
+
+26
+00:00:47,820 --> 00:00:49,820
+first horizontal bar shows the
+水平的菜单栏
+
+27
+00:00:49,940 --> 00:00:51,510
+direction selected by the
+显示的是驾驶操作人
+
+28
+00:00:51,750 --> 00:00:53,110
+human driver, and it's the
+所选择的方向
+
+29
+00:00:53,580 --> 00:00:54,630
+location of this bright
+就是这里的
+
+30
+00:00:55,180 --> 00:00:56,830
+white band that shows the
+这条白亮的区段
+
+31
+00:00:57,100 --> 00:00:58,490
+steering direction selected by the
+显示的就是
+
+32
+00:00:58,690 --> 00:01:00,450
+human driver, where, you
+人类驾驶者选择的方向
+
+33
+00:01:00,680 --> 00:01:01,780
+know, here, far to the left
+比如 最左边的区段
+
+34
+00:01:02,150 --> 00:01:03,280
+corresponds to steering hard left;
+对应的操作就是向左急转
+
+35
+00:01:03,910 --> 00:01:05,180
+here corresponds to steering hard
+而最右端则对应
+
+36
+00:01:05,450 --> 00:01:06,830
+to the right; and so
+向右急转的操作
+
+37
+00:01:06,980 --> 00:01:08,280
+this location, which is a
+因此 稍微靠左的区段
+
+38
+00:01:08,500 --> 00:01:09,390
+little bit to the left,
+也就是这里
+
+39
+00:01:09,720 --> 00:01:10,730
+a little bit left of
+中心稍微向左一点的位置
+
+40
+00:01:10,890 --> 00:01:12,120
+center, means that the human
+则表示在这一点上
+
+41
+00:01:12,280 --> 00:01:13,350
+driver, at this point, was
+人类驾驶者的操作是
+
+42
+00:01:13,520 --> 00:01:14,600
+steering slightly to the left.
+慢慢的向左拐
+
+43
+00:01:16,020 --> 00:01:17,340
+And this second part here
+这幅图的第二部分
+
+44
+00:01:17,880 --> 00:01:18,800
+corresponds to the steering
+对应的就是
+
+45
+00:01:19,140 --> 00:01:20,720
+direction selected by the
+学习算法选出的行驶方向
+
+46
+00:01:20,870 --> 00:01:22,020
+learning algorithm; and again, the
+并且 类似的
+
+47
+00:01:22,110 --> 00:01:23,060
+location of this sort
+这一条白亮的区段
+
+48
+00:01:23,310 --> 00:01:24,790
+of white band, means the
+显示的就是
+
+49
+00:01:24,850 --> 00:01:26,560
+neural network was here, selecting
+神经网络在这里
+
+50
+00:01:27,040 --> 00:01:28,300
+a steering direction just slightly to
+选择的行驶方向
+
+51
+00:01:28,380 --> 00:01:29,440
+the left and in fact,
+是稍微的左转 并且实际上
+
+52
+00:01:29,970 --> 00:01:30,980
+before the neural network starts
+在神经网络开始
+
+53
+00:01:31,390 --> 00:01:33,020
+learning initially, you
+学习之前
+
+54
+00:01:33,170 --> 00:01:34,990
+see that the network outputs a
+你会看到网络的输出是
+
+55
+00:01:35,170 --> 00:01:36,410
+grey band, like a
+一条灰色的区段
+
+56
+00:01:36,580 --> 00:01:38,500
+grey uniform, grey band throughout
+就像这样的一条灰色区段
+
+57
+00:01:38,890 --> 00:01:40,260
+this region, so the uniform
+覆盖着整个区域 这些均称的
+
+58
+00:01:40,740 --> 00:01:42,210
+grey fuzz corresponds to the
+灰色区域显示出
+
+59
+00:01:42,330 --> 00:01:43,880
+neural network having been randomly
+神经网络已经随机初始化了
+
+60
+00:01:44,450 --> 00:01:46,180
+initialized, and initially having
+并且初始化时
+
+61
+00:01:46,510 --> 00:01:47,960
+no idea how to
+我们并不知道 汽车如何行驶
+
+62
+00:01:48,020 --> 00:01:49,650
+drive the car, or initially having
+或者说
+
+63
+00:01:50,050 --> 00:01:52,500
+no idea what direction to steer in.
+我们并不知道所选行驶方向
+
+64
+00:01:52,590 --> 00:01:53,640
+And it's only after it's learned
+只有在学习算法运行了
+
+65
+00:01:53,860 --> 00:01:55,290
+for a while that it will then start
+足够长的时间之后
+
+66
+00:01:55,700 --> 00:01:57,520
+to output like a solid white
+才会有这条白色的区段
+
+67
+00:01:57,770 --> 00:01:58,640
+band in just a small
+出现在整条灰色区域之中
+
+68
+00:01:58,800 --> 00:02:00,260
+part of the region corresponding
+显示出一个具体的
+
+69
+00:02:00,700 --> 00:02:01,870
+to choosing a particular steering direction.
+行驶方向
+
+70
+00:02:02,960 --> 00:02:04,710
+And that corresponds to when the neural network
+这就表示神经网络算法
+
+71
+00:02:05,340 --> 00:02:06,890
+becomes more confident in selecting, you know,
+在这时候已经选出了一个
+
+72
+00:02:08,080 --> 00:02:09,250
+a steering direction in one
+明确的行驶方向
+
+73
+00:02:10,220 --> 00:02:11,560
+location rather than outputting
+不像刚开始的时候
+
+74
+00:02:11,920 --> 00:02:13,110
+a sort of light gray
+输出一段模糊的浅灰色区域
+
+75
+00:02:13,300 --> 00:02:14,570
+fuzz, but instead outputting
+而是输出
+
+76
+00:02:14,970 --> 00:02:17,010
+a white band that's
+一条白亮的区段
+
+77
+00:02:17,410 --> 00:02:19,220
+more constantly selecting one steering direction.
+表示已经选出了明确的行驶方向
+
+78
+00:02:21,340 --> 00:02:21,880
+ALVINN is a system
+ALVINN (Autonomous Land Vehicle In a Neural Network)
+
+79
+00:02:22,340 --> 00:02:24,850
+of artificial neural networks, that learns to steer
+是一个基于神经网络的智能系统
+
+80
+00:02:25,280 --> 00:02:26,400
+by watching a person drive. ALVINN
+通过观察人类的驾驶来学习驾驶
+
+81
+00:02:27,590 --> 00:02:29,550
+is designed to control the
+ALVINN能够控制NavLab载具——
+
+82
+00:02:29,670 --> 00:02:31,310
+NavLab, a modified Army
+一辆改装版军用悍马
+
+83
+00:02:31,920 --> 00:02:32,840
+Humvee equipped with
+这辆悍马装载了
+
+84
+00:02:33,020 --> 00:02:35,200
+sensors, computers and actuators
+传感器 计算机和驱动器
+
+85
+00:02:36,160 --> 00:02:37,800
+for autonomous navigation experiments.
+用来进行自动驾驶的导航试验
+
+86
+00:02:41,190 --> 00:02:42,480
+The initial step in
+实现ALVINN功能的第一步
+
+87
+00:02:42,750 --> 00:02:44,730
+configuring ALVINN is training.
+是对它进行训练
+
+88
+00:02:46,770 --> 00:02:48,160
+During training, the person drives
+也就是训练一个人驾驶汽车
+
+89
+00:02:48,580 --> 00:02:50,640
+the car while ALVINN watches.
+然后让ALVINN观看
+
+90
+00:02:55,810 --> 00:02:58,420
+Once every two seconds, ALVINN
+ALVINN每两秒
+
+91
+00:02:58,690 --> 00:02:59,800
+digitizes a video image
+将前方的路况图生成一张数字化图片
+
+92
+00:03:00,320 --> 00:03:03,260
+of the road ahead, and records the person's steering direction.
+并且记录驾驶者的驾驶方向
+
+93
+00:03:11,790 --> 00:03:13,140
+This training image is reduced
+得到的训练集图片
+
+94
+00:03:13,560 --> 00:03:15,260
+in resolution to 30 by
+被压缩为30x32像素
+
+95
+00:03:15,470 --> 00:03:16,980
+32 pixels and provided
+并且作为输入
+
+96
+00:03:17,590 --> 00:03:19,100
+as input to ALVINN's three-layer
+提供给ALVINN的三层
+
+97
+00:03:21,920 --> 00:03:21,920
+network.
+神经网络
+
+98
+00:03:22,460 --> 00:03:25,330
+Using the back propagation learning algorithm, ALVINN
+通过使用反向传播学习算法
+
+99
+00:03:25,370 --> 00:03:26,590
+is trained to output the same
+ALVINN会训练得到一个
+
+100
+00:03:26,940 --> 00:03:28,450
+steering direction as the
+与人类驾驶员操纵方向
+
+101
+00:03:28,530 --> 00:03:29,970
+human driver for that image
+基本相近的结果
+
+102
+00:03:33,450 --> 00:03:35,970
+Initially, the network's steering response is random.
+一开始 我们的网络选择出的方向是随机的
+
+103
+00:03:43,930 --> 00:03:45,010
+After about two minutes of
+大约经过两分钟的训练后
+
+104
+00:03:45,100 --> 00:03:46,760
+training, the network learns
+我们的神经网络
+
+105
+00:03:47,080 --> 00:03:48,730
+to accurately imitate the steering
+便能够准确地模拟
+
+106
+00:03:49,110 --> 00:03:56,430
+reactions of the
+人类驾驶者的
+
+107
+00:03:58,670 --> 00:04:03,440
+human driver. This same
+驾驶方向
+
+108
+00:04:03,710 --> 00:04:06,440
+training procedure is repeated for other road types.
+对其他道路类型 也重复进行这个训练过程
+
+109
+00:04:09,940 --> 00:04:11,680
+After the networks have been trained the
+当网络被训练完成后
+
+110
+00:04:11,770 --> 00:04:12,900
+operator pushes the run
+操作者就可按下运行按钮
+
+111
+00:04:13,200 --> 00:04:14,650
+switch and ALVINN begins
+车辆便开始行驶
+
+112
+00:04:15,050 --> 00:04:20,380
+driving. 12 times
+每秒钟ALVINN生成
+
+113
+00:04:20,870 --> 00:04:23,010
+per second, ALVINN digitizes an
+12次数字化图片
+
+114
+00:04:23,090 --> 00:04:25,130
+image and feeds it to its neural networks.
+并且将图像传送给神经网络进行训练
+
+115
+00:04:33,210 --> 00:04:34,980
+Each network, running in parallel,
+多个神经网络同时工作
+
+116
+00:04:35,930 --> 00:04:38,140
+produces a steering direction and a measure of its
+每一个网络都生成一个行驶方向
+
+117
+00:04:40,050 --> 00:04:40,050
+confidence in its response.
+以及一个预测自信度的参数
+
+118
+00:04:46,610 --> 00:04:49,480
+The steering direction
+预测自信度最高的
+
+119
+00:04:50,140 --> 00:04:51,120
+from the most confident network,
+那个神经网络得到的行驶方向
+
+120
+00:04:52,340 --> 00:04:53,650
+in this case, the network trained
+比如这里 在这条单行道上训练出的网络
+
+121
+00:04:54,000 --> 00:04:56,640
+for the one-lane road is used to control the vehicle.
+将被最终用于控制车辆方向
+
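+A tiny Octave-style sketch of that arbitration step, assuming each network has already produced a steering value and a confidence score (the variable names are made up for illustration; ALVINN's actual code is not shown in the video):
+
+  % steering(k) and confidence(k) hold the k-th network's two outputs
+  [best_conf, k] = max(confidence);   % pick the most confident network
+  steering_command = steering(k);     % and steer with its chosen direction
+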
+122
+00:05:04,750 --> 00:05:07,840
+Suddenly, an intersection appears ahead
+车辆前方突然出现了
+
+123
+00:05:08,310 --> 00:05:09,350
+of the vehicle.
+一个交叉十字路口
+
+124
+00:05:23,090 --> 00:05:24,450
+As the vehicle approaches the intersection,
+当车辆到达这个十字路口时
+
+125
+00:05:25,680 --> 00:05:27,740
+the confidence of the one-lane network decreases.
+我们单行道网络对应的自信度骤减
+
+126
+00:05:38,070 --> 00:05:40,030
+As it crosses the intersection, and
+当它穿过这个十字路口时
+
+127
+00:05:40,130 --> 00:05:41,160
+the two-lane road ahead comes
+前方的双车道将进入其视线
+
+128
+00:05:41,440 --> 00:05:44,610
+into view, the confidence of the two-lane network rises.
+双车道网络的自信度便开始上升
+
+129
+00:05:51,260 --> 00:05:53,000
+When its confidence rises, the two-lane
+当它的自信度上升时 双车道的网络
+
+130
+00:05:53,420 --> 00:05:54,630
+network is selected to steer,
+将被选择来控制行驶方向
+
+131
+00:05:55,050 --> 00:05:56,780
+safely guiding the vehicle
+车辆将被安全地引导
+
+132
+00:05:57,380 --> 00:05:59,030
+into its lane on the two-lane road.
+进入双车道路
+
+133
+00:06:05,400 --> 00:06:06,670
+So that was autonomous
+这就是基于神经网络的
+
+134
+00:06:07,010 --> 00:06:09,790
+driving using a neural network. Of course, there are more
+自动驾驶技术 当然 我们还有很多
+
+135
+00:06:10,150 --> 00:06:11,070
+recently, more modern attempts
+更加先进的试验
+
+136
+00:06:11,910 --> 00:06:14,000
+to do autonomous driving in a few countries, in
+来实现自动驾驶技术
+
+137
+00:06:14,180 --> 00:06:15,730
+the U.S., in Europe, and so on.
+在美国 欧洲等一些国家和地区
+
+138
+00:06:16,270 --> 00:06:18,040
+They're giving more robust driving
+他们提供了一些比这个方法
+
+139
+00:06:18,400 --> 00:06:19,760
+controllers than this, but I
+更加稳定的驾驶控制技术
+
+140
+00:06:20,080 --> 00:06:21,910
+think it's still pretty remarkable and
+但我认为 使用这样一个简单的
+
+141
+00:06:22,040 --> 00:06:23,190
+pretty amazing how a simple
+基于反向传播的神经网络
+
+142
+00:06:23,690 --> 00:06:25,440
+neural network trained with back-propagation
+训练出如此强大的自动驾驶汽车
+
+143
+00:06:26,140 --> 00:06:28,990
+can, you know, actually learn to drive a car somewhat well.
+的确是一次令人惊讶的成就
+
diff --git "a/srt/\350\247\206\351\242\221\344\270\213\350\275\275\345\222\214\345\255\227\345\271\225\344\275\277\347\224\250.md" "b/srt/\350\247\206\351\242\221\344\270\213\350\275\275\345\222\214\345\255\227\345\271\225\344\275\277\347\224\250.md"
new file mode 100644
index 00000000..2df67142
--- /dev/null
+++ "b/srt/\350\247\206\351\242\221\344\270\213\350\275\275\345\222\214\345\255\227\345\271\225\344\275\277\347\224\250.md"
@@ -0,0 +1,3 @@
+Machine learning video download link: http://pan.baidu.com/s/1dFCQvxZ  Password: dce8
+
+Includes mp4 videos and subtitles
diff --git a/week1.html b/week1.html
new file mode 100644
index 00000000..a7cd5556
--- /dev/null
+++ b/week1.html
@@ -0,0 +1,233 @@
+
+
+
+
+week1.md
+
+
+
In this video, I will introduce vectorization. Whether you use Octave or another language such as MATLAB, or you are working in Python, NumPy, or Java, C, or C++, all of these languages have various linear algebra libraries. These libraries are built in, easy to read and to obtain, usually well written and highly optimized, often developed by PhDs in numerical computing or by other professionals.
One way you could imagine implementing these three equations is with a for loop, letting the index equal 0, then 1, then 2, to perform the updates. But let's implement them in a vectorized way and see whether there is a simpler method: essentially carrying out all three equations at once, rather than needing three lines of code or a for loop. Let's see how to take these three steps and compress them into a single line of vectorized code. Here is how:
When using gradient descent to implement logistic regression, we have all these different parameters, from the first one through the last, and we need to use this expression to update each of them. We could also use a for loop to update these parameter values, with for i = 1 to n, or for i = 1 to n+1. Of course, a for loop is not required; ideally, we recommend a vectorized implementation, which can update all n parameters simultaneously.
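A minimal Octave sketch of that vectorized, simultaneous update for logistic regression, assuming X is the m-by-(n+1) design matrix, y the m-by-1 label vector, theta the (n+1)-by-1 parameter vector, and alpha the learning rate (these symbols follow the course's usual conventions, but the snippet itself is only an illustration):

    h = 1 ./ (1 + exp(-X * theta));             % hypothesis for all m examples at once
    theta = theta - (alpha / m) * X' * (h - y); % update every parameter simultaneously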
In this video, I would like to show you a historically important example of neural network learning: using a neural network for autonomous driving, that is, getting a car to learn to drive itself. The video I will show next is something I got from Dean Pomerleau, a colleague of mine at Carnegie Mellon University on the east coast of the United States. In part of the video you will see visualizations, and I will tell you what the visualization shows before starting the video.
ALVINN (Autonomous Land Vehicle In a Neural Network) is a neural-network-based system that learns to drive by watching a person drive. ALVINN controls the NavLab, a modified military Humvee loaded with sensors, computers, and actuators for autonomous navigation experiments. The first step in configuring ALVINN is training it, that is, having a person drive the car.
Suppose we make some assumptions under which a large training set is believed to be useful: we assume that in our machine learning problem the features x contain enough information to accurately predict y. For example, take a set of confusable words such as two, to, and too, and suppose x captures the words surrounding the blank that needs to be filled in. Once those features are captured, then for "For breakfast I ate __ eggs" there is plenty of information to tell me that the word to fill in is "two" rather than to or too. So capturing the features, even just one of the surrounding words, gives me enough information to determine the label y, in other words, which of these three confusable words I should pick to fill in the blank.