Suffix-Based Algorithms for String Matching

String matching is a classic problem in algorithms. The strstr function commonly used in C/C++ is essentially a statement of the problem:

const char* strstr( const char* str, const char* target );
char* strstr( char* str, const char* target );

Finds the first occurrence of the byte string target in the byte string pointed to by str. The terminating null characters are not compared.

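The task is to decide whether the string str contains target as a substring, where str has length \(n\) and target has length \(m\). The best-known algorithm for this problem is probably KMP (Knuth-Morris-Pratt), which runs in \(O(n+m)\) time (\(O(m)\) to preprocess target plus an \(O(n)\) scan of str) and is built on a very cool idea.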
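As a reference point for the discussion, here is a minimal Python sketch of KMP. The names build_failure and kmp_find are illustrative, not taken from the original post, and returning an index (or -1) instead of a pointer is an assumption made to keep the example short.

```python
def build_failure(target):
    """fail[i] is the length of the longest proper prefix of
    target[:i + 1] that is also a suffix of it."""
    fail = [0] * len(target)
    k = 0
    for i in range(1, len(target)):
        # Fall back to shorter borders until the next character matches.
        while k > 0 and target[i] != target[k]:
            k = fail[k - 1]
        if target[i] == target[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find(text, target):
    """Return the index of the first occurrence of target in text, or -1."""
    if not target:
        return 0
    fail = build_failure(target)
    k = 0  # number of characters of target matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != target[k]:
            k = fail[k - 1]
        if ch == target[k]:
            k += 1
        if k == len(target):
            return i - k + 1
    return -1
```

The index i into text never moves backwards, which is where the linear bound on the scan comes from.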

But can we do better?

Continue reading:

SRM657 DIV1 Problem Sets

Original problem: Problem Sets

Cat Snuke came up with some problems. He wants to construct as many problem sets as possible using those problems. Each problem set must contain exactly three problems: one for the Easy slot, one for the Medium slot, and one for the Hard slot. Each problem can only be assigned to a single slot in a single problem set. He came up with E + EM + M + MH + H problems in total. The distribution of the problems is as follows:

  • E problems can only be used in the Easy slot.
  • EM problems can be used either in the Easy slot or the Medium slot.
  • M problems can only be used in the Medium slot.
  • MH problems can be used either in the Medium slot or the Hard slot.
  • H problems can only be used in the Hard slot.

Return the maximal number of problem sets he can construct.
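One possible approach, sketched here as an assumption and not necessarily the solution from the full post, is to binary search on the answer: for a candidate count k, fill the Easy and Hard slots with E and H problems first, borrow from EM and MH only when needed, and check that enough problems remain for the Medium slot. The function names are hypothetical.

```python
def feasible(k, E, EM, M, MH, H):
    """Can k problem sets be built? Greedy check for a fixed k."""
    need_easy = max(0, k - E)   # EM problems that must go to the Easy slot
    need_hard = max(0, k - H)   # MH problems that must go to the Hard slot
    if need_easy > EM or need_hard > MH:
        return False
    # Whatever is left of EM and MH can serve as Medium problems.
    return M + (EM - need_easy) + (MH - need_hard) >= k


def max_problem_sets(E, EM, M, MH, H):
    """Largest k for which feasible(k, ...) holds, found by binary search."""
    lo, hi = 0, (E + EM + M + MH + H) // 3
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if feasible(mid, E, EM, M, MH, H):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Binary search is valid here because feasibility is monotone: if k sets can be built, so can k - 1.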

Continue reading:

SRM656 DIV1 Random Pancake Stack

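I got crushed on this one. The problem is not actually hard, but my initial idea had a flaw that I failed to notice, and I never managed to get unstuck. More practice needed.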

Original problem: Random Pancake Stack

Charlie has N pancakes. He wants to serve some of them for breakfast. We will number the pancakes 0 through N-1. For each i, pancake i has width i+1 and deliciousness d[i].

Charlie chooses the pancakes he is going to serve using the following randomized process: He starts by choosing the first pancake uniformly at random from all the pancakes he has. He places the chosen pancake onto a plate. This pancake now forms the bottom of a future stack of pancakes. Then, Charlie repeats the following procedure:

  1. If there are no more pancakes remaining, terminate.
  2. Choose a pancake uniformly at random from the pancakes that have not been chosen yet.
  3. If the width of this pancake is greater than the width of the pancake on top of the stack, terminate without taking it.
  4. Place the chosen pancake on top of the stack and go back to step 1.

You are given the vector d with N elements. The total deliciousness of a serving of pancakes is the sum of the deliciousness of all pancakes used in the serving. Compute and return the expected value of the total deliciousness of the pancakes chosen by Charlie.
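A hedged sketch of one way to compute this, using linearity of expectation (this is an assumption about a workable approach, not the solution from the full post): the picks form a uniformly random ordering, and pancake i ends up in the stack as the j-th pick exactly when the j-1 picks before it are wider pancakes in decreasing order of width. The function name is hypothetical.

```python
from math import comb, factorial


def expected_deliciousness(d):
    """Expected total deliciousness, summing d[i] * P(pancake i is served)."""
    n = len(d)
    total = 0.0
    for i, value in enumerate(d):
        larger = n - 1 - i  # number of pancakes wider than pancake i
        prob = 0.0
        for j in range(1, larger + 2):
            # Choose which j-1 wider pancakes precede pancake i; their
            # order is forced (decreasing width). A specific sequence of
            # j picks has probability (n - j)! / n!.
            prob += comb(larger, j - 1) * factorial(n - j) / factorial(n)
        total += value * prob
    return total
```

For d = [1, 1, 1] the three probabilities are 5/6, 1/2 and 1/3, which sum to 5/3.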

Continue reading:

Strongly Connected Components of a Directed Graph: Kosaraju’s algorithm

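Computing the strongly connected components of a directed graph is a basic algorithm, but I keep forgetting it, so I am writing it down for future reference.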

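Finding the connected components of an undirected graph is very simple: a graph traversal (DFS or BFS) is enough. Computing the strongly connected components of a directed graph is more involved. Kosaraju’s algorithm does it in \(O(n+m)\) time, where \(n\) is the number of vertices and \(m\) the number of edges.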

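The core idea is this: a DFS started from any vertex of a directed graph necessarily "pulls out" a set of vertices. Unlike in the undirected case, this set is not necessarily a single strongly connected component. However, if we run the DFS calls in a suitable order (always starting from a "sink" vertex, that is, a vertex in a component with no edges leaving it), each call pulls out exactly one strongly connected component and the result is correct. The crux of the algorithm is therefore how to find this order.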
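The idea can be sketched in Python as follows. This is a minimal two-pass version under some assumptions: vertices are numbered 0..n-1, edges is a list of (u, v) pairs, and the function name and return format are illustrative. The first pass records DFS finish times on the original graph; the vertex that finishes last lies in a sink component of the reversed graph, so the second pass runs on the reversed graph in decreasing finish time.

```python
def kosaraju_scc(n, edges):
    """Return (count, component) where component[v] is the SCC id of v."""
    graph = [[] for _ in range(n)]
    reverse = [[] for _ in range(n)]
    for u, v in edges:
        graph[u].append(v)
        reverse[v].append(u)

    # Pass 1: iterative DFS on the original graph, recording vertices
    # in order of increasing finish time.
    visited = [False] * n
    order = []
    for start in range(n):
        if visited[start]:
            continue
        visited[start] = True
        stack = [(start, 0)]  # (vertex, index of next neighbour to try)
        while stack:
            node, idx = stack.pop()
            if idx < len(graph[node]):
                stack.append((node, idx + 1))
                nxt = graph[node][idx]
                if not visited[nxt]:
                    visited[nxt] = True
                    stack.append((nxt, 0))
            else:
                order.append(node)  # all neighbours done: node finishes

    # Pass 2: DFS on the reversed graph in decreasing finish time.
    # Each search pulls out exactly one strongly connected component.
    component = [-1] * n
    count = 0
    for start in reversed(order):
        if component[start] != -1:
            continue
        component[start] = count
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in reverse[node]:
                if component[nxt] == -1:
                    component[nxt] = count
                    stack.append(nxt)
        count += 1
    return count, component
```

Both passes visit every vertex and edge a constant number of times, which matches the \(O(n+m)\) bound.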

Continue reading:

Global Minimum Cut: Karger’s Min Cut Algorithm

Figure: Cut in an undirected graph

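When it comes to minimum cuts in undirected graphs, the first thing that comes to mind is the Ford-Fulkerson method for the s-t minimum cut. With the Edmonds–Karp implementation, the problem can be solved in \(O(nm^2)\) time, where \(n\) is the number of vertices and \(m\) is the number of edges.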

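The global minimum cut differs from the s-t minimum cut in that no source s and sink t are specified. To solve it with Ford-Fulkerson, we can fix an arbitrary source s and enumerate the sink t (\(n-1\) choices in total), which gives a time complexity of \(O\left(n^2m^2\right)\).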
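For contrast with the max-flow approach, here is a rough Python sketch of the randomized contraction idea behind Karger’s algorithm. It is written under several assumptions: the graph is connected and unweighted, vertices are numbered 0..n-1, contraction is simulated with a union-find over a randomly shuffled edge list, and the default number of trials is an arbitrary illustrative choice. The function names are hypothetical.

```python
import random


def contract_once(n, edges, rng):
    """One run of random contraction; returns the size of the cut found."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    order = list(edges)
    rng.shuffle(order)  # a random order stands in for repeated random picks
    remaining = n
    for u, v in order:
        if remaining == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:  # skip self-loops of the contracted graph
            parent[ru] = rv
            remaining -= 1
    # Edges whose endpoints ended up in different super-vertices form the cut.
    return sum(1 for u, v in edges if find(u) != find(v))


def karger_min_cut(n, edges, trials=None, rng=None):
    """Best cut over several independent runs (correct with high probability)."""
    rng = rng or random.Random()
    if trials is None:
        trials = n * n * max(1, n.bit_length())
    best = len(edges)
    for _ in range(trials):
        best = min(best, contract_once(n, edges, rng))
    return best
```

A single run can miss the minimum cut, so the result is only correct with high probability; every run does return the size of some valid cut, so the answer is never too small.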

Can we do better?

Continue reading:

Largest Rectangle in Histogram

Given n non-negative integers representing the histogram’s bar height where the width of each bar is 1, find the area of largest rectangle in the histogram.
[Figure: histogram]
Above is a histogram where width of each bar is 1, given height = [2,1,5,6,2,3].
[Figure: histogram_area]
The largest rectangle is shown in the shaded area, which has area = 10 unit.

For example,
Given height = [2,1,5,6,2,3],
return 10.

The problem is linked here.
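A common linear-time approach to this problem keeps a stack of bar indices with increasing heights. The sketch below is one such version, offered as an illustration and not necessarily the solution developed in the full post; the function name is hypothetical.

```python
def largest_rectangle_area(height):
    """Largest rectangle in a histogram of unit-width bars."""
    stack = []  # indices of bars whose heights are strictly increasing
    best = 0
    # A trailing bar of height 0 forces every remaining bar to be popped.
    for i, h in enumerate(list(height) + [0]):
        while stack and height[stack[-1]] >= h:
            top = stack.pop()
            # The popped bar is the lowest one between left and i - 1.
            left = stack[-1] + 1 if stack else 0
            best = max(best, height[top] * (i - left))
        stack.append(i)
    return best
```

Each index is pushed and popped at most once, so the running time is linear in the number of bars.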

Continue reading:

Copy List with Random Pointer

A linked list is given such that each node contains an additional random pointer which could point to any node in the list or null.

Return a deep copy of the list.

The problem is linked here.

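The task is to copy a linked list, but this list is a little special: besides the pointer to the next node, every node also holds a pointer to a random node in the list.

The list looks roughly like this: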

Figure: A linked list with random pointers

This random pointer makes copying the list considerably more troublesome…
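To make the difficulty concrete: a random pointer may refer to a node whose copy does not exist yet. One simple way around this, sketched below as an assumption and not necessarily the approach of the full post, is to create all copies first and keep a map from each original node to its copy. The Node class and its field names are hypothetical.

```python
class Node:
    """Hypothetical list node with a next pointer and a random pointer."""

    def __init__(self, label):
        self.label = label
        self.next = None
        self.random = None


def copy_random_list(head):
    """Deep-copy the list using a map from original nodes to their copies."""
    clones = {}
    node = head
    while node is not None:  # first pass: create every copy
        clones[node] = Node(node.label)
        node = node.next
    node = head
    while node is not None:  # second pass: wire up both pointers
        clone = clones[node]
        clone.next = clones.get(node.next)      # .get(None) yields None
        clone.random = clones.get(node.random)
        node = node.next
    return clones.get(head)
```

This version uses extra space proportional to the length of the list for the map.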

Continue reading:

Single Number

Given an array of integers, every element appears twice except for one. Find that single one.

Note:
Your algorithm should have a linear runtime complexity. Could you implement it without using extra memory?

The problem is linked here.

This is a fun problem. In short: in an integer array, every number appears exactly twice except for one; find that number.

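At first glance it looks easy: just keep an array of counts. Unfortunately, the problem also asks for a solution that uses no extra space, which makes things painful. Sorting the array in place and then scanning adjacent pairs would avoid the extra space, but it takes \(O(n\log n)\) time, which violates the linear complexity the problem requires.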

So what can we do?
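One well-known trick that meets both constraints is to XOR all elements together: x ^ x is 0 and XOR is commutative and associative, so every pair cancels and only the single number remains. The sketch below illustrates it; whether the full post takes exactly this route is an assumption, and the function name is hypothetical.

```python
from functools import reduce
from operator import xor


def single_number(nums):
    """XOR of all elements; paired values cancel, leaving the single one."""
    return reduce(xor, nums, 0)
```

This makes one pass over the array and keeps only a single accumulator.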

Continue reading:

[Translation] Understanding timsort, Part 1: Adaptive Mergesort

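Timsort, which Python has used since version 2.3, has quite a reputation: it is impressive in both stability and speed.
A while ago I read a few chapters of the Python Cookbook and became curious about timsort. I then came across this article online, which explains it very clearly, and decided to translate it. Hence this post.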

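This is probably the first technical article I have translated, and only by actually doing it did I realize how big the gap is between being able to read something and being able to translate it. The translation quality is admittedly poor, so if you read English comfortably, I strongly recommend going to the original: Understanding timsort, Part 1: Adaptive Mergesort. If you are unlucky enough to read the clumsy translation below, feel free to leave complaints! Enough chatter, on to the main course: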


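Python's timsort is often regarded as complicated and scary. That is understandable, because it contains a great many details. But once you really understand it, you will find that it is just a series of improvements on mergesort. Some of them are very clever, and others are quite simple and direct. Taken together, these improvements large and small make the algorithm's performance very attractive.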

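I will use a few examples to show how to get from the most basic mergesort to timsort step by step. In this post I cover how to arrive at the "core" of timsort: a basic adaptive mergesort. Later posts will build on it to describe the other special optimizations used in timsort.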
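To give a rough feel for what "adaptive" can mean here, the following Python sketch splits the input into runs that are already sorted and then merges neighbouring runs until one remains. It is a simplified illustration under stated assumptions, not timsort itself and not the code from the original article: run detection only looks for non-decreasing runs, and the pairwise merge strategy and all function names are hypothetical.

```python
def split_into_runs(items):
    """Cut the list into maximal non-decreasing runs."""
    runs = []
    start = 0
    for i in range(1, len(items) + 1):
        if i == len(items) or items[i] < items[i - 1]:
            runs.append(items[start:i])
            start = i
    return runs


def merge(left, right):
    """Stable merge of two sorted lists (ties are taken from left first)."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def adaptive_mergesort(items):
    """Merge neighbouring runs pairwise until a single run is left."""
    runs = split_into_runs(list(items))
    while len(runs) > 1:
        merged_runs = [merge(runs[k], runs[k + 1])
                       for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:  # an odd run out is carried over unchanged
            merged_runs.append(runs[-1])
        runs = merged_runs
    return runs[0] if runs else []
```

On input that is already sorted there is a single run and no merging happens at all, which is the sense in which the sort adapts to existing order.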
Continue reading: