diff --git a/pinecone/chunks.jsonl b/pinecone/chunks.jsonl index a37521d..91ec687 100644 --- a/pinecone/chunks.jsonl +++ b/pinecone/chunks.jsonl @@ -8,3 +8,1567 @@ {"id": "../pages/digitalGarden/index.mdx#8", "metadata": {"Header 1": "My Digital Garden", "Header 2": "The Features", "Header 3": "PlantUML", "path": "../pages/digitalGarden/index.mdx", "id": "../pages/digitalGarden/index.mdx#8", "page_content": "If you ever need to create diagrams, especially UML diagrams, PlantUML is the way to go. I started with Mermaid\nto create UML diagrams but swapped to PlantUML for the additional features and the ability to create custom themes\n(so everything can be minimalist and purple :D). \nTo render PlantUML diagrams the [Remark plugin Simple PlantUML](https://github.com/akebifiky/remark-simple-plantuml) is\nused, which uses the official PlantUML server to generate an image and then embeds it. \nAn example can be seen below, on the [official website](https://plantuml.com/) and also on [REAL WORLD PlantUML](https://real-world-plantuml.com/?type=class). \n```plantuml\n@startuml\n\ninterface Command {\nexecute()\nundo()\n}\nclass Invoker{\nsetCommand()\n}\nclass Client\nclass Receiver{\naction()\n}\nclass ConcreteCommand{\nexecute()\nundo()\n}\n\nCommand <|-down- ConcreteCommand\nClient -right-> Receiver\nClient --> ConcreteCommand\nInvoker o-right-> Command\nReceiver <-left- ConcreteCommand\n\n@enduml\n``` \nTo use my custom theme you can add the following line at the beginning of the PlantUML file: \n```\n@startuml\n!theme purplerain from http://raw.githubusercontent.com/LuciferUchiha/georgerowlands.ch/main\n\n...\n\n@enduml\n``` \nHowever, it seems that when using a custom theme there cannot be more than one theme per page. My custom theme also has some processes built in for simple text coloring, for example in cases of success, failure etc. 
\n```plantuml\n@startuml\n!theme purplerain from http://raw.githubusercontent.com/LuciferUchiha/georgerowlands.ch/main\n\nBob -> Alice : normal\nBob <- Alice : $success(\"success: Hi Bob\")\nBob -x Alice : $failure(\"failure\")\nBob ->> Alice : $warning(\"warning\")\nBob ->> Alice : $info(\"finished\")\n\n@enduml\n```"}} {"id": "../pages/digitalGarden/index.mdx#9", "metadata": {"Header 1": "My Digital Garden", "Header 2": "How can I Contribute?", "path": "../pages/digitalGarden/index.mdx", "id": "../pages/digitalGarden/index.mdx#9", "page_content": "Do you enjoy the content and want to contribute to the garden by adding some new plants or watering the existing ones?\nThen feel free to make a pull request. There are however some rules to keep in mind before adding or changing content. \n- Markdown filenames and folders are written in camelCase.\n- Titles should follow the\n[IEEE Editorial Style Manual](https://www.ieee.org/content/dam/ieee-org/ieee/web/org/conferences/style_references_manual.pdf).\nThey should also be added to the markdown file and specified in the `_meta.json` which maps files to titles and is also\nresponsible for the ordering.\n- LaTeX should conform with my notation and guideline, if something is not defined there you can of course add it."}} {"id": "../pages/digitalGarden/cs/algorithmsDataStructures/analysisOfAlgorithms.mdx#1", "metadata": {"Header 1": "Analysis of Algorithms", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/analysisOfAlgorithms.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/analysisOfAlgorithms.mdx#1", "page_content": "Asymptotic Complexity / Analysis of Algorithms \nThe master method and how to calculate it and stuff, go back to algd1, MIT 6.006 and Algorithms Illuminated will help. \nTelescoping? 
How to get to a recurrence relation and then asymptotic complexity."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#1", "metadata": {"Header 1": "Bags", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#1", "page_content": "A bag is a data structure that can contain the same element multiple times, which is why it is often also called a multiset. The order of adding elements is not necessarily preserved; this depends on the implementation. Common operations on a bag are adding elements, removing elements and searching for a specific element."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#2", "metadata": {"Header 1": "Bags", "Header 2": "Implementing a Bag", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#2", "page_content": "One of the simplest ways of implementing data structures is by using arrays. When implementing a data structure the time complexities can differ depending on whether the data is always kept in a sorted state or not. \n\n```java filename=\"UnsortedBag.java\"\n// TODO\n```\n \nWhen implementing a sorted collection in Java you can either implement your own binary search or you can use `java.util.Arrays.binarySearch(a, from, to, key)` which returns the index of the key if it is contained, and otherwise $(-(insertion point) - 1)$ with insertion point being the point where the key would be inserted, i.e. the index of the first element greater than the key. 
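The insertion-point convention of `binarySearch` can be sketched as follows (a minimal standalone example; the class and method names are made up for illustration, and the notes' actual `SortedBag` implementation remains a TODO):

```java
import java.util.Arrays;

public class InsertionPointDemo {
    // Decode the binarySearch return convention: the index if the key is
    // found, and -(insertionPoint) - 1 otherwise, i.e. the index of the
    // first element greater than the key.
    static int insertionIndex(int[] sorted, int size, int key) {
        int i = Arrays.binarySearch(sorted, 0, size, key);
        return i >= 0 ? i : -(i + 1);
    }

    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40};
        assert insertionIndex(a, 4, 30) == 2; // found: its index
        assert insertionIndex(a, 4, 25) == 2; // not found: first element greater than 25
    }
}
```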
\n\n```java filename=\"SortedBag.java\"\n// TODO\n```\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#3", "metadata": {"Header 1": "Bags", "Header 2": "Implementing a Bag", "Header 3": "Time Complexities", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#3", "page_content": "| Operation | UnsortedBag | SortedBag |\n| ---------------- | ------------------------------------------ | ----------------------------------------------------- |\n| add(E e) | $O(1)$
no search, or shift | $O(n)$
search + shift right $O(\\log{n}) + O(n)$ |\n| search(Object o) | $O(n)$
linear search | $O(\\log{n})$
binary search |\n| remove(Object o) | $O(n)$
search + remove $O(n) + O(1)$ | $O(n)$
search + shift left $O(\\log{n}) + O(n)$ |\n| Ideal use case | When adding a lot | When searching a lot |"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#4", "metadata": {"Header 1": "Bags", "Header 2": "Bag of Words", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/bags.mdx#4", "page_content": " \nWhat is a bag of words? How is it used in NLP?\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/collections.mdx#1", "metadata": {"Header 1": "Collections", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/collections.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/collections.mdx#1", "page_content": "Collections are containers/data structures that can hold elements of the same type. Most programming languages have some basic implementations as part of their standard library. Depending on the problem to be solved certain data structures are better options than others. In Java, there is the `java.util.Collections` package which contains some of the most common collections. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#1", "metadata": {"Header 1": "Hash Tables", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#1", "page_content": "In an ideal world, we would want to be able to access data with $O(1)$ using the data's unique identifier (key). \n \nFor this, to work we need to be able to generate a unique hash code from the key. From this hash code (a number) we then want to get an index in a hash table by using a hash function. For this approach to work, two conditions must be met. 
Firstly, we need to be able to tell if two objects are the same (the `equals` function); secondly, we need to be able to generate a hash code from the unique identifier, which can consist of a combination of attributes or just one. \nImportantly the following must be true: \n$$\n(a.equals(b)) \\Rightarrow (a.hashCode() == b.hashCode())\n$$ \nSo if two objects are the same then their hash code must be the same as well. However, if two hash codes are the same it does not necessarily mean that the objects are the same; this is a so-called collision."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#2", "metadata": {"Header 1": "Hash Tables", "Header 2": "Hashing Function", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#2", "page_content": "We want to be able to calculate the index as fast as possible. From the above requirements, we also want the same keys to produce the same indices. We also want the hash codes and therefore the indices to be evenly distributed to minimize collisions. \nFor starters we could use the following hashing function: \n$$\nindex = hash\\,code \\mod table.length()\n$$"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#3", "metadata": {"Header 1": "Hash Tables", "Header 2": "Hash Code", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#3", "page_content": "We want the generated hash code to be randomly and, if possible, evenly spread across the entire range of possible numbers. \nIf the unique identifier is a 32-bit data type, like boolean, byte, short, int, char and float, we can just take its value straight as an int. \nIf the unique identifier is a 64-bit data type, like long or double, we can use an exclusive or (XOR, only true if they are different) between the two 32-bit parts. 
\n```java\npublic int hashCode() {\n// XOR of two 32-bit parts\nreturn (int)(value ^ (value >>> 32));\n}\n``` \nFor strings, it gets a bit harder. You might think it would be a good idea to add the characters represented as integers together. However, this is a very bad idea because for example AUS and USA would then have the same hash code. Instead, we create a polynomial using the character values as coefficients. \n```java\npublic final class String {\nprivate final char value[];\n/** Cache the hash code for the string, to avoid recalculation */\nprivate int hash; // Default to 0\n...\npublic int hashCode() {\nint h = hash;\nif (h == 0 && value.length > 0) {\nchar val[] = value;\nfor (int i = 0; i < value.length; i++) {\nh = 31 * h + val[i];\n}\nhash = h;\n}\nreturn h;\n}\n...\n}\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#4", "metadata": {"Header 1": "Hash Tables", "Header 2": "HashMap", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#4", "page_content": "In Java, a HashMap always has a size equal to a power of 2. This leads to the map reserving in the worst case twice as much memory as it needs. However, the advantage of this implementation is that it is very easy to calculate powers of 2 with bit shifts. It also allows us to change the hash function `(hashCode() & 0x7FFFFFFF) % length` to `hashCode() & (length -1)`. The bitmask with `0x7FFFFFFF` ensures that the hash code is positive. 
\n```java\npublic HashMap(int initialCapacity) {\nint capacity = 1;\nwhile (capacity < initialCapacity)\ncapacity <<= 1;\ntable = new Entry[capacity];\n}\n\nprivate int indexFor(int h) {\nreturn h & (table.length - 1);\n}\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#5", "metadata": {"Header 1": "Hash Tables", "Header 2": "Collision Resolution", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#5", "page_content": "As mentioned before collisions are when different objects have the same hash code and therefore the same index. This can happen and can't be avoided. This is why they need to be handled."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#6", "metadata": {"Header 1": "Hash Tables", "Header 2": "Collision Resolution", "Header 3": "Separate Chaining", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#6", "page_content": "With this strategy when there is a collision, the colliding elements are chained together just like with a linked list. The advantage of this strategy is that it is very simple and the table never becomes full. The problem however is that it needs additional memory and the memory needs to be dynamic. 
\n \nThe class for a HashMap would then look something like this: \n```java\npublic class HashMap < K, V > implements Map < K, V > {\nNode [] table;\n...\nstatic class Node implements Map.Entry {\nfinal K key;\nV value;\nNode next;\n...\n}\n}\n``` \nIf the table has the size $m$ and we insert $n$ elements we can calculate the probability of no collision occurring using the following formula: \n$$\n\\prod_{i=0}^{n-1}{\\frac{m-i}{m}}\n$$ \nFrom this we can then also calculate the probability of there being at least one collision: \n$$\n1 - \\prod_{i=0}^{n-1}{\\frac{m-i}{m}}\n$$"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#7", "metadata": {"Header 1": "Hash Tables", "Header 2": "Collision Resolution", "Header 3": "Open Addressing", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#7", "page_content": "With this strategy when there is a collision, we look for a free space in the hash table. The advantage of this strategy is that it does not need any additional space; however, the table can become full. The process of finding a free space is called probing. \n#### Linear Probing \nWhen using linear probing we try the next highest index until we find a free space. If we reach the end of the table we restart the search at index 0 until we are back to the initial area of collision, which means the table is full. \nSo if the hash code is $x$ and the table has the size $m$ the index after $k$ collisions is: \n$$\nindex= (x \\mod m + k) \\mod m\n$$ \n```java\npublic void add(T elem) {\nint i = (elem.hashCode() & 0x7FFFFFFF) % size;\nwhile (array[i] != null)\ni = (i + 1) % size;\narray[i] = elem;\n}\n``` \nThe above code however doesn't check if the hash table should only hold unique values (set semantic) or if the table is already full. However, with this strategy clusters of values can form. 
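The collision-probability formulas from the separate-chaining section can be checked numerically; a small sketch (the class and method names are hypothetical):

```java
public class CollisionProbability {
    // Probability of at least one collision when inserting n elements into
    // a table of size m, assuming uniformly distributed indices:
    // 1 - prod_{i=0}^{n-1} (m - i) / m
    static double atLeastOneCollision(int m, int n) {
        double noCollision = 1.0;
        for (int i = 0; i < n; i++) {
            noCollision *= (m - i) / (double) m;
        }
        return 1.0 - noCollision;
    }

    public static void main(String[] args) {
        // The classic birthday paradox: 23 elements in a table of size 365
        // already collide with probability just over 50%.
        assert atLeastOneCollision(365, 23) > 0.5;
        assert atLeastOneCollision(365, 23) < 0.51;
    }
}
```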
Adding a value then makes the cluster even bigger and therefore also increases the probability of hitting it. \nWhen inserting into a table of size $n$ with a cluster of size $k$ we can calculate the probability of hitting the cluster and therefore also increasing the size of the cluster: \n$$\n\\frac{k+2}{n}\n$$ \nWe can also calculate the probability of needing at least 3 probe steps when adding, which is: \n$$\n\\frac{k-2}{n}\n$$ \n##### Double Hashing \nThe idea here is that we don't look at the next highest free space, which is equivalent to a step size of 1, but instead each element calculates its own step size. This is done to avoid creating clusters. This strategy is called double hashing as you have a hash function for the index and one for the step size. \nSo if the hash code is $x$ and the table has the size $m$ the index after $k$ collisions is: \n$$\nindex= (x \\mod m + k \\times step) \\mod m\n$$ \n```java\npublic void add(T elem) {\nint i = (elem.hashCode() & 0x7FFFFFFF) % size;\nint step = ...?\nwhile (array[i] != null) {
To avoid this we can restrict the step size with the following condition: \n$$\n\\gcd(step, m) = 1 \\text{ (coprime) } \\land 0 < step < m\n$$ \nSome common choices are: \n- The size of the table $m$ is a power of 2 and the step is an odd number $\\in [1, m-1]$. \n```java\n1 + 2 * ((elem.hashCode() & 0x7FFFFFFF) % (m / 2))\n``` \n- The size of the table $m$ is a prime number and the step is $\\in [1, m-1]$. \n```java\n1 + (elem.hashCode() & 0x7FFFFFFF) % (m - 2)\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#9", "metadata": {"Header 1": "Hash Tables", "Header 2": "Removing Elements", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#9", "page_content": "When removing an element it can't just be set to `null` because otherwise, when looking for an element after the deletion, we could hit a null reference and stop before we find the element we are looking for (depending on language and implementation). Instead of setting it to `null` it is common practice to set it to a sentinel object. If we are then looking for an element and we hit a sentinel we can just carry on our search. This then also means that when we add an element and we come across a sentinel we can add the element in place of the sentinel. 
\n```java\npublic class HashTable < T > {\nprivate final Object[] arr;\nprivate static final Object sentinel = new Object();\n...\npublic void remove(Object o) {\nassert o != null;\nint i = (o.hashCode() & 0x7FFFFFFF) % arr.length;\nint cnt = 0;\nwhile (arr[i] != null && !o.equals(arr[i]) && cnt != arr.length) {\ni = (i + 1) % arr.length;\ncnt++;\n}\nif (o.equals(arr[i])) arr[i] = sentinel;\n}\npublic boolean contains(Object o) {\nassert o != null;\nint i = (o.hashCode() & 0x7FFFFFFF) % arr.length;\nint cnt = 0;\nwhile (arr[i] != null && !o.equals(arr[i]) && cnt != arr.length) {\ni = (i + 1) % arr.length;\ncnt++;\n}\nreturn cnt != arr.length && arr[i] != null;\n}\n}\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#10", "metadata": {"Header 1": "Hash Tables", "Header 2": "Performance Improvements", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#10", "page_content": "Using modulo in the probe loop is not optimal because of the multiple divisions that need to be calculated. \nSo instead of `i = (i + step) % size;` we can use one of the following: \n- If the table size $m$ is a power of 2 we can use a bitmask, which is very fast. \n```java\ni = (i + step) & (size - 1);\n``` \n- Instead of using modulo, we could also manually detect an overflow. \n```java\ni = i + step; if (i >= size) i -= size;\n``` \n- Because a comparison with 0 is faster than with a given number we could also probe backward and check for an underflow. 
\n```java\ni = i - step; if (i < 0) i += size;\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#11", "metadata": {"Header 1": "Hash Tables", "Header 2": "Load Factor", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#11", "page_content": "The number of collisions increases with the number of elements in the table. To be able to make statements on the status of the table there is the so-called load factor which is defined as follows: \n$$\n\\lambda = \\frac{\\text{Number of elements in table}}{\\text{table size}}\n$$ \nIf we know the number of elements to be added we can then also calculate an optimal size for the table depending on the desired load factor. \nWe can also create a new table and copy all the elements to the new table if a certain threshold load factor has been reached. However, it is important to recalculate the indices when doing this. This process is called **rehashing**. \nWhen searching for an element in a hash table that is using the separate chaining strategy we expect to find the element after searching half a chain on average, so $O(1+\\frac{\\lambda}{2})$. If a search is unsuccessful then the cost is $O(1+\\lambda)$ because an entire chain was searched."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#12", "metadata": {"Header 1": "Hash Tables", "Header 2": "Load Factor", "Header 3": "Separate Chaining", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#12", "page_content": "There is no upper limit for the load factor as the chains can be of any length. The average chain length is equivalent to the load factor. 
For the table to be efficient the load factor should be $\\lambda < 1$."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#13", "metadata": {"Header 1": "Hash Tables", "Header 2": "Load Factor", "Header 3": "Open Addressing", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/hashTables.mdx#13", "page_content": "The load factor is limited to $\\lambda \\leq 1$. As long as $\\lambda < 1$ there is still space in the table. For optimal performance, it is recommended to have a load factor of $\\lambda < 0.75$ for linear probing and $\\lambda < 0.9$ for double hashing."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#1", "metadata": {"Header 1": "Linked Lists", "Header 2": "Linked Lists vs Arrays", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#1", "page_content": "When implementing collections with arrays we can encounter a few issues. An array has a fixed size, which leads us to implementing algorithms that only work for that fixed amount. To solve this issue, when adding an element we could allocate an array that is one element larger, copy everything over and then add the new element. Another approach is to increase the size when the array gets full, either by a fixed amount or by an amount that depends on how many times we have already increased the size. This means the array is either always full or we use too much space. \nYou can imagine a linked list to be like a chain. It consists of nodes that have a value and a reference to the next node. The linked list then just needs to know the first node and can then make its way through the list. With this method the size of the collection is dynamic and we can add as many elements as we want (limited by memory). 
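The grow-on-full array strategy mentioned above can be sketched like this (a simplified illustration with a doubling policy; the class and method names are made up):

```java
import java.util.Arrays;

public class GrowingArray {
    private int[] data = new int[2];
    private int size = 0;

    // Doubling on overflow gives amortized O(1) appends instead of
    // copying the whole array on every single add.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int capacity() { return data.length; }

    int size() { return size; }

    public static void main(String[] args) {
        GrowingArray a = new GrowingArray();
        for (int i = 0; i < 5; i++) a.add(i);
        assert a.size() == 5;
        assert a.capacity() == 8; // grew 2 -> 4 -> 8
    }
}
```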
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#2", "metadata": {"Header 1": "Linked Lists", "Header 2": "Variations", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#2", "page_content": "There are various variations of linked lists which all have their use cases."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#3", "metadata": {"Header 1": "Linked Lists", "Header 2": "Variations", "Header 3": "Singly Linked List", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#3", "page_content": "This is the common implementation when talking about linked lists. A node has a value and a reference to the next element."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#4", "metadata": {"Header 1": "Linked Lists", "Header 2": "Variations", "Header 3": "Doubly Linked List", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#4", "page_content": "Here, unlike in the singly linked list, a node has a value, a reference to the next element and additionally also a reference to the previous element. This makes removal of a node much easier. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#5", "metadata": {"Header 1": "Linked Lists", "Header 2": "Variations", "Header 3": "Circular Linked List", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#5", "page_content": "In a circular linked list the last element does not have a reference to null as the next element but instead to the head, which allows the linked list to be visualized as a circle. 
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#6", "metadata": {"Header 1": "Linked Lists", "Header 2": "Implementing a Linked List", "Header 3": "Adding", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#6", "page_content": "When implementing the `add(E e)` function there are a few options: \n- You can iterate your way through the linked list to the end and then add the new element onto the end. This however has a complexity of $O(n)$, which is not ideal for a simple operation.\n- To solve the above issue we can keep a private reference in the list of not only the head but also the tail (last element) of the linked list.\n- There is no rule saying you have to add an element at the end. You can also just add it to the front of the list, so it becomes the new head and its reference to the next node is the old head."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#7", "metadata": {"Header 1": "Linked Lists", "Header 2": "Implementing a Linked List", "Header 3": "Removing", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#7", "page_content": "When implementing the `remove(Object o)` function there is only really one way of doing it: find the node `curr` that holds the value to be removed whilst also remembering the previous node `prev`, and then set `prev.next` to `curr.next`. This can be made easier, as mentioned above, by storing in each node a reference to the previous element to make it a doubly linked list. 
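The add-at-head and `prev`/`curr` removal described above might look roughly like this (a sketch only; the notes' own `MySingleLinkedList` is still a TODO and may differ):

```java
public class SinglyLinked {
    static class Node {
        int value;
        Node next;
        Node(int v, Node n) { value = v; next = n; }
    }

    private Node head;

    // Adding at the front is O(1): the new node simply becomes the head
    // and points at the old head.
    void addFirst(int value) { head = new Node(value, head); }

    // Removal walks the list with prev/curr and unlinks curr by
    // rerouting prev.next past it.
    boolean remove(int value) {
        Node prev = null, curr = head;
        while (curr != null && curr.value != value) { prev = curr; curr = curr.next; }
        if (curr == null) return false;     // not found
        if (prev == null) head = curr.next; // removing the head itself
        else prev.next = curr.next;
        return true;
    }

    boolean contains(int value) {
        for (Node n = head; n != null; n = n.next)
            if (n.value == value) return true;
        return false;
    }

    public static void main(String[] args) {
        SinglyLinked list = new SinglyLinked();
        list.addFirst(1); list.addFirst(2); list.addFirst(3); // list: 3 -> 2 -> 1
        assert list.remove(2);
        assert !list.contains(2) && list.contains(3) && list.contains(1);
    }
}
```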
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#8", "metadata": {"Header 1": "Linked Lists", "Header 2": "Implementing a Linked List", "Header 3": "Containing", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#8", "page_content": "When implementing the `boolean contains(Object o)` you have to iterate over the entire linked list to see if you find the element or reach the end."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#9", "metadata": {"Header 1": "Linked Lists", "Header 2": "Implementing a Linked List", "Header 3": "Example", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/linkedLists.mdx#9", "page_content": "\n```java filename=\"MySingleLinkedList.java\"\n// TODO\n```\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#1", "metadata": {"Header 1": "What is NP-Hard?", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#1", "page_content": "Lots of Euler diagrams and examples needed. 
Clear formulations seem to be hard to find."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#2", "metadata": {"Header 1": "What is NP-Hard?", "Header 2": "Deterministic vs Non-Deterministic Algorithms", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#2", "page_content": "example of deterministic and non-deterministic algorithms \nleaving a blank part"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#3", "metadata": {"Header 1": "What is NP-Hard?", "Header 2": "P and NP", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#3", "page_content": "What is the stuff with the verification in polynomial time? Is mentioned but unsure how exactly, need an example"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#4", "metadata": {"Header 1": "What is NP-Hard?", "Header 2": "NP-Complete and NP-Hard", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#4", "page_content": "NP-Complete is NP-Hard but also has an algo in NP? 
\nBut solving one NP-Complete problem in polynomial time means all NP-Complete problems can be solved in polynomial time???\nSame goes for if one NP-Hard problem can be solved in polynomial time then all NP problems can be solved in polynomial time?"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#5", "metadata": {"Header 1": "What is NP-Hard?", "Header 2": "NP-Complete and NP-Hard", "Header 3": "Reduction", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#5", "page_content": "The conversion of one problem to another has to be in polynomial time???"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#6", "metadata": {"Header 1": "What is NP-Hard?", "Header 2": "NP-Complete and NP-Hard", "Header 3": "Boolean Satisfiability Problem (SAT)", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#6", "page_content": "CNF (Conjunctive Normal Form) ??? 
and then reduce to 0/1 Knapsack Problem"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#7", "metadata": {"Header 1": "What is NP-Hard?", "Header 2": "NP-Complete and NP-Hard", "Header 3": "Cook-Levin Theorem", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#7", "page_content": "Got prize for proving what it means if P = NP???"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#8", "metadata": {"Header 1": "What is NP-Hard?", "Header 2": "BQP", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/np.mdx#8", "page_content": "Quantum stuff"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx#1", "metadata": {"Header 1": "Queues", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx#1", "page_content": "A queue is, as the name says, like a queue of people, meaning it follows the FIFO policy (first in, first out). The most common operations on queues are: \n- `enqueue(E e)`: Adds an element to the rear of the queue.\n- `E dequeue()`: Takes the element from the front of the queue.\n- `E peek()`: Returns the element at the front of the queue, which corresponds to the element that will be dequeued next. 
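The listed operations can be sketched with a linked-node queue that keeps both a front and a rear reference, so `enqueue` and `dequeue` are both $O(1)$ (a sketch; the class name is made up and the notes' `MyQueue` remains a TODO):

```java
public class MyQueueSketch {
    static class Node {
        int value;
        Node next;
        Node(int v) { value = v; }
    }

    private Node front, rear;

    // Enqueue appends at the rear in O(1) thanks to the rear reference.
    void enqueue(int value) {
        Node n = new Node(value);
        if (rear == null) { front = rear = n; }
        else { rear.next = n; rear = n; }
    }

    // Dequeue removes from the front (FIFO); assumes the queue is non-empty.
    int dequeue() {
        int v = front.value;
        front = front.next;
        if (front == null) rear = null;
        return v;
    }

    int peek() { return front.value; }

    public static void main(String[] args) {
        MyQueueSketch q = new MyQueueSketch();
        q.enqueue(1); q.enqueue(2); q.enqueue(3);
        assert q.peek() == 1;    // first in...
        assert q.dequeue() == 1; // ...first out
        assert q.dequeue() == 2;
    }
}
```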
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx#2", "metadata": {"Header 1": "Queues", "Header 2": "Implementing a Queue", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx#2", "page_content": "\n```java filename=\"MyQueue.java\"\n// TODO\n```\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx#3", "metadata": {"Header 1": "Queues", "Header 2": "Implementing a Queue", "Header 3": "Queue Using two Stacks", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/queues.mdx#3", "page_content": "Although the most common way of implementing a queue is with a [linked list](./linkedLists) it is also possible to implement a queue by using two stacks. Just like when [implementing a stack with two queues](./stacks#stack-using-two-queues) you need to decide if adding or removing an element will be expensive."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/recursion.mdx#1", "metadata": {"Header 1": "Recursion", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/recursion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/recursion.mdx#1", "page_content": "SRTBOT from MIT to design a recursive function merge sort as example:\n- Subproblem identification and definition\n- Relate subproblem solution to the original problem with a recurrence relation\n- Topological order of subproblems (order in which subproblems are solved) to avoid circular dependencies, i.e.\nwe want it to be a DAG\n- Base case(s) to terminate recursion\n- Original problem solution via subproblem solutions\n- Time and space complexity analysis \nsome blabla about recursion. Can every recursive function be written as an iterative function? What about the other way around? \nWhy would you use recursion? What are the advantages and disadvantages? 
\nTail recursion and the impact on the stack."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx#1", "metadata": {"Header 1": "Sets", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx#1", "page_content": "A set is a data structure that can hold unique elements. It represents a mathematical set, which in German is called a \"Menge\". This means that an element is either in the set or it isn't. Just like with a bag you have the common operations of adding elements, removing elements and searching for a specific element."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx#2", "metadata": {"Header 1": "Sets", "Header 2": "Implementing a Set", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx#2", "page_content": "\n```java filename=\"UnsortedSet.java\"\n// TODO\n```\n \nJust like when [implementing the bag](./bags#array-implementations) we can use `java.util.Arrays.binarySearch(a, from, to, key)` which returns the index of the key if it is contained and otherwise $(-(\\text{insertion point}) - 1)$, with the insertion point being the point where the key would be inserted, i.e. the index of the first element greater than the key. \n\n```java filename=\"SortedSet.java\"\n// TODO\n```\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx#3", "metadata": {"Header 1": "Sets", "Header 2": "Implementing a Set", "Header 3": "Time Complexities", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/sets.mdx#3", "page_content": "When implementing a set and bag there is also the question of whether the data should be sorted or not. Depending on the answer the time complexities will be different and the implementation changes. 
\n| Operation | UnsortedSet | SortedSet |\n| ---------------- | ----------------------------------------------- | ----------------------------------------------------------------------------- |\n| add(E e) | $O(n)$<br/>check (search) + add $O(n) + O(1)$ | $O(n)$<br/>search insertion point (check) + shift right $O(\\log{n}) + O(n)$ |\n| search(Object o) | $O(n)$<br/>linear search | $O(\\log{n})$<br/>binary search |\n| remove(Object o) | $O(n)$<br/>search + remove $O(n) + O(1)$ | $O(n)$<br/>search insertion point (check) + shift left $O(\\log{n}) + O(n)$ |\n| Ideal use case | When a set is needed but searched rarely | When a set is needed and searched often |"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx#1", "metadata": {"Header 1": "Stacks", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx#1", "page_content": "A stack is as the name says like a stack of paper. Meaning it follows the LIFO policy (last in first out). The most common operations on stacks are: \n- `push(E e)`: Puts the element onto the top of the stack.\n- `E pop()`: Takes the element from the top of the stack.\n- `E peek()`: Returns the element at the top of the stack, which corresponds to the element to next be popped. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx#2", "metadata": {"Header 1": "Stacks", "Header 2": "Implementing a Stack", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx#2", "page_content": "\n```java filename=\"MyStack.java\"\n// TODO\n```\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx#3", "metadata": {"Header 1": "Stacks", "Header 2": "Implementing a Stack", "Header 3": "Stack Using two Queues", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/stacks.mdx#3", "page_content": "Although the most common way of implementing a stack is with a [linked list](./linkedLists) it is also possible to implement a stack by using two queues. 
Just like when [implementing a queue with two stacks](./queues#queue-using-two-stacks) you need to decide if adding or removing an element will be expensive."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#1", "metadata": {"Header 1": "Coins in a Line", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#1", "page_content": "This game is a tricky little coding problem that has the following rules: \n- There are an even number $n$ of coins in a line, with values $v_1, v_2, ..., v_n$, i.e. $v_i$ is the value of the i-th coin.\n- Two players, often called Alice and Bob, take turns to take a coin either from the left or the right end of the line\nuntil there are no more coins left.\n- The player whose coins have the higher total value wins. \n \nThe goal is to find an algorithm that maximizes the value of the coins that the first player (Alice) gets. \n\nThere are 4 coins with values [1, 2, 3, 4], Alice will get the maximum value of 6 by taking the\nright-most coin twice (4 + 2), assuming Bob also plays optimally.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#2", "metadata": {"Header 1": "Coins in a Line", "Header 2": "Greedy Algorithm", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#2", "page_content": "This game isn't as simple as it seems, and it's not immediately obvious how to solve it. Most commonly people will\nstart with a greedy algorithm, which is to take the coin with the highest value at each turn. This is a good start,\nand will win in the example above, but it's not optimal. Consider the following example: \n\nThere are again 4 coins but with the values [5, 10, 25, 10]. \n1. 
Alice takes the right coin with value 10.\n2. Bob takes the right coin with value 25.\n3. Alice takes the right coin with value 10.\n4. Bob takes the last coin with value 5. \nAlice will have a total value of 20, and Bob will have a total value of 30. Bob wins!\n \nBy tweaking the greedy algorithm, we can get an algorithm that will always win, but not necessarily get the maximum\nvalue. Instead of taking the coin with the highest value, Alice first calculates the total value of coins in the odd\npositions, and then calculates the total value of coins in the even positions (starting at 0). She then takes the coins\nin the positions with the highest total sum. \n\nThere are now 6 coins with the values [1,3,6,3,1,3]. First Alice calculates the total value of coins in the\neven positions, which is 1 + 6 + 1 = 8. Then she calculates the total value of coins in the odd positions, which\nis 3 + 3 + 3 = 9. So she takes the coins in the odd positions. If Bob uses the greedy approach we get the following: \n1. Alice takes the right coin with value 3 (original position=5).\n2. Bob takes the left coin with value 1.\n3. Alice takes the left coin with value 3 (original position=1).\n4. Bob takes the left coin with value 6.\n5. Alice takes the left coin with value 3 (original position=3).\n6. Bob takes the last coin with value 1. \nAlice will have a total value of 9, and Bob will have a total value of 8. Alice wins, but there is a way to get 10! \nIf Bob uses the same tweaked greedy approach as Alice, we get the following: \n1. 
Alice takes the right coin with value 3 (original position=5)."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#3", "metadata": {"Header 1": "Coins in a Line", "Header 2": "Greedy Algorithm", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#3", "page_content": "If Bob uses the same tweaked greedy approach as Alice, we get the following: \n1. Alice takes the right coin with value 3 (original position=5).\n2. Bob can't take an odd position coin, so he can take either coin as they are both even positions and have the same value.\nLet's say he takes the left coin with value 1 because he built his algorithm to scan from left to right.\n3. Alice takes the left coin with value 3 (original position=1).\n4. Bob again can't take an odd position coin, but he takes the left coin with value 6 because it has a higher value\nthan the right coin with value 1.\n5. Alice takes the left coin with value 3 (original position=3).\n6. Bob takes the last coin with value 1. 
\nThe result is the same as if Bob used the normal greedy approach, because Alice always takes the coins away from Bob\nas she gets to go first.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#4", "metadata": {"Header 1": "Coins in a Line", "Header 2": "Dynamic Programming Algorithm", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/coinsLine.mdx#4", "page_content": "We always assume that Bob will play optimally, meaning that he will always take the coin which minimizes the\n**total value** of the coins that Alice can get."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx#1", "metadata": {"Header 1": "Introduction to DP", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx#1", "page_content": "Dynamic programming, DP for short, is a problem-solving technique or more formally an algorithmic design paradigm just like \"divide\nand conquer\" or a \"greedy algorithm\". It is used to solve problems that can be broken down into sub-problems (just like\ndivide and conquer) which are then solved recursively. For a problem to be solved using dynamic programming, it must\nhave two properties: \n- **Overlapping Sub-problems**: When the problem is broken down into sub-problems, the same sub-problems are solved\nmultiple times, i.e. there is an overlap.\n- **Optimal Substructure**: When the most optimal solution for the original problem can be constructed using the\noptimal solutions of the sub-problems. \nWe can illustrate these two properties using the Fibonacci sequence. 
The Fibonacci sequence is defined as follows: \n```java\npublic int fib(int n) {\nif (n <= 1)\nreturn n;\nreturn fib(n - 1) + fib(n - 2);\n}\n``` \nWhen we illustrate the recursive calls of the `fib` function as a tree (always a good idea when working with dynamic\nprogramming problems), we can see that the same sub-problems are solved multiple times. For example for `fib(6)` we can\nsee that `fib(3)` is solved three times, so there is an overlap.\nThe other property, optimal substructure, is also satisfied. The optimal solution for `fib(6)` is constructed using the\noptimal solutions of `fib(5)` and `fib(4)`. \n \nFrom the tree above we can also see that the time complexity of the `fib` function is exponential, i.e. `O(2^n)`. This\nis because the same sub-problems are solved multiple times. As we will see later, dynamic programming can be used to\nimprove the time complexity of the `fib` function to `O(n)`. This is a huge improvement and is most often the reason why\ndynamic programming is used because it can drastically improve the time complexity of a function."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx#2", "metadata": {"Header 1": "Introduction to DP", "Header 2": "Top-Down Approach (Memoization)", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx#2", "page_content": "The top-down approach is the most common way to solve dynamic programming problems. It is also called **memoization**.\nThe idea is to store the results of the sub-problems so that we do not have to re-compute them when they are needed\nagain later due to the overlapping sub-problems property. This technique is called memoization because we store the\nresults of the sub-problems in a lookup table (memo). 
\nIt is called top-down because we still start with the original problem and break it down into sub-problems and solve\nthem recursively. \nWhen implementing memoization it is important to think about the data structure that will be used to store the results\nas we want quick lookups. This leads to most implementations using either just a simple array where the index is the\ninput to the function or a hash map where the key is the input to the function. \n```java\npublic int fib(int n) {\nif (n < 0)\nthrow new IllegalArgumentException(\"n must be greater than or equal to 0\");\nif (n <= 1)\nreturn n;\n\nInteger[] memo = new Integer[n + 1]; // This uses more memory than a simple array but is more convenient as unsolved entries stay null\n\n// base cases\nmemo[0] = 0;\nmemo[1] = 1;\nreturn fibMemo(n, memo);\n}\n\npublic int fibMemo(int n, Integer[] memo) {\nif (memo[n] != null)\nreturn memo[n];\nmemo[n] = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);\nreturn memo[n];\n}\n``` \nAfter implementing the memoization technique, we can see in the tree below that the time complexity of the `fib`\nfunction is now `O(n)` as each sub-problem is only solved once. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx#3", "metadata": {"Header 1": "Introduction to DP", "Header 2": "Bottom-Up Approach (Tabulation)", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/introduction.mdx#3", "page_content": "The bottom-up approach is the other way to solve dynamic programming problems. It is also called **tabulation**. The\nidea is to solve the sub-problems first, i.e. some of the base cases, and then use the results of those sub-problems to\nsolve the original problem, hence the name bottom-up. This technique is called tabulation because we store the results\nof the sub-problems in a table (depending on the problem, this can be a 1D or 2D array). 
\nWhen implementing memoization it helped to visualize the recursive calls as a tree. When implementing tabulation it\nis also additionally helpful to visualize the results as a table or list (depending on the problem) to find a pattern. \nFor a visualisation of the tabulation technique I can recommend watching [this video](https://youtu.be/oBt53YbR9Kk?t=11513)\nat the 3:11:50 mark. The whole video is great and I can recommend watching it all and also the 4 part [video series by\nMIT on dynamic programming from 2020](https://www.youtube.com/watch?v=r4-cftqTcdI&t=7s). \nFor the Fibonacci sequence, we can see that the base cases are `fib(0)` and `fib(1)`. We can then use those results to\nthen iteratively solve the rest of the sub-problems until we reach the original problem. \n```java\npublic int fib(int n) {\nif (n < 0)\nthrow new IllegalArgumentException(\"n must be greater than or equal to 0\");\nif (n <= 1)\nreturn n;\n\nint[] memo = new int[n + 1];\n\n// base cases\nmemo[0] = 0;\nmemo[1] = 1;\n\nfor (int i = 2; i <= n; i++) {\nmemo[i] = memo[i - 1] + memo[i - 2];\n}\nreturn memo[n];\n}\n``` \nThe above code then again results in a time complexity of `O(n)`, much better than the original `O(2^n)`."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/knapsack.mdx#1", "metadata": {"Header 1": "Knapsack Problem", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/knapsack.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/knapsack.mdx#1", "page_content": "The knapsack problem is a very popular problem with many different variations. The problem is as follows: \n> Given a set of items, each with a weight and a value, determine which items you should pick to maximize the value\n> while keeping the overall weight smaller than the limit of your knapsack (backpack). 
\n \nSome popular variations of the knapsack problem are: \n- 0/1 Knapsack: You can either take an item or not take it.\n- Unbounded Knapsack: You can take an item multiple times.\n- Bounded Knapsack: You can take an item a limited number of times.\n- Fractional Knapsack: You can take a fraction of an item. \nThe [subset sum problem](./subsetSum) is a variation of the knapsack problem where the weight of each item is equal to its value and\nthe goal is not to maximize the value but to get a specific value and weight. In my definition of the subset sum problem I allowed\nan item to be used multiple times, so it is a variation of the unbounded knapsack problem. \n\nActually implement the knapsack problem with the different variations.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#1", "metadata": {"Header 1": "Subset Sum Problem", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#1", "page_content": "For the subset sum problem, we are given an array of integers and a target sum, to keep it simple we will assume that\nthe array only contains positive integers and that the target sum is also positive. We will also allow an element in the\narray to be used multiple times. \nFrom this input we can then ask the following questions: \n- Is there a subset of the array that sums to the target sum? I will call this the `canSum` problem.\n- How many subsets of the array sum to the target sum? I will call this the `countSum` problem.\n- If there is a subset that sums to the target sum, what is the subset? I will call this the `howSum` problem.\n- If there is a subset that sums to the target sum, what is the minimum number of elements in the subset? I will call\nthis the `bestSum` problem. 
\n \nIf we are given the array `[2, 3, 5]` and the target sum `8`, then the answers to the above questions are: \n- `canSum(8, [2, 3, 5]) = true`\n- `countSum(8, [2, 3, 5]) = 2` (the subsets are `[2, 2, 2, 2]` and `[3, 5]`)\n- `howSum(8, [2, 3, 5]) = [2, 2, 2, 2]`\n- `bestSum(8, [2, 3, 5]) = [3, 5]` \nAnd for the array `[2, 4]` and the target sum `7` we get: \n- `canSum(7, [2, 4]) = false`\n- `countSum(7, [2, 4]) = 0`\n- `howSum(7, [2, 4]) = null`\n- `bestSum(7, [2, 4]) = null` \nAnd for an example that is not so trivial, we can use the array `[1, 2, 5, 25]` and the target sum `100`: \n- `canSum(100, [1, 2, 5, 25]) = true`\n- `countSum(100, [1, 2, 5, 25]) = 154050750` seems about right\n- `howSum(100, [1, 2, 5, 25]) = [1,1,1,1,1...1]` (100 times) because of the order of the for loop\n- `bestSum(100, [1, 2, 5, 25]) = [25, 25, 25, 25]` \n \nThe subset sum problem is a very popular problem but also a very hard problem computationally. As will become clearer\nbelow the time complexity of the subset sum problem is `O(n^m)` where `n` is the length of the array and `m` is the\ntarget sum. This is because we have to try all possible combinations of the elements in the array to find a subset that"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#2", "metadata": {"Header 1": "Subset Sum Problem", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#2", "page_content": "below the time complexity of the subset sum problem is `O(n^m)` where `n` is the length of the array and `m` is the\ntarget sum. This is because we have to try all possible combinations of the elements in the array to find a subset that\nsums to the target sum. This is also why dynamic programming is so useful for this problem because it can drastically\nimprove the time complexity. 
\n\nWhat does it mean for a problem to be NP-complete? Is the subset sum problem NP-complete etc.?\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#3", "metadata": {"Header 1": "Subset Sum Problem", "Header 2": "Can Sum", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#3", "page_content": "Our first approach to this problem is most likely a brute force approach. We can use recursion to solve this problem\nby trying to subtract each element in the array from the target sum and then recursively calling the function again with\nthe new target sum. If the target sum is 0 then we have found a subset that sums to the target, and we can return\ntrue. If the target sum is negative then we have not found a subset that sums to the target sum and we can return false.\nThese results are then propagated back up the call stack until we reach the original call (the parent node in the tree\nbecomes true if any of its children are true and otherwise false). \nWe can construct the following tree to visualize the recursive calls: \n \n```java\npublic boolean canSum(int targetSum, int[] numbers) {\nif (targetSum == 0)\nreturn true;\nif (targetSum < 0)\nreturn false;\n\nfor (int num : numbers) {\nint remainder = targetSum - num;\nif (canSum(remainder, numbers))\nreturn true;\n}\nreturn false;\n}\n``` \nFrom the tree above we can see that the time complexity of the `canSum` function is `O(n^m)` where `n` is the length of\nthe array (the number of children per node) and `m` is the target sum (the depth of the tree, which would be maximal if\nthe array contained a 1). We can improve the time complexity of the `canSum` function to `O(n*m)` by using memoization. 
\n```java\npublic boolean canSum(int targetSum, int[] numbers) {\nif (targetSum < 0)\nthrow new IllegalArgumentException(\"targetSum must be greater than or equal to 0\");\n\nboolean[] memo = new boolean[targetSum + 1];\nArrays.fill(memo, false); // not needed but makes it more clear\nmemo[0] = true;\n\nreturn canSumMemo(targetSum, numbers, memo);\n}\n\npublic boolean canSumMemo(int targetSum, int[] numbers, boolean[] memo) {\nif (targetSum < 0) // must be checked before indexing into memo\nreturn false;\nif (memo[targetSum])\nreturn true;"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#4", "metadata": {"Header 1": "Subset Sum Problem", "Header 2": "Can Sum", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#4", "page_content": "return canSumMemo(targetSum, numbers, memo);\n}\n\npublic boolean canSumMemo(int targetSum, int[] numbers, boolean[] memo) {\nif (targetSum < 0) // must be checked before indexing into memo\nreturn false;\nif (memo[targetSum])\nreturn true;\n\nfor (int num : numbers) {\nint remainder = targetSum - num;\nif (canSumMemo(remainder, numbers, memo)) {\nmemo[targetSum] = true;\nreturn true;\n}\n}\nmemo[targetSum] = false;\nreturn false;\n}\n``` \nTo use tabulation instead of memoization we would need to construct a table (array) of size `targetSum + 1` and then\nfill it with the base cases and find some sort of pattern. So we would initially fill the list with `false` and then\nset the index 0 to `true` because the target sum 0 can always be constructed using an empty array. Then we need to\ndo some thinking to find the pattern. \nIf we think of our current position in the array as the target sum, i.e. in the first iteration we are at index 0, then\nwe know that we can construct the target sums where we add each number in the array to the current position. 
For example\nif we are at index 0 and the array is `[5,4,3]` and we have the target 7 then we know that we can construct the target\nsums 5,4 and 3 by adding the number at index 0 to the current position. So we can set the values at index 5, 4 and 3 to\n`true`. We can then move on and set our current index to 1 and we know that we can't construct the target sum 1 using\nthe array so we can skip it, same goes for index 2. But we can construct the target sum 3, so it gets interesting again.\nWe can then again add each number in the array to the current position and set the values at index 7 and 6 to `true`\n(index 8 would be out of bounds and is skipped). This process continues until we reach the end of the array. If we then return the value at the last index we will have\nour result. \nThis [blog post](https://teepika-r-m.medium.com/dynamic-programming-basics-part-2-758b00e0a4b0) visualizes the process very well. \n```java\npublic boolean canSum(int targetSum, int[] numbers) {\nif (targetSum < 0)\nthrow new IllegalArgumentException(\"targetSum must be greater than or equal to 0\");"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#5", "metadata": {"Header 1": "Subset Sum Problem", "Header 2": "Can Sum", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#5", "page_content": "boolean[] table = new boolean[targetSum + 1];\nArrays.fill(table, false); // not needed but makes it more clear\ntable[0] = true;\n\nfor (int i = 0; i <= targetSum; i++) {\nif (table[i]) {\nfor (int num : numbers) {\nif (i + num < table.length)\ntable[i + num] = true;\n}\n}\n}\nreturn table[targetSum];\n}\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#6", "metadata": {"Header 1": "Subset Sum Problem", "Header 2": "Count Sum", "path": 
"../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#6", "page_content": "The `countSum` problem is very similar to the `canSum` problem. The only difference is that when the target sum is 0 we\nreturn 1 instead of true and when the target sum is negative we return 0 instead of false and then in the parent node\nwe sum up the results of the children. \n \nThe brute force approach would look like this with a time complexity of `O(n^m)`: \n```java\npublic int countSum(int targetSum, int[] numbers) {\nif (targetSum == 0)\nreturn 1;\nif (targetSum < 0)\nreturn 0;\n\nint count = 0;\nfor (int num : numbers) {\nint remainder = targetSum - num;\ncount += countSum(remainder, numbers);\n}\nreturn count;\n}\n``` \nAnd the memoized version would look like this with a time complexity of `O(n*m)`: \n```java\npublic int countSum(int targetSum, int[] numbers) {\nif (targetSum < 0)\nthrow new IllegalArgumentException(\"targetSum must be greater than or equal to 0\");\n\nint[] memo = new int[targetSum + 1];\nArrays.fill(memo, -1);\nmemo[0] = 1;\n\nreturn countSumMemo(targetSum, numbers, memo);\n}\n\npublic int countSumMemo(int targetSum, int[] numbers, int[] memo) {\nif (targetSum < 0)\nreturn 0;\nif (memo[targetSum] != -1)\nreturn memo[targetSum];\n\nint count = 0;\nfor (int num : numbers) {\nint remainder = targetSum - num;\ncount += countSumMemo(remainder, numbers, memo);\n}\nmemo[targetSum] = count;\nreturn count;\n}\n``` \nOne issue is that it will count the same subset multiple times but with different ordering of the elements, as we can\nsee in the tree above."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#7", "metadata": {"Header 1": "Subset Sum Problem", "Header 2": "How Sum", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": 
"../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#7", "page_content": "The `howSum` problem is again a variation of the `canSum` problem. The only difference is that when the target sum is 0\nwe return an empty array instead of true and when the target sum is negative we return null instead of false and then\nin the parent node we return the array with the element that was used to get to the target sum. To solve this problem\nit doesn't matter if the array is the shortest or longest possible array that sums to the target sum it will just be\none of the possible solutions (The furthest left solution in the tree above because of the order of the for loop and\nthe recursive call). \n```java\npublic int[] howSum(int targetSum, int[] numbers) {\nif (targetSum == 0)\nreturn new int[0];\nif (targetSum < 0)\nreturn null;\n\nfor (int num : numbers) {\nint remainder = targetSum - num;\nint[] result = howSum(remainder, numbers);\nif (result != null) {\nint[] newArray = new int[result.length + 1];\nSystem.arraycopy(result, 0, newArray, 0, result.length); // O(n)\nnewArray[result.length] = num;\nreturn newArray;\n}\n}\nreturn null;\n}\n``` \nWith memoization: \n```java\npublic int[] howSum(int targetSum, int[] numbers) {\nif (targetSum < 0)\nthrow new IllegalArgumentException(\"targetSum must be greater than or equal to 0\");\n\nint[][] memo = new int[targetSum + 1][]; // will be jagged array\nArrays.fill(memo, null); // not needed but makes it more clear\nmemo[0] = new int[0];\n\nreturn howSumMemo(targetSum, numbers, memo);\n}\n\npublic int[] howSumMemo(int targetSum, int[] numbers, int[][] memo) {\nif (targetSum < 0)\nreturn null;\nif (memo[targetSum] != null)\nreturn memo[targetSum];\n\nfor (int num : numbers) {\nint remainder = targetSum - num;\nint[] result = howSumMemo(remainder, numbers, memo);\nif (result != null) {\nint[] newArray = new int[result.length + 1];\nSystem.arraycopy(result, 0, newArray, 0, result.length); // 
O(n)\nnewArray[result.length] = num;\nmemo[targetSum] = newArray;\nreturn newArray;\n}\n}\nmemo[targetSum] = null;\nreturn null;\n}\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#8", "metadata": {"Header 1": "Subset Sum Problem", "Header 2": "Best Sum", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#8", "page_content": "The `bestSum` problem is again a variation of the `howSum` problem. It is very similar to the `howSum` problem but\ninstead of returning the first array that sums to the target sum, we return the shortest array that sums to the target\nsum. \n```java\npublic int[] bestSum(int targetSum, int[] numbers) {\nif (targetSum == 0)\nreturn new int[0];\nif (targetSum < 0)\nreturn null;\n\nint[] shortestArray = null;\nfor (int num : numbers) {\nint remainder = targetSum - num;\nint[] result = bestSum(remainder, numbers);\nif (result != null) {\nint[] newArray = new int[result.length + 1];\nSystem.arraycopy(result, 0, newArray, 0, result.length); // O(n)\nnewArray[result.length] = num;\nif (shortestArray == null || newArray.length < shortestArray.length)\nshortestArray = newArray;\n}\n}\nreturn shortestArray;\n}\n``` \nWith memoization: \n```java\npublic int[] bestSum(int targetSum, int[] numbers) {\nif (targetSum < 0)\nthrow new IllegalArgumentException(\"targetSum must be greater than or equal to 0\");\n\nint[][] memo = new int[targetSum + 1][]; // will be jagged array\nArrays.fill(memo, null); // not needed but makes it more clear\nmemo[0] = new int[0];\n\nreturn bestSumMemo(targetSum, numbers, memo);\n}\n\npublic int[] bestSumMemo(int targetSum, int[] numbers, int[][] memo) {\nif (targetSum < 0)\nreturn null;\nif (memo[targetSum] != null)\nreturn memo[targetSum];\n\nint[] shortestArray = null;\nfor (int num : numbers) {\nint remainder = targetSum - num;\nint[] result 
= bestSumMemo(remainder, numbers, memo);\nif (result != null) {\nint[] newArray = new int[result.length + 1];\nSystem.arraycopy(result, 0, newArray, 0, result.length); // O(n)\nnewArray[result.length] = num;\nif (shortestArray == null || newArray.length < shortestArray.length)\nshortestArray = newArray;\n}\n}\nmemo[targetSum] = shortestArray;\nreturn shortestArray;\n}\n```"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#9", "metadata": {"Header 1": "Subset Sum Problem", "Header 2": "All Sum", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/dynamicProgramming/subsetSum.mdx#9", "page_content": "The `allSum` problem is again a variation of the `canSum` problem, and is almost a combination of the `countSum` and\n`howSum` problems. However, it is a bit more complicated because we need to return a list of arrays instead of just one\nresult. \n\nCan't be bothered to implement this right now. Maybe later. Same goes for the tabulation versions of the above\nproblems.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#1", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#1", "page_content": "Vertex centrality measures can be used to determine the importance of a vertex in a graph. There are many different\nvertex centrality measures, each with their own advantages and disadvantages. In a communication network a vertex with\nhigh centrality is an actor that is important for the communication in the network, hence they are also often called\nactor centrality measures. An actor with high centrality can control the flow of information in the network for good or\nbad. 
They can also be used to determine key actors in a network, for example in a power grid it is important to know\nwhich vertices are key actors, because if they fail, the whole network fails."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#2", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Degree Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#2", "page_content": "The degree centrality is the simplest centrality measure. It is simply the number of edges connected to a vertex. The\ndegree centrality is a local measure, because it only takes into account the direct neighbors of a vertex. It can be\ncalculated using the $\\text{deg()}$ function. Or alternatively using the $\\text{indeg()}$ and $\\text{outdeg()}$\ndepending on whether the graph is directed or not and the use-case. \nexport const vertexDegreeGraph = {\nnodes: [\n{id: 1, label: \"2\", x: 0, y: 0},\n{id: 2, label: \"2\", x: 0, y: 200},\n{id: 3, label: \"3\", x: 200, y: 100, color: \"red\"},\n{id: 4, label: \"2\", x: 400, y: 100},\n{id: 5, label: \"3\", x: 600, y: 100, color: \"red\"},\n{id: 6, label: \"2\", x: 800, y: 0},\n{id: 7, label: \"2\", x: 800, y: 200}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 3},\n{from: 3, to: 4},\n{from: 4, to: 5},\n{from: 5, to: 6},\n{from: 5, to: 7},\n{from: 6, to: 7}\n]\n}; \n \nThe degree centrality can be normalized by dividing it by the maximum possible degree in the graph. This is rarely done\nin practice, because a lot of values will be small, and we are most often interested in the actual degree of a vertex. \nThe interpretation of the degree centrality is pretty self-explanatory. 
It is closely related to the\n[prestige](#prestige) of a vertex."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#3", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Closeness Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#3", "page_content": "Unlike the degree centrality, the closeness centrality is a global measure, because it takes into account the whole\ngraph; the consequence of this is that it is more expensive to calculate. \n> The idea of the closeness centrality is that a vertex is important if it is **close** to all other vertices in the graph,\ni.e. if it is close to the center of the graph. \nThis also means that a vertex can be important even if it only has one edge, as seen by the green vertex in the following graph. \nexport const vertexDegreeProblemGraph = {\nnodes: [\n{id: 1, label: \"2\", x: 0, y: 0},\n{id: 2, label: \"2\", x: 0, y: 200},\n{id: 3, label: \"3\", x: 200, y: 100, color: \"red\"},\n{id: 4, label: \"2\", x: 400, y: 100},\n{id: 5, label: \"3\", x: 600, y: 100, color: \"red\"},\n{id: 6, label: \"2\", x: 800, y: 0},\n{id: 7, label: \"2\", x: 800, y: 200},\n{id: 8, label: \"1\", x: 400, y: 0, color: \"green\"}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 3},\n{from: 3, to: 4},\n{from: 4, to: 5},\n{from: 5, to: 6},\n{from: 5, to: 7},\n{from: 6, to: 7},\n{from: 4, to: 8},\n]\n}; \n \nThe closeness centrality for a vertex $v$ is calculated by taking the inverse distance of all shortest paths from the\nvertex $v$ to all other vertices in the graph. This can be interpreted as how efficiently all the other vertices\ncan be reached from $v$. 
The formula for the closeness centrality is as follows: \n$$\n\\text{closenessCentrality}(v) = \\sum_{u \\in V \\setminus \\{v\\}}{d(v,u)^{-1}} = \\sum_{u \\in V \\setminus \\{v\\}}{\\frac{1}{d(v,u)}}\n$$ \nWhere $d(v,u)$ is the length of the shortest path from $v$ to $u$. Let us calculate the closeness centrality for the\ngreen vertex in the graph above. \n$$\n\\begin{align*}\n1 + \\frac{1}{2} + \\frac{1}{2} + \\frac{1}{3} + \\frac{1}{3} + \\frac{1}{3} + \\frac{1}{3} &= \\frac{10}{3} \\\\\n\\frac{10}{3} \\cdot \\frac{1}{8-1} &= \\frac{10}{21} \\approx 0.476\n\\end{align*}\n$$"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#4", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Closeness Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#4", "page_content": "green vertex in the graph above. \n$$\n\\begin{align*}\n1 + \\frac{1}{2} + \\frac{1}{2} + \\frac{1}{3} + \\frac{1}{3} + \\frac{1}{3} + \\frac{1}{3} &= \\frac{10}{3} \\\\\n\\frac{10}{3} \\cdot \\frac{1}{8-1} &= \\frac{10}{21} \\approx 0.476\n\\end{align*}\n$$ \nTo normalize the closeness centrality, it can be divided by $|V| - 1$. 
\nexport const vertexClosenessGraph = {\nnodes: [\n{id: 1, label: \"0.524\", x: 0, y: 0},\n{id: 2, label: \"0.524\", x: 0, y: 200},\n{id: 3, label: \"0.596\", x: 200, y: 100},\n{id: 4, label: \"0.714\", x: 400, y: 100, color: \"red\"},\n{id: 5, label: \"0.596\", x: 600, y: 100},\n{id: 6, label: \"0.524\", x: 800, y: 0},\n{id: 7, label: \"0.524\", x: 800, y: 200},\n{id: 8, label: \"0.476\", x: 400, y: 0, color: \"green\"}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 3},\n{from: 3, to: 4},\n{from: 4, to: 5},\n{from: 5, to: 6},\n{from: 5, to: 7},\n{from: 6, to: 7},\n{from: 4, to: 8},\n]\n}; \n \n\nThis gives different values to the formula from wikipedia and networkx. They use the following formula: \n$$\n\\text{closenessCentrality}(v) = \\frac{1}{\\sum_{u \\in V \\setminus \\{v\\}}{d(v,u)}}\n$$ \nand for the normalized closeness centrality: \n$$\n\\text{closenessCentrality}(v) = \\frac{|V| - 1}{\\sum_{u \\in V \\setminus \\{v\\}}{d(v,u)}}\n$$ \nwhere $d(v,u)$ is the length of the shortest path from $v$ to $u$. \nThe issue with the above formula is that if no path exists between $v$ and $u$ then the distance is $\\infty$ which\nwould lead to the closeness centrality being $0$. This could be solved by just using 0 instead of $\\infty$ which would\nlead to the same result as the formula above because 1 divided by $\\infty$ is $0$, i.e. 0 is added to the sum.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#5", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Betweenness Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#5", "page_content": "In the above example using the degree centrality we saw that the green ones are the most important. 
However,\nwe can clearly see that the vertex in between them is the most important one as it connects the two communities.\nBecause of this we could say that this vertex is in a brokerage position, i.e. it acts as a broker/gatekeeper of information. \n \nThe betweenness centrality is a global measure that takes into account the whole graph and tries to solve the above\nissue. \n> The idea of the betweenness centrality is that a vertex is important if a lot of shortest paths go through it, i.e. it is\n> **between** a lot of vertices. \nTo calculate the betweenness centrality we need to calculate the number of shortest paths that go through a vertex $v$.\nSo for every pair of vertices $u$ and $w$ we need to calculate the shortest paths and then count how many of them go\nthrough $v$. The formula for the betweenness centrality is as follows: \n$$\n\\text{betweennessCentrality}(v) = \\sum_{u \\neq v \\neq w}{\\frac{\\sigma_{uw}(v)}{\\sigma_{uw}}}\n$$ \nWhere $\\sigma_{uw}$ is the number of shortest paths from $u$ to $w$ and $\\sigma_{uw}(v)$ is the number of shortest paths\nfrom $u$ to $w$ that go through $v$. \n\nThe fraction in the formula leads to the weight being split if there are multiple shortest paths between $u$ and $w$.\n \nBecause the calculations for the betweenness centrality are quite complex and take a while to calculate, we will use a\nsmaller graph to calculate the betweenness centrality. \n\nmake this more algorithmic and use the pictures from the script.\n \n\n\nWe start with all betweenness centralities being $0$. We start with the first vertex on the left and mark it green.\n\n\nWe then calculate the shortest path to the next one in a BFS manner. 
The vertex to the right is the next one so we"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#6", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Betweenness Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#6", "page_content": "We start with all betweenness centralities being $0$. We start with the first vertex on the left and mark it green.\n\n\nWe then calculate the shortest path to the next one in a BFS manner. The vertex to the right is the next one so we\nmark it green as the target vertex. Because it is directly connected to the other green one nothing changes. Now\nthat we have visited it we mark it gray.\n\n\nWe take the next vertex, the one above, and mark it green. We then calculate the shortest path between the two green\nvertices. There is only one shortest path, going over the previously visited gray vertex. So we add $1$ to that gray\nvertex's betweenness centrality.\n\n\nWe continue this process until we have visited all vertices once. We then mark the initial vertex on the left as red.\nAll shortest paths that start at this vertex have been calculated. 
We then pick a new start vertex in a BFS manner.\nRepeat the process until all shortest paths have been calculated.\n\n \nexport const vertexBetweennessGraph = {\nnodes: [\n{id: 1, label: \"0\", x: 0, y: 200},\n{id: 2, label: \"3\", x: 200, y: 200},\n{id: 3, label: \"1\", x: 400, y: 0},\n{id: 4, label: \"1\", x: 400, y: 400},\n{id: 5, label: \"0\", x: 600, y: 200},\n],\nedges: [\n{from: 1, to: 2},\n{from: 2, to: 3},\n{from: 2, to: 4},\n{from: 3, to: 4},\n{from: 3, to: 5},\n{from: 4, to: 5},\n]\n}; \n \nTo normalize the betweenness centrality, you divide the centrality by the following: \n- For an undirected graph: $\\frac{(n-1)(n-2)}{2}$\n- For a directed graph: $(n-1)(n-2)$ \nThe image below summarizes all the centrality measures we have seen so far and compares the most central vertices. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#7", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Eigenvector Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#7", "page_content": "Before I start explaining the eigenvector centrality, I describe what an eigenvector is. An eigenvector is a\nvector that does not change its direction when multiplied by a square matrix, only its magnitude changes, i.e. it is only\nscaled. Because any scalar multiple of an eigenvector is again an eigenvector, the convention is to only allow eigenvectors with a magnitude of\n1, i.e. $||\\boldsymbol{v}||_2 = 1$, i.e. the normalized eigenvector. The scaling factor is then called the eigenvalue,\ndenoted by $\\lambda$. The formula for the eigenvector is as follows: \n$$\n\\boldsymbol{Av}=\\lambda \\boldsymbol{v}\n$$ \nThe eigenvector centrality is the eigenvector corresponding to the largest eigenvalue of the adjacency matrix of the\ngraph. 
The eigenvector corresponding to the largest eigenvalue is also commonly called the dominant eigenvalue/vector.\nThis could be calculated directly, but it is most often computed using the power iteration method. \nThe eigenvector centrality is an interesting centrality measure. \n> The idea is that a node is important if its neighbors are important. \nWhat makes a vertex important could be any attribute of the vertex, for example if we have\na network of people, their salary. However, the simplest and most commonly used approach is to use the degree\ncentrality as the importance measure. In a directed graph, most commonly the in-degree centrality is used. \nTo show the idea that the eigenvector centrality is based on the importance of the neighbors, I will use the following\ngraph and calculate the eigenvector centrality using the degree centrality as the importance measure with the power\niteration method. \n \n \n#### Power Iteration Method"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#8", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Eigenvector Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#8", "page_content": "width={400}\n/> \n \n#### Power Iteration Method \nThe power iteration method is a simple iterative method to calculate the eigenvector corresponding to the largest eigenvalue. \nThe idea is to start with an initial vector $\\boldsymbol{b_0}$ and then multiply it with the adjacency matrix $\\boldsymbol{A}$.\nThen we normalize the resulting vector $\\boldsymbol{b_1}$ and repeat the process until the vector converges. Most often to\ncheck for convergence we calculate the difference between the two vectors and check if it is smaller than a threshold. 
\n$$\n\\boldsymbol{b_{i+1}} = \frac{\\boldsymbol{Ab_i}}{||\\boldsymbol{Ab_i}||_2}\n$$ \n\nThe initial vector $b_0$ in the power iteration method is the importance measure, in this case the degree centrality. However,\nthe initial vector can be almost any non-zero vector (it only needs a non-zero component in the direction of the dominant\neigenvector) and the method will still converge to the same eigenvector. You could interpret\nthis as the eigenvector centrality being the \"true underlying importance\" of the vertices.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#9", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "PageRank", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#9", "page_content": "\nDo this\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#10", "metadata": {"Header 1": "Centrality", "Header 2": "Vertex Centrality", "Header 3": "Prestige", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#10", "page_content": "In a directed graph it is possible to analyze the prestige of a vertex, i.e. the stature or reputation associated with\na vertex. The vertices' relationships, however, need to reflect this. For example, if a person has a lot of followers\nbut doesn't follow a lot of people, then that person has a high prestige and stature, such as a celebrity. \n#### Popularity \nThe simplest way to measure prestige is to count the number of incoming edges, i.e. using the $\\text{indeg()}$ function.\nThis is called popularity. 
\nexport const localGraph = {\nnodes: [\n{id: 1, label: \"Bob, 1\"},\n{id: 2, label: \"Alice, 2\"},\n{id: 3, label: \"Michael, 4\", color: \"red\"},\n{id: 4, label: \"Urs, 2\"},\n{id: 5, label: \"Karen, 3\"},\n{id: 6, label: \"John, 2\"},\n{id: 7, label: \"Peter, 2\"},\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 1, to: 4},\n{from: 1, to: 5},\n{from: 2, to: 5},\n{from: 2, to: 6},\n{from: 2, to: 3},\n{from: 3, to: 4},\n{from: 3, to: 5},\n{from: 3, to: 6},\n{from: 3, to: 7},\n{from: 5, to: 1},\n{from: 5, to: 2},\n{from: 6, to: 3},\n{from: 6, to: 7},\n{from: 7, to: 3},\n],\n}; \n \n#### Proximity Prestige \nThe proximity prestige measure does not just account for the number of directly incoming edges, but also the number of\nindirectly incoming edges, i.e. the number of paths that lead to the vertex. However, the longer the path, the lower\nprestige from that path is weighted. \nSimply put the proximity prestige is the sum of all paths that lead to the vertex weighted by the length of the path. \nThe formula for the proximity prestige can be summarized pretty simply: \n> The proximity prestige of a vertex is the number of vertices that have a path to the vertex divided by the average\nshortest path length leading to the vertex. \nMore formally: \n$$\n\\text{proximityPrestige}(v) = \\frac{\\frac{|I|}{n-1}}{\\frac{\\sum_{i \\in I}{d(i,v)}}{|I|}}\n$$ \nWhere $I$ is the set of all vertices that have a path to $v$ and $d(u,v)$ is the length of the shortest path from $u$ to\n$v$. 
\n\n\n\n \n$$\n\\begin{align*}\n\\text{proximityPrestige}(2) &= \\frac{\\frac{1}{(8-1)}}{\\frac{1}{1}} = 0.14 \\\\\n\\text{proximityPrestige}(4) &= \\frac{\\frac{2}{(8-1)}}{\\frac{2}{2}} = 0.29 \\\\\n\\text{proximityPrestige}(6) &= \\frac{\\frac{7}{(8-1)}}{\\frac{10}{7}} = 0.7 \\\\\n\\end{align*}\n$$ \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#12", "metadata": {"Header 1": "Centrality", "Header 2": "Group Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#12", "page_content": "The goal of group centrality measures is to determine the importance of a group of vertices in a graph. These measures\nare based on the vertex centrality measures, but they are more complex and expensive to calculate."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#13", "metadata": {"Header 1": "Centrality", "Header 2": "Group Centrality", "Header 3": "Degree Group Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#13", "page_content": "The degree group centrality is the simplest group centrality measure. It is simply the fraction of the number of\nvertices outside the group that are directly connected to the group. So in the following graph with the group $G$ being\ndefined as $G=\\{v_6,v_7,v_8\\}$, the degree group centrality would be $\\frac{3}{10}$, so $0.3$. 
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#14", "metadata": {"Header 1": "Centrality", "Header 2": "Group Centrality", "Header 3": "Closeness Group Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#14", "page_content": "The closeness group centrality measures how close the group is to the other vertices in the graph. It is calculated by\nadding up all inverse distances from the vertices outside the group to the closest vertex in the group. So in the\nsame graph and group $G=\\{v_6,v_7,v_8\\}$ as above, the closeness group centrality would be: \n$$\n1+1+1+\\frac{1}{2}+\\frac{1}{2}+\\frac{1}{2}+\\frac{1}{2}+\\frac{1}{2}+\\frac{1}{2}+\\frac{1}{3} = 6.333\n$$ \nIt can be simply normalized by dividing it by the number of vertices outside the group, which would lead to $0.6333$."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#15", "metadata": {"Header 1": "Centrality", "Header 2": "Group Centrality", "Header 3": "Betweenness Group Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#15", "page_content": "The betweenness group centrality measures how many shortest paths go through the group. It is calculated by counting\nhow many shortest paths between all the vertices outside the group go through the group. \n\nIf we define our group to contain the vertices $C,E$ from the graph below we can calculate the betweenness group\ncentrality simply by calculating all the shortest paths between the vertices outside the group and counting how many\nof them go through the group. 
\n \nWe have the following shortest paths between the vertices outside the group: \n- $A \\rightarrow B$\n- $A \\rightarrow C \\rightarrow D$ goes through the group via $C$.\n- $A \\rightarrow C \\rightarrow D \\rightarrow E \\rightarrow G$ goes through the group via $C$ and $E$.\n- $A \\rightarrow C \\rightarrow D \\rightarrow E \\rightarrow F$ goes through the group via $C$ and $E$.\n- $B \\rightarrow C \\rightarrow D$, goes through the group via $C$.\n- $B \\rightarrow C \\rightarrow D \\rightarrow E \\rightarrow F$ goes through the group via $C$ and $E$.\n- $B \\rightarrow C \\rightarrow D \\rightarrow E \\rightarrow G$ goes through the group via $C$ and $E$.\n- $D \\rightarrow E \\rightarrow G$ goes through the group via $E$.\n- $D \\rightarrow E \\rightarrow F$ goes through the group via $E$.\n- $F \\rightarrow G$ \nTherefore 8 of the 10 shortest paths go through the group, so the betweenness group centrality is $\\frac{8}{10} = 0.8$. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#16", "metadata": {"Header 1": "Centrality", "Header 2": "Network Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#16", "page_content": "The idea of network centrality is to measure the centrality of the entire network, i.e. to compare the difference in\ncentrality between the vertices in the network. 
The goal is then to show how different the key vertices are from the\nrest of the network."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#17", "metadata": {"Header 1": "Centrality", "Header 2": "Network Centrality", "Header 3": "General Network Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#17", "page_content": "To calculate the network centrality the vertex centrality measures are used. For this Linton Freeman defined a general\nformula that returns a value between $0$ and $1$ with the following meanings: \n- $0$ means that all vertices have the same centrality, i.e. the network is a ring network.\n- $1$ means that one vertex has all the centrality, i.e. the network is a star network. \nexport const starGraph = {\nnodes: [\n{id: 1, label: \"1\"},\n{id: 2, label: \"2\"},\n{id: 3, label: \"3\"},\n{id: 4, label: \"4\"},\n{id: 5, label: \"5\"},\n{id: 6, label: \"6\"},\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 1, to: 4},\n{from: 1, to: 5},\n{from: 1, to: 6},\n],\n}; \nexport const ringGraph = {\nnodes: [\n{id: 1, label: \"1\"},\n{id: 2, label: \"2\"},\n{id: 3, label: \"3\"},\n{id: 4, label: \"4\"},\n{id: 5, label: \"5\"},\n{id: 6, label: \"6\"},\n],\nedges: [\n{from: 1, to: 2},\n{from: 2, to: 3},\n{from: 3, to: 4},\n{from: 4, to: 5},\n{from: 5, to: 6},\n{from: 6, to: 1},\n],\n}; \n\n\n\n\n\n\n\n\n\n\n\n \nThe formula is as follows: \n$$\n\\text{networkCentrality}(G) = \\frac{\\sum_{v \\in V}{C_{max} - C(v)}}{Star_n}\n$$ \nWhere:\n- $C(v)$ is the centrality function for a vertex $v$.\n- $C_{max}$ is the maximum centrality of all vertices in the graph, i.e. $C_{max} = \\max_{v \\in V}{C(v)}$.\n- The denominator $Star_n$ is the maximal sum of differences between\nthe centrality of a vertex and the maximum centrality of all vertices in the graph, i.e. 
if the graph were a star graph\nwith the same number of vertices as the graph $G$, so $n=|V|$ (Is this always the case, no matter the centrality measure?). \nWith the definition above it is now logical why the value is $1$ when the graph is a star graph because the numerator and\ndenominator are the same. Whereas if the graph is a ring graph, i.e. all vertices have the same centrality, then the"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#18", "metadata": {"Header 1": "Centrality", "Header 2": "Network Centrality", "Header 3": "General Network Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#18", "page_content": "With the definition above it is now logical why the value is $1$ when the graph is a star graph because the numerator and\ndenominator are the same. Whereas if the graph is a ring graph, i.e. all vertices have the same centrality, then the\nsum of differences in the numerator is $0$ and the denominator is the maximum sum of differences, which leads to the\nvalue being $0$. \n\nDepending on the definition of the general formula the sum in the numerator skips the vertex with the maximum\ncentrality since the difference would be $0$. 
I find the definition above more intuitive, but it is important to\nknow that there are different definitions.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#19", "metadata": {"Header 1": "Centrality", "Header 2": "Network Centrality", "Header 3": "Degree Network Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#19", "page_content": "For the degree network centrality the denominator is pretty simple, because for a star graph the key vertex will have a\ndegree of $n-1$ and the other vertices will have a degree of $1$. So the denominator is simply $(n-1)(n-2)$ for an\nundirected graph; if it is a directed graph, then the denominator can just be doubled. \nIf you are working with the normalized degree centrality, then the denominator can be even further simplified to just\n$n-2$.
I will save you the details,\njust trust me bro."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#21", "metadata": {"Header 1": "Centrality", "Header 2": "Network Centrality", "Header 3": "Betweenness Network Centrality", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/centrality.mdx#21", "page_content": "When using the normalized betweenness centrality, the denominator is simply $n-1$, just like with the degree centrality."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#1", "metadata": {"Header 1": "Communities", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#1", "page_content": "Communities are subgraphs (subsets or groups of vertices of the original graph) that are better connected to each\nother than to the rest of the graph. Communities are very important when analyzing social networks and networks in\ngeneral as they often form around a context or a topic such as family, friends, work, hobbies, etc. \nThese communities can then be further analyzed such as to find out who the most important people in a community are,\nwhat their impact on the community is, and how they relate to other communities. 
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#2", "metadata": {"Header 1": "Communities", "Header 2": "Neighborhoods", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#2", "page_content": "The neighborhood of a vertex $v$ is the set of all vertices that are connected to $v$ by an edge, and is denoted by\n$N(v)$, or $N_G(v)$ if the graph is ambiguous. The neighborhood of a vertex is also sometimes referred to as the open\nneighborhood when it does not include the vertex $v$ itself, and the closed neighborhood when it does include the vertex\nitself. The default is the open neighborhood, whereas the closed neighborhood is denoted by $N[v]$ or $N_G[v]$. \nexport const neighborhoodGraph = {\nnodes: [\n{id: 1, label: \"a\", x: 0, y: 0, color: \"green\"},\n{id: 2, label: \"b\", x: 0, y: 200, color: \"green\"},\n{id: 3, label: \"c\", x: 200, y: 100, color: \"red\"},\n{id: 4, label: \"d\", x: 400, y: 100, color: \"green\"},\n{id: 5, label: \"e\", x: 600, y: 100},\n{id: 6, label: \"f\", x: 800, y: 0},\n{id: 7, label: \"g\", x: 800, y: 200}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 3},\n{from: 3, to: 4},\n{from: 4, to: 5},\n{from: 5, to: 6},\n{from: 5, to: 7},\n{from: 6, to: 7}\n]\n}; \n\n\nFor the given Graph $G$ and the vertex $c$, the neighborhood $N(c)$ is the set of vertices $\\{a, b, d\\}$.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#3", "metadata": {"Header 1": "Communities", "Header 2": "Connected Components", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#3", "page_content": "Simply put, a connected component is a subgraph of the original graph where all vertices are 
connected to each other. So\nthere are no disconnected vertices in a connected component. These can quite easily be seen by eye, but the definition\ncan become more complex when we look at directed graphs."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#4", "metadata": {"Header 1": "Communities", "Header 2": "Connected Components", "Header 3": "Undirected Graphs", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#4", "page_content": "In an undirected graph, a connected component is a subset of vertices such that there is a path between every pair of\nvertices in the subset. In other words, a connected component is a subgraph of the original graph where all vertices\nare connected to each other. \nThis could be useful, for example, to find out if a graph is fully connected or not. If the graph has only one connected\ncomponent, then it is fully connected. If it has more than one connected component, then it is not fully connected. \nIf we think of a communication network, then a connected component would be a group of people that can communicate with\neach other. If there are multiple connected components, then there are groups of people that cannot communicate with\neach other. \n \nTo find the connected components of a graph, we can simply use either a breadth-first search or a depth-first search\nover all vertices. 
The algorithm would then look something like this: \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#5", "metadata": {"Header 1": "Communities", "Header 2": "Connected Components", "Header 3": "Directed Graphs", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#5", "page_content": "In a directed graph the directions of the edges matter. This gives us two types of connected components: weakly\nconnected components and strongly connected components. \n#### Weakly Connected Components \nWeakly connected components are the same as connected components in undirected graphs. So you just ignore the\ndirections of the edges. \n \n#### Strongly Connected Components \nStrongly connected components are a bit more complex. In a directed graph, a strongly connected component is a subset of\nvertices such that there is a path between every pair of vertices in the subset, but the path must follow the direction\nof the edges. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#6", "metadata": {"Header 1": "Communities", "Header 2": "Connected Components", "Header 3": "Giant Components", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#6", "page_content": "If a connected component includes a large portion of the graph, then it is commonly referred to as a\n**\"giant component\"**. 
There is no strict definition of what a giant component is, but it is commonly used to refer to\nconnected components that include more than 50% of the vertices in the graph."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#7", "metadata": {"Header 1": "Communities", "Header 2": "Cliques", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#7", "page_content": "Cliques focus on undirected graphs. A clique is a complete subgraph of the original graph, i.e. a subgraph where all\nvertices are connected to each other. Cliques are very important in social networks as they represent groups of people\nthat all know each other; in a communication network, however, they would represent a group with redundant connections. \nBecause cliques are complete subgraphs, they are very easy to see but also happen to be very rare and hard to find\nalgorithmically. In the graph below the two cliques have been highlighted in red and blue. \n
 \n\nAlgorithms seem to be separated into finding a maximal clique or finding cliques of a certain size.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#8", "metadata": {"Header 1": "Communities", "Header 2": "Cliques", "Header 3": "Clustering Coefficient", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#8", "page_content": "The clustering coefficient is a metric that measures how close a graph is to being a clique (don't ask me why it isn't\ncalled Clique Coefficient). There are two different versions of the clustering coefficient, the global clustering\ncoefficient and the local clustering coefficient, where the global clustering coefficient is just the average of the\nlocal clustering coefficients. 
\nThe idea behind the local clustering coefficient is to check how many of the neighbors of a vertex are connected to each\nother. If all neighbors are connected to each other, then the local clustering coefficient for that vertex is $1$. More\nformally, the local clustering coefficient for a vertex $v$ is defined as: \n$$\n\\text{localClusterCoeff}(v) = \\frac{2 \\cdot \\text{numEdgesBetweenNeighbors}(v)}{|N(v)| \\cdot (|N(v)| - 1)}\n$$ \nwhere $N(v)$ denotes the set of neighbors of $v$. \nexport const clusterCoeff1 = {\nnodes: [\n{id: 1, label: \"a\", x: 0, y: 100, color: \"red\"},\n{id: 2, label: \"b\", x: 100, y: 200, color: \"green\"},\n{id: 3, label: \"c\", x: 200, y: 100, color: \"green\"},\n{id: 4, label: \"d\", x: 100, y: 0, color: \"green\"},\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 1, to: 4},\n{from: 2, to: 3},\n{from: 2, to: 4},\n{from: 3, to: 4},\n]\n}; \nexport const clusterCoeff0 = {\nnodes: [\n{id: 1, label: \"a\", x: 0, y: 100, color: \"red\"},\n{id: 2, label: \"b\", x: 100, y: 200, color: \"green\"},\n{id: 3, label: \"c\", x: 200, y: 100, color: \"green\"},\n{id: 4, label: \"d\", x: 100, y: 0, color: \"green\"},\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 1, to: 4},\n]\n}; \n\n\n\n\nFor the given Graph $G$ and the vertex $a$, the cluster coefficient is $1$ because all neighbors are connected.\n\n\n\n\n\nFor the given Graph $G$ and the vertex $a$, the cluster coefficient is $0$ because none of the neighbors are\nconnected.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#9", "metadata": {"Header 1": "Communities", "Header 2": "Cliques", "Header 3": "Clustering Coefficient", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#9", "page_content": "\n\n\n\n\nFor the given Graph $G$ and the vertex $a$, the cluster coefficient is $0$ because 
none of the neighbors are\nconnected.\n\n\n
 \nThe global clustering coefficient is then just the average of the local clustering coefficients of all vertices in the\ngraph. \n$$\n\\text{globalClusterCoeff}(G) = \\frac{1}{|V|} \\sum_{v \\in V} \\text{localClusterCoeff}(v)\n$$"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#10", "metadata": {"Header 1": "Communities", "Header 2": "Cliques", "Header 3": "k-Core", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#10", "page_content": "For a k-Core the rules of a clique are slightly relaxed. A k-Core is a subgraph where all vertices are at least connected\nto $k$ other vertices in the subgraph. \n
 \nAlthough this is a relaxation of the rules, it is still a very strict rule: a vertex that is only connected to other\nvertices in a core, but doesn't itself have $k$ connections to them, is not included in the core. 
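One common way to compute the k-core is to repeatedly peel off vertices that have fewer than $k$ neighbors left in the subgraph. A rough Python sketch (the adjacency-list format is just an assumption for illustration):

```python
def k_core(adj, k):
    """Return the vertices of the k-core: repeatedly remove ("peel off")
    vertices with fewer than k neighbors remaining in the subgraph."""
    core = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(core):
            # Count only neighbors that are still part of the core.
            if sum(1 for u in adj[v] if u in core) < k:
                core.discard(v)
                changed = True
    return core

# A triangle {1, 2, 3} plus a pendant vertex 4: the 2-core drops vertex 4
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(k_core(graph, 2))  # → {1, 2, 3}
```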
This\nthen allows vertices that don't fulfill the rule for a k-core, but are only\nconnected to other vertices in the subgraph, to still be included in the subgraph."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#12", "metadata": {"Header 1": "Communities", "Header 2": "Cliques", "Header 3": "n-Clique", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#12", "page_content": "\nSometimes cliques are named after the number of vertices they contain. For example a clique with 3 vertices is\ncalled a 3-clique, a clique with 4 vertices is called a 4-clique, etc. This can be generalized to a k-clique. Not an\nn-clique though, that is something else, but when it just says 4-clique it can be ambiguous.\n
 \nThe idea of an n-clique is that we want a maximal subgraph, i.e. with the most vertices, where each pair of vertices can\nbe connected by a path of length at most n. So a 1-clique is just a normal clique, a 2-clique is a clique where each\npair of vertices can be connected by a path of length at most 2, etc. \n\nThe path doesn't have to be the shortest path, just a path of length at most n. And the path can go over any vertex,\nnot just vertices that are part of the clique.\n
 \nThis can lead to two interesting scenarios: \n1. The diameter of the subgraph can actually be longer than n. This is due to the path being able to go over any vertex,\nnot just vertices that are part of the clique. So in the example below, the diameter of the subgraph is 3 even though it\nis a 2-clique. \n
 \n2. The subgraph can be disconnected. In the example below you can see two of many possible 2-cliques for the given graph.\nInterestingly, they are both disconnected, because if one of the vertices in between is included, then a different vertex\ncan no longer be included. 
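Because the connecting paths may leave the vertex set, checking whether a given set is an n-clique needs shortest-path distances in the whole graph, not just the subgraph. A rough sketch in Python (the adjacency-list format is just an assumption for illustration):

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, start):
    """Shortest path lengths from start to every reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def is_n_clique(adj, subset, n):
    """A vertex set is an n-clique if every pair is joined by a path of
    length at most n in the full graph (the path may leave the subset)."""
    for a, b in combinations(subset, 2):
        dist = bfs_distances(adj, a)
        if dist.get(b, float("inf")) > n:
            return False
    return True

# A path 1-2-3: {1, 3} is a 2-clique (distance 2 via vertex 2), not a 1-clique
graph = {1: [2], 2: [1, 3], 3: [2]}
print(is_n_clique(graph, {1, 3}, 2))  # → True
print(is_n_clique(graph, {1, 3}, 1))  # → False
```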
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#13", "metadata": {"Header 1": "Communities", "Header 2": "Clustering", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#13", "page_content": "In general, clustering is the process of grouping similar objects together. In graph theory, the clustering process can\nbe seen as a way to group vertices together, i.e. to find communities that aren't based on specific rules like cliques\nor connected components. \nThere are two main approaches to clustering graphs: \n- bottom-up: start with each vertex in its own cluster and then merge clusters together\n- top-down: start with all vertices in one cluster and then split the cluster into smaller clusters"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#14", "metadata": {"Header 1": "Communities", "Header 2": "Clustering", "Header 3": "Girvan-Newman Clustering", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#14", "page_content": "The Girvan-Newman clustering algorithm is a top-down (divisive) approach to clustering which is based on edge betweenness, hence\nit is also called edge betweenness clustering. The idea is to iteratively calculate the edge-betweenness of each edge in\nthe graph and then remove the edge with the highest edge-betweenness. \nThe thought process behind this is that the edges with the highest edge-betweenness are the edges that have the highest\ninformation flow. So by removing these edges, we are removing the edges that connect two groups/clusters/communities\ntogether. Eventually this will lead to two components, which are then the clusters. 
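A simplified sketch of one removal step in Python: it counts only one shortest path per vertex pair, whereas true edge betweenness counts all shortest paths (e.g. via Brandes' algorithm), so this is an approximation for illustration only:

```python
from collections import deque
from itertools import combinations

def shortest_path(adj, s, t):
    """One shortest path from s to t via BFS (None if unreachable)."""
    prev = {s: None}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        if v == t:
            path = []
            while v is not None:
                path.append(v)
                v = prev[v]
            return path[::-1]
        for u in adj[v]:
            if u not in prev:
                prev[u] = v
                queue.append(u)
    return None

def girvan_newman_step(adj):
    """Remove the edge lying on the most shortest paths (simplified
    edge betweenness: one shortest path per vertex pair, not all)."""
    counts = {}
    for s, t in combinations(adj, 2):
        path = shortest_path(adj, s, t)
        if path:
            for e in zip(path, path[1:]):
                edge = frozenset(e)
                counts[edge] = counts.get(edge, 0) + 1
    top = max(counts, key=counts.get)
    a, b = tuple(top)
    adj[a].remove(b)
    adj[b].remove(a)
    return (a, b)

# Two triangles joined by the single edge (3, 4): that edge carries the
# most shortest paths and is removed first, splitting the graph in two.
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
removed = girvan_newman_step(graph)
print(sorted(removed))  # → [3, 4]
```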
\n\n\n\n\n\n\n\n \nThe issue with this approach is that it is very computationally expensive. The edge-betweenness of each edge has to be\ncalculated, which is $O(|V||E|)$, and then that has to be done iteratively multiple times, so the overall complexity can\nbe summarized as $O(n^3)$, which is not ideal for large graphs."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#15", "metadata": {"Header 1": "Communities", "Header 2": "Clustering", "Header 3": "LPA - Label Propagation Algorithm", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#15", "page_content": "The LPA is a more general algorithm which doesn't have to just be used for clustering graphs; it can also be used to\ncluster data in general. However, I will explain it in the context of graph clustering. \n\nMaybe one day this can be done in the context of semi-supervised labeling\n
 \nLPA consists of 2 parts, the preparation and the actual algorithm. In the preparation we do the following: \n1. We assign each vertex a unique label from $0$ to $|V| - 1$. The labels in the end will be the clusters, which makes\nthis a bottom-up approach. \n2. We perform graph coloring. I will not go into detail about graph coloring here, but the idea is to color the graph\nsuch that no two connected/neighboring vertices have the same color whilst using as few colors as possible. \n\nMaybe add a link to the graph coloring chapter if it ever gets written.\n
 \nOnce the preparation is done, we can start the actual algorithm. The algorithm is very simple: \nFor each color (always in the same order) we go through each vertex (also always in the same order) and check the\nlabels of its neighbors and count how many times each one occurs. If there is a label that occurs more often than the\nothers, then we assign that label to the vertex. 
If there are multiple labels that occur the same number of times, then\nthere are two options: \n- If the vertex's label is one of the labels that occur the most, then we keep the label.\n- If the vertex's label is not one of the labels that occur the most, then we assign it the label with the highest value.\nLowest would also work, as long as it is consistent. \nThis is repeated until the labels don't change anymore. The labels in the end then represent the clusters. The algorithm\nis very simple and fast, making it a good choice for large graphs. However, it is not\ndeterministic, i.e. it can lead to different results depending on the order of the colors and vertices. This can be"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#16", "metadata": {"Header 1": "Communities", "Header 2": "Clustering", "Header 3": "LPA - Label Propagation Algorithm", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#16", "page_content": "is very simple and fast, making it a good choice for large graphs. However, it is not\ndeterministic, i.e. it can lead to different results depending on the order of the colors and vertices. This can be\nmitigated by running the algorithm multiple times and then taking the most common result. \n\nAfter the initial setup, we get the graph below: \n
 \nWe will work through the graph in the following order: \n- Blue: $B, F$\n- Green: $D, A, H, C$\n- Brown: $E, G$ \n\n\nWe start with vertex $B$ which has the neighbors $A,C,D,E$ with the labels $0,2,3,4$. The vertex $B$ has the\nlabel $1$. Because all the neighboring labels occur once and the vertex's label is not one of them, we\npick the one with the highest value, which is $4$. 
So we assign the label $4$ to $B$.\n\n\nWe have a similar situation for the next vertex $F$ which gets assigned the label $7$.\n\n\n\nNow we do the same with the green vertices.\n\n\n\nLastly, we process the brown vertices in the given order.\n\nLuckily with this graph, we already have our clusters after the first iteration. We have two clusters, the\nvertices with the label 4 and the vertices with the label 7.\n\n
 \n
 \n\nMake my own images where the graph is processed alphabetically. And what if we want more than 2 clusters?\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#17", "metadata": {"Header 1": "Communities", "Header 2": "Clustering", "Header 3": "Louvain Clustering", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#17", "page_content": "The Louvain clustering algorithm is a bottom-up greedy approach to clustering which is based on modularity. So we first\nneed to understand what modularity is. \n#### Modularity \nModularity is a metric that measures the quality of a clustering. The idea is to compare the number of edges within a\ncluster with the number of edges between clusters. A good clustering would then have a lot of edges within a cluster\nand not many edges between clusters. \nModularity is defined as the fraction of edges of a graph within a cluster minus the expected fraction of edges within\na cluster if the edges were distributed randomly. The value of modularity is between $\\frac{-1}{2}$ and $1$, where\nany value above 0 means that the number of edges within a cluster is higher than the expected number of edges within a\ncluster if the edges were distributed randomly. The higher the value, the better the clustering; if the value is above\n0.3, then the clustering is considered to be good. 
\n$$\n\\text{modularity}(G) = \\frac{1}{2m} \\sum_{i,j \\in V} \\left( A_{ij} - \\frac{deg(i) deg(j)}{2m} \\right) \\delta(c_i, c_j)\n$$ \nwith the following definitions: \n- $A_{ij}$ is the weight of the edge between vertices $i$ and $j$\n- $m$ is the sum of all edge weights so for an unweighted graph $m = |E|$ and for a weighted graph $m = \\sum_{i,j \\in V} A_{ij}$.\n- $\\delta(c_i, c_j)$ is the Kronecker delta function (1 if $c_i = c_j$ and 0 otherwise), which is used to check if two\nvertices are in the same cluster. \n#### The Louvain Algorithm \nThe Louvain algorithm then tries to maximize the modularity of a graph in an iterative process until the modularity\ncannot be increased anymore, hence it is a greedy approach. \nInitially each vertex is in its own cluster. We then iteratively perform the following steps: \n- **Modularity Optimization:** For each vertex we check how the modularity would change if we would\nmove it to a neighboring cluster. If the modularity would increase, then we move the vertex to the neighboring cluster"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#18", "metadata": {"Header 1": "Communities", "Header 2": "Clustering", "Header 3": "Louvain Clustering", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/communities.mdx#18", "page_content": "- **Modularity Optimization:** For each vertex we check how the modularity would change if we would\nmove it to a neighboring cluster. If the modularity would increase, then we move the vertex to the neighboring cluster\nwhich would increase the modularity the most. If the modularity would not increase, then we leave the vertex in its\ncurrent cluster. Once we have gone through all vertices, we move on to the next step.\n- **Cluster Aggregation:** We then aggregate all vertices in the same cluster into a single vertex. 
This vertex has a\nself-looping edge with a weight equal to the sum of the weights of all the edges within the cluster. The vertices representing\nthe clusters are then connected to each other with edges of weight equal to the sum of the weights of all the edges between the\nclusters before the aggregation. We then go back to the first step and repeat the process until the modularity cannot be\nincreased anymore. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#1", "metadata": {"Header 1": "Connectivity", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#1", "page_content": "Graph connectivity is also known as graph resilience and is a measure of how well a graph can maintain its connectivity\nwhen vertices or edges are removed, i.e. how many vertices or edges can be removed before the graph becomes disconnected\n(from one connected component to multiple connected components) or has a higher number of connected components. \nWith this analysis technique we can find out how robust a graph is, i.e. how well it can handle failures, which can be\nvery useful in real world applications such as communication, transportation, etc."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#2", "metadata": {"Header 1": "Connectivity", "Header 2": "Bridges", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#2", "page_content": "A bridge is an edge that, if removed, would increase the number of connected components in the graph. In the graph below\nyou can quite clearly see that the edge between vertices $3$ and $4$ marked in red is a bridge. 
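Bridges can be found in linear time with a DFS that tracks discovery times and low-links (the classic Tarjan-style bridge-finding idea). A sketch in Python on the same six-vertex graph as this example, assuming a simple graph without parallel edges:

```python
def find_bridges(adj):
    """Find all bridges with a DFS that records, for each vertex, the
    lowest discovery time reachable without re-using the edge to its parent."""
    disc, low, bridges = {}, {}, []
    timer = [0]

    def dfs(v, parent):
        disc[v] = low[v] = timer[0]
        timer[0] += 1
        for u in adj[v]:
            if u == parent:
                continue
            if u in disc:               # back edge to an already visited vertex
                low[v] = min(low[v], disc[u])
            else:                       # tree edge
                dfs(u, v)
                low[v] = min(low[v], low[u])
                if low[u] > disc[v]:    # u's subtree cannot reach above v
                    bridges.append((v, u))

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return bridges

# Two triangles joined by the edge (3, 4), like the graph in this example
graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(find_bridges(graph))  # → [(3, 4)]
```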
\nexport const bridgeGraph = {\nnodes: [\n{id: 1, label: \"1\", x: 0, y: 0},\n{id: 2, label: \"2\", x: 0, y: 200},\n{id: 3, label: \"3\", x: 200, y: 100},\n{id: 4, label: \"4\", x: 400, y: 100},\n{id: 5, label: \"5\", x: 600, y: 0},\n{id: 6, label: \"6\", x: 600, y: 200}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 3},\n{from: 3, to: 4, color: \"red\", width: 5},\n{from: 4, to: 5},\n{from: 4, to: 6},\n{from: 5, to: 6}\n]\n}; \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#3", "metadata": {"Header 1": "Connectivity", "Header 2": "Cut Vertices", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#3", "page_content": "The same idea as a bridge also applies to vertices. A vertex is a cut vertex if removing it would increase the number\nof connected components in the graph. In the graph below you can quite clearly see that the vertices $3$ and $4$ are cut\nvertices. These cut vertices are very important, as they act as brokers between different parts of the graph. 
\nexport const cutVerticesGraph = {\nnodes: [\n{id: 1, label: \"1\", x: 0, y: 0},\n{id: 2, label: \"2\", x: 0, y: 200},\n{id: 3, label: \"3\", value: 5, x: 200, y: 100, color: \"red\"},\n{id: 4, label: \"4\", value: 5, x: 400, y: 100, color: \"red\"},\n{id: 5, label: \"5\", x: 600, y: 0},\n{id: 6, label: \"6\", x: 600, y: 200}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 3},\n{from: 3, to: 4},\n{from: 4, to: 5},\n{from: 4, to: 6},\n{from: 5, to: 6}\n]\n}; \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#4", "metadata": {"Header 1": "Connectivity", "Header 2": "k-Connected Graphs", "Header 3": "k-Vertex-Connected Graphs", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#4", "page_content": "A graph is $k$-vertex-connected if it has at least $k+1$ vertices and at least $k$ vertices have to be removed to disconnect\nthe graph. \nThe vertex connectivity of a graph $G$ is the largest $k$ such that $G$ is $k$-vertex-connected. So for example the graph\nbelow has a vertex connectivity of 2, because it is 2-vertex-connected. If we remove the vertices $4$ and $2$ the graph\nbecomes disconnected but if we only remove one vertex the graph stays connected. 
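Computing vertex connectivity exactly is expensive in general, but for tiny graphs a brute-force check over all removal subsets illustrates the definition (a sketch for illustration only; the running time is exponential):

```python
from collections import deque
from itertools import combinations

def is_connected(adj, removed=frozenset()):
    """BFS connectivity check on the graph with `removed` vertices deleted."""
    vertices = [v for v in adj if v not in removed]
    if not vertices:
        return True
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in removed and u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == len(vertices)

def vertex_connectivity(adj):
    """Smallest number of vertices whose removal disconnects the graph,
    found by brute force over all removal subsets (tiny graphs only)."""
    n = len(adj)
    for k in range(n - 1):
        for subset in combinations(adj, k):
            if not is_connected(adj, frozenset(subset)):
                return k
    return n - 1  # complete graph: only removing all but one vertex "disconnects" it

# A cycle of 4 vertices is 2-vertex-connected
cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(vertex_connectivity(cycle))  # → 2
```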
\nexport const vertexConnectedGraph = {\nnodes: [\n{id: 1, label: \"1\", x: 0, y: 100},\n{id: 2, label: \"2\", value: 5, x: 200, y: 0, color: \"red\"},\n{id: 3, label: \"3\", x: 200, y: 200},\n{id: 4, label: \"4\", value: 5, x: 400, y: 100, color: \"red\"},\n{id: 5, label: \"5\", x: 600, y: 100},\n{id: 6, label: \"6\", x: 800, y: 200},\n{id: 7, label: \"7\", x: 800, y: 0},\n{id: 8, label: \"8\", x: 1000, y: 100}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 4},\n{from: 3, to: 4},\n{from: 4, to: 5},\n{from: 5, to: 6},\n{from: 5, to: 7},\n{from: 6, to: 8},\n{from: 7, to: 8},\n{from: 2, to: 7},\n]\n}; \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#5", "metadata": {"Header 1": "Connectivity", "Header 2": "k-Connected Graphs", "Header 3": "k-Edge-Connected Graphs", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/connectivity.mdx#5", "page_content": "The same idea as for vertex connectivity also applies to edge connectivity. A graph is $k$-edge-connected if it has at\nleast $k+1$ vertices and at least $k$ edges have to be removed to disconnect the graph. So the graph below is 2-edge-connected\nand also has an edge connectivity of 2. If we remove the edges $(2,5)$ and $(4,5)$ the graph becomes disconnected. 
\nexport const edgeConnectedGraph = {\nnodes: [\n{id: 1, label: \"1\", x: 0, y: 100},\n{id: 2, label: \"2\", x: 200, y: 0},\n{id: 3, label: \"3\", x: 200, y: 200},\n{id: 4, label: \"4\", x: 400, y: 100},\n{id: 5, label: \"5\", x: 600, y: 100},\n{id: 6, label: \"6\", x: 800, y: 200},\n{id: 7, label: \"7\", x: 800, y: 0},\n{id: 8, label: \"8\", x: 1000, y: 100}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 4},\n{from: 3, to: 4},\n{from: 4, to: 5, color: \"red\", width: 5},\n{from: 5, to: 6},\n{from: 5, to: 7},\n{from: 6, to: 8},\n{from: 7, to: 8},\n{from: 2, to: 5, color: \"red\", width: 5},\n]\n}; \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#1", "metadata": {"Header 1": "Diffusion", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#1", "page_content": "In networks, we can model the spread of information, disease, or other phenomena as a diffusion process. The diffusion\nprocess usually starts with an initial node or a set of initial nodes. The goal is then to model how the information\nspreads through the network. You can imagine why this would be important for modeling the spread of a disease or an\nadvertising campaign on social media where the goal is to reach as many people as possible."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#2", "metadata": {"Header 1": "Diffusion", "Header 2": "Innovation Diffusion", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#2", "page_content": "Already in 1962, Everett Rogers published a book called \"Diffusion of Innovations\" where he describes the spread of a\nnew idea or technology through a population. 
He split the adoption of a new idea into five stages: \n- **Knowledge/Awareness**: The individual is exposed to the innovation and gains knowledge of the innovation.\n- **Persuasion**: The individual is interested in the innovation and actively seeks information about the\ninnovation.\n- **Decision**: The individual makes a decision to adopt or reject the innovation.\n- **Implementation**: The individual implements the innovation and uses it as a trial.\n- **Confirmation**: The individual finalizes his/her decision to continue using the innovation. \nWhen analyzing the spread of a new innovation, Rogers found that the adoption of a new innovation follows a normal\ndistribution. \n- **Innovators 2.5%**: Innovators are the first individuals to adopt an innovation. Innovators are most often young\nand willing to take risks and have a high social status.\n- **Early Adopters 13.5%**: This is the second-fastest category of individuals who adopt an innovation. These individuals\nhave the highest degree of opinion leadership among the other adopter categories. Early adopters take more time to\nadopt an innovation than innovators due to more careful deliberation.\n- **Early Majority 34%**: Individuals in this category adopt an innovation after a varying degree of time. Most often,\nthe early majority waits to adopt an innovation until they see that the innovation has proven useful for others and are\nin contact with the early adopters.\n- **Late Majority 34%**: Individuals in this category will adopt an innovation after the average member of the society.\nThese individuals approach an innovation with a high degree of skepticism.\n- **Laggards 16%**: Individuals in this category are the last to adopt an innovation. Most often bound by traditions. 
\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#3", "metadata": {"Header 1": "Diffusion", "Header 2": "Innovation Diffusion", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#3", "page_content": "- **Laggards 16%**: Individuals in this category are the last to adopt an innovation. Most often bound by traditions. \n \n\nWe can easily give some examples for the above distribution for when the first iPhone was released: \n- **Innovators (2.5%)**: These were the tech enthusiasts who camped outside Apple stores. They were excited and were\nwilling to embrace the new technology despite its high price and limited features compared to today's standards.\n- **Early Adopters (13.5%)**: The early adopters included individuals who closely followed tech trends and were\nquick to purchase the iPhone once they saw the positive reviews and early adopter experiences. They recognized the\niPhone's potential to change the way people communicate and access information.\n- **Early Majority (34%)**: As the iPhone gained popularity and started to prove its utility, the early majority\njoined in. These individuals might have been initially hesitant but were swayed by the success stories of the early\nadopters.\n- **Late Majority (34%)**: The late majority were more cautious and waited until the iPhone became a mainstream\nproduct. They wanted to ensure that any initial bugs or issues were resolved and that the price had become more\naffordable. Their decision to adopt the iPhone was influenced by its widespread acceptance and integration into daily\nlife.\n- **Laggards (16%)**: Laggards were the last to adopt the iPhone, often sticking with their traditional cell phones\nor resisting smartphones altogether. 
They were skeptical of the technology's benefits and preferred to maintain\ntheir existing routines and devices.\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#4", "metadata": {"Header 1": "Diffusion", "Header 2": "ICM - Independent Cascade Model", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#4", "page_content": "The Independent Cascade Model (ICM) is a probabilistic diffusion model that is based on the idea that the spread of information\ntravels through neighbors in a network and therefore has a cascading effect. The model is based on the following assumptions: \n- A node can only affect its neighbors.\n- A node can only be in one of two states: active or inactive. For example, a node can be infected or not infected.\n- A node only has one chance to activate its neighbors.\n- A node can only go from inactive to active. \nThe initial setup of the model is as follows: \n- Each edge has an attribute $p \\in [0,1]$, which is the probability that the node will take over the state of its neighbor.\nHow this probability is calculated depends on the application. For example, in the case of a disease, the probability\ncould be based on a person's age and immune system. In the case of an advertising campaign, the probability could be\nbased on the number of friends that have already seen the ad. You could also just use random probabilities.\n- A set of nodes $S$ is selected as the initial set of active nodes. All other nodes are inactive. \nThe model then proceeds in discrete time steps. In each time step, the following happens: \n1. For each node $v \\in S$, the node tries to activate each of its neighbors $u$. The activation is successful with\nprobability $p_{vu}$, i.e. we generate a random value $r \\in [0,1]$ and the activation succeeds if $r \\leq p_{vu}$. 
If the activation\nis successful, $u$ is added to the set $S_{new}$.\n2. If $S_{new}$ is empty then the process terminates. Otherwise, $S$ is updated to $S_{new}$ and the process repeats\nfrom step 1. \n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#5", "metadata": {"Header 1": "Diffusion", "Header 2": "ICM - Independent Cascade Model", "Header 3": "Spread Maximization", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#5", "page_content": "When working with the ICM model, we are often interested in finding the set of nodes $S$ that maximizes the spread, for\nexample in an advertising campaign. This is an [NP-Hard](../np) problem to solve, but we can use a greedy algorithm to find a good\nbut not necessarily optimal solution. (How is this an NP-Hard problem?) \nWe can denote the spread after the ICM model as $f(S)$ where $S$ is the set of initial nodes. The output of the function\nis the number of nodes that are active after the ICM model has finished. Using this we can then implement a greedy\nalgorithm that wants to maximize the spread, i.e. find the set of nodes $S$ that maximizes $f(S)$. \nHowever, we first need to change a few things about the ICM model to make it easier to work with because the model is\nnon-deterministic. Instead of using a random probability $p$ for each edge and then using a random number generator to\ndetermine if the edge is activated, we can use a fixed $p$ and a fixed $r$ for each edge. Another possible approach\ncould be to define an \"activation function\" that takes the two nodes as input and defines if the edge is activated or not. \nFor example, we could define the activation function as follows: \n$$\na(u,v) = |u - v| \\leq 2\n$$ \nMost often, when wanting to maximize the spread, for example of an advertising campaign, we are also on a budget. 
This\nmeans that we can only select a limited number of nodes $k$ as the initial set of active nodes, i.e. $|S| \leq k$. \nThe greedy algorithm then works as follows: \n1. Initialize $S = \emptyset$.\n2. For each vertex $v \in V \land v \notin S$ compute $f(S \cup \{v\})$.\n3. Select the vertex $v$ where $f(S \cup \{v\})$ is the highest and add it to $S$. If there are multiple vertices\nwith the same value, select one of them randomly.\n4. If $|S| = k$ then terminate, otherwise repeat from step 2.\n"}}
+{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#6", "metadata": {"Header 1": "Diffusion", "Header 2": "Linear Threshold Model", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#6", "page_content": "The threshold model is a diffusion model that is based on the idea that a node can only be activated if a certain\nproportion of its neighbors are already activated. The model is based on the same assumptions as the ICM model: \n- A node can only affect its neighbors.\n- A node can only be in one of two states: active or inactive. For example, a node can be infected or not infected.\n- A node only has one chance to activate its neighbors.\n- A node can only go from inactive to active. \nIn the model we define a threshold $t_v$ for each node $v$. The threshold is a value between $0$ and $1$ and defines\nthe proportion of neighbors that need to be active for the node to be activated. For example, if $t_v = 0.5$ then at\nleast half of the neighbors of $v$ need to be active for $v$ to be activated.
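The greedy spread-maximization loop can be sketched as below. This is a hedged sketch: the spread function here is a toy deterministic coverage count (a seed reaches itself and its direct neighbors), standing in for a full deterministic ICM run, and ties are broken by first occurrence rather than randomly.

```python
def greedy_max_spread(nodes, f, k):
    """Greedily build S by repeatedly adding the vertex v that gives
    the highest spread f(S ∪ {v}), until |S| = k."""
    S = set()
    while len(S) < k:
        best_v, best_val = None, -1
        for v in nodes:
            if v in S:
                continue
            val = f(S | {v})
            if val > best_val:  # ties broken by first occurrence here
                best_v, best_val = v, val
        S.add(best_v)
    return S

# Toy deterministic spread: each seed activates itself and its neighbors.
neighbors = {1: {2, 3}, 2: {1, 4, 5}, 3: {1}, 4: {2}, 5: {2}}

def spread(S):
    reached = set(S)
    for v in S:
        reached |= neighbors[v]
    return len(reached)

best = greedy_max_spread(list(neighbors), spread, k=2)
```

Because the greedy step re-evaluates $f$ for every candidate vertex, each iteration costs one spread evaluation per remaining vertex; the payoff is a simple algorithm with a good (though not optimal) result.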
\nFor the algorithm we then define an initial set of active nodes $S$ and then in each time step we do the following: \n1. For each node $v \in V \land v \notin S$ we compute the proportion of active neighbors $p_v$.\n2. If $p_v \geq t_v$ then we add $v$ to the set $S_{new}$.\n3. If $S_{new}$ is empty then the process terminates. Otherwise, $S$ is merged with $S_{new}$, i.e. $S = S \cup S_{new}$,\nand the process repeats from step 1.\n"}}
+{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#7", "metadata": {"Header 1": "Diffusion", "Header 2": "Voter Model", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/diffusion.mdx#7", "page_content": "The voter model is a simple probabilistic diffusion model. To start the model, each node is assigned a random state\nwhich is either $0$ or $1$. In each time step, a node is selected at random and then one of its neighbors is also\nselected at random. The node then adopts the state of the selected neighbor.
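The Linear Threshold update loop fits in a few lines. A minimal sketch, assuming the same neighbor-dict graph encoding as before and a threshold dict `t`:

```python
def linear_threshold(neighbors, t, seeds):
    """Linear Threshold Model: activate node v once the proportion of
    its active neighbors p_v reaches its threshold t[v]."""
    S = set(seeds)
    while True:
        s_new = set()
        for v in neighbors:
            if v in S or not neighbors[v]:
                continue
            # proportion of v's neighbors that are already active
            p_v = sum(u in S for u in neighbors[v]) / len(neighbors[v])
            if p_v >= t[v]:
                s_new.add(v)
        if not s_new:   # no node crossed its threshold: terminate
            return S
        S |= s_new      # S = S ∪ S_new, then repeat
```

On a path graph 1-2-3 with every threshold at 0.5 and seed {1}, node 2 activates (1 of its 2 neighbors is active), which in turn activates node 3; raising node 2's threshold above 0.5 blocks the whole cascade.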
The process repeats until all nodes have\nthe same state."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/eulerianPath.mdx#1", "metadata": {"Header 1": "Eulerian Path", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/eulerianPath.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/eulerianPath.mdx#1", "page_content": "\nSeven Bridges of Königsberg and the Eulerian Path\n"}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#1", "metadata": {"Header 1": "General Definition", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#1", "page_content": "A Graph is one of the most fundamental but also diverse data structures in computer science. A Graph consists of a set\nof vertices $V$ and a set of edges $E$ where each edge is an unordered pair. Hence, $G=(V,E)$. They are used to\nrepresent relationships between various entities or elements (the vertices) by connecting them with edges. \nFor example, a graph can be used to represent a social network where the vertices are people and the edges represent\nwhether they are friends with each other or not, no edge signifying that they are not friends. 
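The voter model described above can be sketched as follows. The consensus check and the step cap are practical additions for the sketch, not part of the model's definition, and the seeded generator is an assumption for reproducibility:

```python
import random

def voter_model(neighbors, states, rng=None, max_steps=100_000):
    """Voter model: repeatedly pick a random node, then a random neighbor
    of it, and copy the neighbor's state; stop at consensus."""
    rng = rng or random.Random(0)
    states = dict(states)      # don't mutate the caller's dict
    nodes = sorted(states)
    for _ in range(max_steps):
        if len(set(states.values())) == 1:
            break              # consensus: all nodes share one state
        v = rng.choice(nodes)
        u = rng.choice(sorted(neighbors[v]))
        states[v] = states[u]  # v adopts the state of neighbor u
    return states
```

On a two-node graph a single step already forces consensus, since the picked node always copies the only other node's state.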
In the below graph\n$G=(V,E)$ where: \n- $V=\{\text{Bob, Alice, Michael, Urs, Karen}\}$ and\n- $E=\{(1,2),(1,3),(2,4),(2,5)\}$, where the numbers are the vertex ids used below (Bob is 1, Alice is 2, and so on) \nexport const friendsGraph = {\nnodes: [\n{id: 1, label: \"Bob\"},\n{id: 2, label: \"Alice\"},\n{id: 3, label: \"Michael\"},\n{id: 4, label: \"Urs\"},\n{id: 5, label: \"Karen\"}\n],\nedges: [\n{from: 1, to: 2},\n{from: 1, to: 3},\n{from: 2, to: 4},\n{from: 2, to: 5}\n]\n}; \n"}}
+{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#2", "metadata": {"Header 1": "General Definition", "Header 2": "Metrics", "Header 3": "Degrees", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#2", "page_content": "If we do some quick analysis of this graph using the degree function, which returns the number of edges connected to a\nvertex, we can see that $\text{deg(Alice)}=3$ and therefore Alice has the most friends in this social network."}}
+{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#3", "metadata": {"Header 1": "General Definition", "Header 2": "Metrics", "Header 3": "Order", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#3", "page_content": "The order of a graph is the number of vertices in the graph. So in the above example, the order of the graph is 5.
So it\ncould also be called an order-5 graph."}}
+{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#4", "metadata": {"Header 1": "General Definition", "Header 2": "Metrics", "Header 3": "Diameter", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#4", "page_content": "The diameter of a graph is the longest shortest path between two vertices in the graph. So in the above example, the\ndiameter of the graph is 3 as the longest shortest path is between Michael and Karen."}}
+{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#5", "metadata": {"Header 1": "General Definition", "Header 2": "Metrics", "Header 3": "Density", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#5", "page_content": "The density of a graph is the ratio of the number of edges to the number of possible edges. In other words, it measures how\ndensely connected the graph is. In a directed graph there are $|V|(|V|-1)$ possible edges. This means the density of\na directed graph is: \n$$\nD = \frac{|E|}{|V|(|V|-1)}\n$$ \nIn an undirected graph, there are $\frac{|V|(|V|-1)}{2}$ possible edges.
Which means the density of an undirected graph\nis: \n$$\nD = \\frac{|E|}{\\frac{|V|(|V|-1)}{2}} = \\frac{2|E|}{|V|(|V|-1)}\n$$ \nSo in the above example, the density of the graph is $\\frac{8}{20} = 0.4$."}} +{"id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#6", "metadata": {"Header 1": "General Definition", "Header 2": "Graphs of Functions", "path": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx", "id": "../pages/digitalGarden/cs/algorithmsDataStructures/graphsNetworks/generalDefinition.mdx#6", "page_content": "You might be more familiar with Graphs when talking about mathematical functions. In mathematics, a Graph of a Function\nis a visual representation of the relationship between the input values (domain) and their corresponding output values\n(range) under a specific function. \nFormally, a Graph of a Function can be defined as follows: \nLet $f$ be a function defined on a set of input values, called the domain $D$, and taking values in a set of output\nvalues, called the range $R$. The Graph of the Function $f$, denoted as $G(f)$, is a mathematical representation\nconsisting of a set of ordered pairs $(x, y)$, where $x \\in D$ and $y = f(x)$. Each ordered pair represents a point on\nthe graph, with $x$ as the independent variable (input) and $y$ as the dependent variable (output). \nIn other words, the Graph of a Function is a visual representation of how the elements in the domain are mapped to\nthe corresponding elements in the range through the function $f$. \nFor example, consider the following function: \n$$\nf(x) = 2x + 1\n$$ \nIts domain could be the set of all real numbers $\\Bbb{R}$, and its range could also be $\\Bbb{R}$. To represent this\nfunction graphically, we plot points on the Cartesian plane where the $x$-coordinate corresponds to the input value,\nand the $y$-coordinate is the output value obtained by evaluating $f(x)$. \n
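The graph metrics above (degree, order, density, diameter) can all be computed for the friends graph in a short sketch. The adjacency dict mirrors the `friendsGraph` node ids (1=Bob, 2=Alice, 3=Michael, 4=Urs, 5=Karen); the diameter runs a BFS from every vertex:

```python
from collections import deque

# Undirected friends graph; each edge is stored in both directions.
adj = {1: [2, 3], 2: [1, 4, 5], 3: [1], 4: [2], 5: [2]}

def degree(v):
    return len(adj[v])

def order():
    return len(adj)

def density():
    """Undirected density: 2|E| / (|V|(|V|-1))."""
    num_edges = sum(len(ns) for ns in adj.values()) // 2
    n = len(adj)
    return 2 * num_edges / (n * (n - 1))

def diameter():
    """Longest shortest path: take the max BFS eccentricity over all vertices."""
    def eccentricity(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            v = q.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    q.append(u)
        return max(dist.values())
    return max(eccentricity(v) for v in adj)
```

These match the worked numbers in the text: $\text{deg(Alice)}=3$, order 5, density $0.4$, and diameter 3 (Michael to Karen).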
\n