\chapter{Real World Context}
Existing definitions of the word context are usually tied to the applications they support. In the photo tagging problem, photos can be taken in a variety of situations, and appropriate context can be obtained from a multitude of data sources. Existing definitions of context do not provide enough clues as to what context is relevant when many data sources are present. Should we consider all available objects and relations across time? Does context consist only of co-occurring events? Is it simply the set of objects in a social network? Or is it some combination of these? In this chapter, we look deeper into these definitions and extract a relation-centric perspective of contextual information. This perspective will equip us to associate all relevant objects, irrespective of their type, as context, and at the same time provide a reasoning framework to evaluate which source is beneficial and which is not.
Our justification for the use of context in a personal face tagging application begins with the observation: \textit{A photo captures the state of the real world at an instant of time}. Hypothetically, if there existed a database of all real-world entities, the various events occurring at different parts of the world, and the roles objects played in these events, then we could simply query such a database to obtain the context for a photo. All the people who were physically closest to the camera at that time are going to be in the photo, and those who were distant from it will not be present. Such a database does not exist, but with the World Wide Web and mobile phones being omnipresent in our daily lives, a number of data sources and sensors are emerging which capture different facets of the real world. It can be argued that all their content can be semantically merged to construct our `real-world entity' database. Accurately merging all these data sources at scale, however, is extremely challenging. Since we are tagging only a specific set of photos, we will restrict our problem to selectively merging content which is relevant to those photos.
In the following sections, we will look at some previous definitions of context, and why they are insufficient to selectively merge the content of various data sources. We will present our relation-centric view of context to help reason about which data sources are relevant and which are not, for a given photo. Finally, we present the model of context for the personal photo tagging problem.
\section{Previous Definitions}
One of the earliest studies on context was reported by Bill Schilit et al.\ in \cite{schilit1994context}. The focus of this study was how to build software in dynamic environments. The dynamics of the environments were largely due to people requiring different computational services at different times, the modality of request (through a mobile device or through a workstation), and the environment of the device (are there cameras and projectors nearby if the task requires video conferencing?). This software-centric view of context highlights the importance of two things. First, context is always described with respect to an object; in this case, it is the software which runs on processors distributed in a real world environment. Second, context is used to determine how this object interacts with the entities near it. For example, Schilit suggests that a workstation should automatically load his favorite text editor when he approaches it, and that a rooster music sample should be played whenever fresh coffee is prepared. These are two very different and precise interactions even though they might share a common background (environment or participating entities). We would not expect the text editor to appear when coffee is prepared, or the rooster music to be played when an employee walks up to a workstation.
\begin{figure}[t]
\centering
\includegraphics[width=0.5\textwidth]{media/chapter2/dey-def.png}
\caption{Information related to the situation of an object.}
\label{fig:anind-def}
\end{figure}
In his seminal paper, Anind Dey \cite{dey2001understanding} describes context \textit{as any information that can be used to characterize the situation of an entity. An entity is a person, place or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves}, as shown in figure \ref{fig:anind-def}. He proceeds to explain this definition with the example of an ``indoor mobile tour'', arguing that there are two additional pieces of information which can be used: \textit{weather} and \textit{presence of other people}. If the user is present with his friends, they might visit sites that are of interest to everybody. Here, the presence of other people is important context. Because the tour occurs indoors, weather does not affect the application. It is true that the weather has no direct effect on the application, but consider the following scenarios:
\begin{itemize}
\item Could we use the weather information to serve different drinks in the cafeteria to boost the experience of the visitors? Placing the hot chocolate kiosk next to the entrance on a cold day, and the ice cream kiosk there on a warmer day, might boost sales.
\item If the tour is similar to Alcatraz, where a ferry ride takes people to the island, and back from it, a storm brewing in the ocean could lead to disrupted ferry services. Should the application warn its users who are leisurely touring at this time? Or should they continue the tour at the same pace, miss the last ferry and spend the night at Alcatraz?
\end{itemize}
Dey proceeds to define context-aware computing as follows: \textit{A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's tasks}. But we need to ask ourselves why a system which uses this ``additional information'' should be considered a context-aware system. There are numerous systems which would simply treat this ``additional information'' as regular input. What is the difference between a system which takes in these inputs and processes them as regular data, and one which processes them as context?
\begin{figure}[t]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter2/ka-obs.png}
\caption{Henricksen's observation about temporality of Context.}
\label{fig:karen-obs}
\end{figure}
Karen Henricksen et al.\ \cite{henricksen2002modeling} make the following interesting observation: context information exhibits a range of temporal characteristics. Some context information is static, for example the attributes of people using a system (such as a person's date of birth). But a large amount of information is dynamic, for example the current geo position of a person or her social network, as shown in figure \ref{fig:karen-obs}. There is no straightforward way to obtain this dynamic information other than through sensors. But such an approach tightly couples the application logic to the types of sensors used, and requires the system to convert the input data to usable representations. For example, the application requires explicit modules to convert GPS coordinates to readable addresses. The problem with such an approach is that many ad-hoc modules are built to handle the sensors, causing the context-awareness to be tied to a specific application.
More recently, Vaninha Vieira et al.\ \cite{vieira2011designing} use a rule-centric view of context to design their context sensitive system, Cemantika. Vieira defines a contextual element as any piece of data or information which can be used to characterize an entity in an application domain, whereas the context of an interaction between an agent and an application is the set of instantiated contextual elements that are necessary to support the task at hand. Context awareness, for them, means explicitly switching the task the system is executing under different conditions. For this they explicitly model the \textit{context sources}, which include heterogeneous and external sources like sensors, user dialog interfaces and databases. Figure \ref{fig:va-def} shows various data sources providing context. Some data sources are preferred over others depending on conditions predefined in the system. This allows the various processes to operate independently of the type of sources. It should be noted that the use of ontologies in describing knowledge and context sources is becoming increasingly popular (more such systems are described in chapter 3).
\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{media/chapter2/va.png}
\caption{Modern context aware systems obtain data from different sources.}
\label{fig:va-def}
\end{figure}
\section{Relation-Centric View of Context}
The common ground behind these definitions is their object-centric view of context. Context is largely a set of objects that ``surround'' a primary object (whose context is in question). This view is insufficient for applications which are broader in scope, such as the photo tagging application, where users can take photos in very diverse environments, and a large number of sensors and sources of context exist. Specifically, it is non-trivial to identify which subset of available data qualifies as relevant context for a given photo, and which does not. The two examples from chapter 1 are shown in figures \ref{fig:naaman-icmr} and \ref{fig:example-kasturi-show}. In order to tag the photo on the left, we exclusively used conference schedule information. To tag the photo on the right, we used personal information. Our motivation in this chapter is to extend the above definitions of context to allow context-aware systems to better extract relevant context from these various sources.
Relations between entities change with time. The most relevant context for a photo is the set of relationships between entities at the time of photo capture. We define context for the photo as the \textbf{``set of relationships between real-world entities at the time of photo-capture''}. The entities include events; objects such as people, their pets or the organizations they are affiliated with; and geographical objects, such as landmarks (Eiffel Tower, Niagara Falls), restaurants, stadiums or highways. This definition must not be seen as tightly coupled to photos; it can be applied to any multimedia object (such as audio, videos or tweets) or any real-world object at a given time. Figure \ref{fig:cn-def} shows how the relations between the primary object and other real world objects contained in the sources change from time $T_1$ to $T_2$.
\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{media/chapter2/cn.png}
\caption{Utilizing relations to define context for the primary object.}
\label{fig:cn-def}
\end{figure}
Relationships can be of different types. They can be simple labels such as \texttt{friend-of} or \texttt{father-of}, signifying a social relationship. Or they can impose constraints, with relations such as \texttt{located-at} or \texttt{participant-of}, which relate an object to a location or an event respectively, and assert a constraint on its spatial attribute. In this dissertation, we will see that such relations, which impose property constraints, are critical in algorithmically determining which information is relevant context.
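The distinction between plain labels and constraint-bearing relations can be sketched in code. The following is an illustrative sketch only: the entity representation, attribute names and the constraint predicate are hypothetical simplifications, not the dissertation's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    """A hypothetical, minimal stand-in for a real-world entity."""
    name: str
    attrs: dict

def subevent_constraint(child, parent):
    # A subevent's time interval must lie within its parent's interval.
    cs, ce = child.attrs["start"], child.attrs["end"]
    ps, pe = parent.attrs["start"], parent.attrs["end"]
    return ps <= cs and ce <= pe

# Some relations are plain labels (no constraint); others carry a
# predicate that the related instances must satisfy.
RELATIONS = {
    "friend-of": None,                    # simple social label
    "subevent-of": subevent_constraint,   # imposes a temporal constraint
}

def can_relate(rel, a, b):
    check = RELATIONS[rel]
    return True if check is None else check(a, b)

keynote = Entity("keynote", {"start": 9, "end": 10})
conf = Entity("conference", {"start": 8, "end": 18})
print(can_relate("subevent-of", keynote, conf))  # True: 9..10 lies in 8..18
```

The point of the sketch is that a constraint-bearing relation can reject candidate context objects mechanically, which is exactly the lever the discovery algorithm needs.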
\begin{figure}[t]
\begin{minipage}[b]{0.48\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter2/naaman.jpg}
\caption{Mor Naaman at ICMR.}
\label{fig:naaman-icmr}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter1/kasturi-show.jpg}
\caption{Kasturi and Jain.}
\label{fig:example-kasturi-show}
\end{minipage}
\end{figure}
Using this relation-centric view, we now look at how the examples from chapter 1 can be formulated as a systematic process to discover context. A system to construct context networks must establish a set of relationships to connect its nodes. For the two photos, we choose \textbf{participant-of}, which indicates that an object is a participant in an event, and \textbf{subevent-of}, which indicates that an event occurs within another super-event. Thus, any object can be related to events through a \texttt{participant-of} edge, and events can be related to other events as well as entities through \texttt{subevent-of} and \texttt{participant-of} edges respectively. Let us also assume that we have four types of data sources available to us: event sources, place databases (like yelp.com), weather information sources and social networking information.
\begin{figure}[t]
\begin{minipage}[b]{0.48\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter2/naaman-1.png}
\caption{Primary objects.}
\label{fig:naaman-example-1}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter2/naaman-2.png}
\caption{Associating \texttt{conference event} using the \texttt{participant-of} relation}
\label{fig:naaman-example-2}
\end{minipage}
\begin{minipage}[b]{0.48\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter2/white.png}
\label{fig:naaman-example-x}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter2/white.png}
\label{fig:naaman-example-y}
\end{minipage}
\begin{minipage}[b]{0.48\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter2/naaman-3.png}
\caption{Associating the keynote event.}
\label{fig:naaman-example-3}
\end{minipage}
\hspace{0.5cm}
\begin{minipage}[b]{0.45\linewidth}
\centering
\includegraphics[width=\textwidth]{media/chapter2/naaman-4.png}
\caption{Associating the participants using the \texttt{participant-of} relation}
\label{fig:naaman-example-4}
\end{minipage}
\end{figure}
Figures \ref{fig:naaman-example-1} through \ref{fig:naaman-example-4} show how the two relationships can be used to gather context for the photo in figure \ref{fig:naaman-icmr}. Figure \ref{fig:naaman-example-1} shows the initial graph created using the \texttt{photo-capture-event} and the photographer, \texttt{entity:ent81}. Figure \ref{fig:naaman-example-2} shows the result of adding context by associating an event with \texttt{entity:ent81}. Given this new graph with three nodes, a context aware system can find more context by trying to find objects which can be related through the two edges. Since one of the nodes is a \texttt{conference} event, it proceeds to find events occurring within it, and adds the keynote event, which also happens to be the super-event of the \texttt{photo-capture-event}. The result is shown in figure \ref{fig:naaman-example-3}. Finally, the keynote event is extended with relations to associate subevents or participants. In this case, the only new context available is the set of participants of the keynote event. These two entities are associated with the event as shown in figure \ref{fig:naaman-example-4}.
In the above example, given a graph containing primary objects, we grew it by relating objects from the real-world using a fixed set of relationships \{\texttt{participant-of}, \texttt{subevent-of}\}. No matter how many different sources or object types are included, we can use the relations and their semantics to reason which source is suitable, and extract context from it, if any, to construct and grow context networks.
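The growth process above can be expressed compactly as a loop over a fixed relation set. The sketch below is illustrative, not the dissertation's discovery engine: the source table, entity identifiers and edge directions are simplified stand-ins (a real system would query heterogeneous sources and respect each relation's direction).

```python
# Hypothetical source table: (subject, relation) -> objects related to the
# subject via that relation. Names are illustrative stand-ins only.
SOURCES = {
    ("ent81", "participant-of"): ["icmr-2013"],
    ("icmr-2013", "subevent-of"): ["keynote"],
    ("keynote", "participant-of"): ["mor-naaman", "speaker-2"],
}

# The fixed relation set chosen for the two example photos.
RELATIONS = ["participant-of", "subevent-of"]

def grow_context_network(primary):
    """Grow a context network outward from the primary objects by
    repeatedly relating newly discovered objects through RELATIONS."""
    edges, frontier, seen = [], list(primary), set(primary)
    while frontier:
        node = frontier.pop()
        for rel in RELATIONS:
            for other in SOURCES.get((node, rel), []):
                edges.append((node, rel, other))
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
    return edges

network = grow_context_network(["ent81"])
for edge in network:
    print(edge)  # e.g. ('ent81', 'participant-of', 'icmr-2013')
```

Extending the relation set (say, with \texttt{occurs-at} or \texttt{friend-of}) would only change the `RELATIONS` list and the sources consulted; the growth loop itself stays unchanged, which is the source-agnosticism argued for later in this chapter.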
Because of the types of relationships chosen, some information which was readily available (weather, place or social networking, for example) was not associated. But if we extend the relationship set to contain another relation, \texttt{occurs-at}, then the place where the conference was held will be included in the context network. Similarly, the inclusion of a \texttt{friend-of} relation can relate entities with each other. If the social networking source reports that \texttt{entity:ent81} was a friend of Mor Naaman, an additional edge would be introduced between these nodes in the context network. Thus, relations are the key to determining which objects are context and which are not, and how they are related to the primary objects. Figure \ref{fig:context-network} shows one such context network for Mor Naaman's photo taken at ICMR 2013 with the additional \texttt{friend-of} edges.
\begin{figure}[h]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter2/naaman-friends.png}
\caption{Context Network for Mor Naaman's photo with the additional \texttt{friend-of} edge.}
\label{fig:context-network}
\end{figure}
It must be noted that asserting a relation-centric view of context does not discount the importance of the object-centric view. The object-centric view has been shown to be powerful in specific domains, where applications work within well-defined environments. The relation-centric view is better suited for broader applications where very few assumptions exist about the scope of environments and the changes in relationships between different entities in them.
\section{Modeling Context for Real-world Problems}
Traditionally, context has been modeled for specific domains. Henricksen et al.\ \cite{henricksen2002modeling} model context using context graphs, and \cite{reignier2007context} provides a technique to construct Petri nets from given context graphs to implement context-aware behavior. Although these techniques work for specific domains (for example, \cite{reignier2007context} presents a domain of class presentations, and \cite{henricksen2002modeling} presents a case study in context-aware personal communications), creating context graphs \textbf{at system design time} for each and every case which might occur in the real world is impossible. Also, with the rapidly changing relations in the real world, it is not clear what the good principles and practices are for building scalable systems around context graphs.
If we are to represent context networks computationally, we first need to understand the basic building blocks of such a network. The following sections list these blocks, and present their properties which are essential in constructing such networks. These blocks are fairly generic, and can be utilized to model context for various applications.
% NOTES
%The requirements must be split across different available tools (a more systems approach to modeling). Henrickson's CxG approach. Get rid of context awareness FULLY? 2.4.1, 2.4.2 part of 2.3; Dynamic linking is actually talking about source declaration. Write context awareness as a separate topic in the end of the chapter -- ``future work'', how the same tool now assist in multiple applications which involve photo taking.
\subsection{Object Types and Semantics}
Context is always specified with respect to a real world object at a given time. This \textbf{primary object} must be uniquely identifiable in the computational system, and must be an instance of one of the known classes. This object must have some real world attributes. For example, if the entity is an event, then the interval during which it occurs and the location of occurrence are two real world attributes. In the tour guide application, the visitor is one possible primary object whose context needs to be determined. In the photo application, possible types of interest include the \texttt{photo-capture-event}, the \texttt{photographer}, various possible events which can occur in the world, people and places where photography can occur.
Different entities bring with them very specific real-world semantics. While incorporating them into the context model, these semantics must be preserved. For example, an event exists only for a fixed time interval. People can only be present at one place at one point in time. The \texttt{photographer} is a person, and therefore inherits the property of being at only one place at a time. In chapter 4, we will see that these semantic properties play a very important role in the context discovery algorithm.
\subsection{Relationships and Constraints}
% two main points: sources and objects are decoupled, but relations cannot be.
% either declare them, or embed them in the discovery logic.
% a sweet spot can be found between genericness and richness.
Context is the set of relationships between real-world entities at a point in time. Instead of finding entity instances of a specific type and qualifying them as context, in this work the relationship structure between entities is considered to be context. These relationships impose constraints on the instances they relate. For example, the subevent relation imposes a constraint on the spatiotemporal properties of the two events it relates. Such constraints are very common in real-world relationships. Another example of a real-world relation can be seen in naturally occurring chemical reactions, where it is not sufficient to have just two reactants at the same time and place; the reaction could demand very specific environmental factors (like temperature or pressure), thus asserting constraints in terms of the environment. The context modeling tool should permit the specification of such specific constraints.
The relationships and constraints are not black boxes. In our work we decouple the context discovery logic from the sources and types of entities, but we cannot do the same with relationship types. Their semantics must be \textbf{declared} or \textbf{tightly-coupled} to the modules of a context-aware system, and must be used to reason about which entity is part of a context network and which is not. Extending our example of chemical reactions: given all the materials and conditions which can be present in a given region, the relationships and constraints between different materials must be `machine-readable' to simulate the various ensuing reactions. Because we are fixing the number of relation types in our system, care must be taken to ensure that these relations are generic enough, while at the same time rich enough in terms of semantics and constraints.
In our application of photo tagging, social relations and the presence of people at a particular event are used to rule them out of other events. Why? Because a person can be present at only one place at one point in time, and if the events are too far apart, or contain a very different distribution of people from those with whom a person commonly co-participates, the chances of that person being present in that photo are very low. Such \textbf{participation} and \textbf{co-occurrence} relations, which are generic yet semantically rich, must be exclusively utilized in a context discovery algorithm for photos. More concretely, consider a 10-level-deep class hierarchy of entities in an application's ontology. The relationships which relate instances of classes at the top of the hierarchy are more generic than the ones below, but the relations which only relate instances at the bottom of the hierarchy are richer. For example, the \texttt{participant} relation only asserts the presence of an object at an event. But \texttt{keynote-event-host} is a very specific relation which can make more assertions about the two objects in question (the event is a keynote at a conference, and the host is an academic who has obtained a PhD and continues doing research at a research establishment). The drawback is that such a relationship applies to very few objects, and therefore cannot be extensively used in reasoning about entities in different domains.
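The pruning rule described above, namely that a person can be at only one place at a time, can be sketched as follows. The event records and attribute names are hypothetical illustrations, not the dissertation's actual representation.

```python
def overlaps(a, b):
    # Two half-open time intervals overlap iff each starts before
    # the other ends.
    return a["start"] < b["end"] and b["start"] < a["end"]

def prune_candidates(known_event, candidate_events):
    """Drop candidate events that overlap in time with an event the
    person is known to attend but that occur at a different place."""
    return [e for e in candidate_events
            if not (overlaps(known_event, e)
                    and e["place"] != known_event["place"])]

# Illustrative data: the person is known to attend the keynote in Irvine.
known = {"name": "keynote", "start": 9, "end": 10, "place": "irvine"}
candidates = [
    {"name": "wedding", "start": 9, "end": 11, "place": "seattle"},   # pruned
    {"name": "banquet", "start": 18, "end": 20, "place": "seattle"},  # kept
]
print([e["name"] for e in prune_candidates(known, candidates)])  # ['banquet']
```

The co-occurrence signal (typical distributions of co-participants) would add a second, probabilistic filter on top of this hard spatiotemporal one.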
\subsection{Temporal Semantics}
\begin{figure}[t]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter2/allen.png}
\caption{Intervals and their relations in Allen's Interval Algebra.}
\label{fig:allen}
\end{figure}
Modeling context requires the ability to express relationships which assert temporal constraints. For example, during a lunch break at a conference, there are no co-occurring session events; a banquet is always the last event at a conference, which may be followed by one or more workshops. Temporal relations have been studied in the literature \cite{allen1983maintaining, wolter2000spatio}, and can be reused for this purpose. The temporal relationships defined in Allen's Interval Algebra are shown in figure \ref{fig:allen}. In order to relate events with time intervals, we use the \texttt{occurs-during} relation defined in \cite{gupta2011managing}. For example, an event X is said to \texttt{occur-during} the interval $I$ iff X's temporal extent is equal to $I$.
Additional relations \texttt{occurs-sometime-during} and \texttt{occurs-during-n} are also defined in \cite{gupta2011managing}. An event \texttt{occurs-sometime-during} an interval $I$ if its interval is completely within $I$, but not \texttt{equal-to} the extent of $I$. This is similar to the \texttt{during} relation in figure \ref{fig:allen}. The \texttt{occurs-during-n} relation is used to represent events which occur multiple times within a given interval. This relation is accompanied by an arithmetic function $\phi(n)$ which asserts constraints on the number of occurrences within the interval.
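The two interval relations above can be sketched over closed intervals represented as `(start, end)` pairs. This is a minimal illustration of the cited definitions, assuming a simple numeric timeline; the real event model is richer.

```python
def occurs_during(event_extent, interval):
    # occurs-during: the event's temporal extent equals the interval.
    return event_extent == interval

def occurs_sometime_during(event_extent, interval):
    # occurs-sometime-during: contained within the interval,
    # but not equal to its full extent.
    s, e = event_extent
    i_s, i_e = interval
    return i_s <= s and e <= i_e and event_extent != interval

keynote = (9, 10)
conference_day = (8, 18)
print(occurs_during(keynote, conference_day))           # False
print(occurs_sometime_during(keynote, conference_day))  # True
```

Note that, unlike Allen's strict \texttt{during}, this sketch of \texttt{occurs-sometime-during} permits the event to share an endpoint with the interval, since the cited definition only excludes equality of the two extents.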
\subsection{Spatial Semantics}
Real-world events play an important role in context, and therefore we need to pay special attention to the spatial relationships between entities and events. A large body of literature is available on spatial representation and reasoning, most notably the framework known as RCC-8. Some of its common relationships are shown in figure \ref{fig:rcc8}. The important relations in the figure are \texttt{disconnected} (B and D), \texttt{externally-connected} (B and C), \texttt{tangential-proper-part} (A and B), \texttt{contained-in} (E and D) and \texttt{partially-overlaps} (C and D). In order to represent spatial relations between events and entities, we use the \texttt{occurs-at} relation defined in \cite{gupta2011managing}. An event E is said to \texttt{occur-at} region $S$ if the spatial extent of E lies completely within $S$. The relations between the different regions in figure \ref{fig:rcc8} are summarized in table 2.1:
\begin{figure}[h]
\centering
\includegraphics[width=0.75\textwidth]{media/chapter2/rcc8-combined.png}
\caption{Spatial regions which can be represented in the RCC-8 framework.}
\label{fig:rcc8}
\end{figure}
\begin{table}[h]
\centering
\begin{tabular}{ | l | c | c |}
\hline
\textbf{Relation} & \textbf{Symbol} & \textbf{Example} \\ \hline
Disconnected & $DC$ & B $DC$ D \\ \hline
Externally Connected & $EC$ & B $EC$ C \\ \hline
Tangential Proper Part & $TPP$ & A $TPP$ B \\ \hline
Completely Contained-In & $CI$ & E $CI$ D \\ \hline
Partially Overlaps & $PO$ & C $PO$ D \\ \hline
\end{tabular}
\caption{RCC-8 relations in figure \ref{fig:rcc8}}
\end{table}
Additional relations \texttt{occurs-somewhere-at} and \texttt{occurs-at-n} are also defined in \cite{gupta2011managing}. An event \texttt{occurs-somewhere-at} a region if its actual region of occurrence is completely contained in the given region, but does not fully overlap with it. The relation \texttt{occurs-at-n} is used when the event occurs at multiple places within the given region. This relation is accompanied by a function $\Psi(n)$, which asserts a constraint on the number of occurrences.
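A few of these spatial checks can be sketched over axis-aligned rectangles `(xmin, ymin, xmax, ymax)`. This is an illustrative approximation only: real spatial extents need proper geometry, and this sketch treats regions that merely touch as connected rather than distinguishing \texttt{externally-connected}.

```python
def contained_in(a, b):
    # CI in table 2.1: a lies completely inside b.
    return b[0] <= a[0] and b[1] <= a[1] and a[2] <= b[2] and a[3] <= b[3]

def disconnected(a, b):
    # DC: the rectangles share no points at all.
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]

def partially_overlaps(a, b):
    # PO: they overlap, but neither contains the other.
    return (not disconnected(a, b)
            and not contained_in(a, b) and not contained_in(b, a))

def occurs_at(event_extent, region):
    """occurs-at: the event's spatial extent lies completely within
    the region."""
    return contained_in(event_extent, region)

hall = (0, 0, 10, 10)
stage = (2, 2, 4, 4)
print(occurs_at(stage, hall))                 # True
print(disconnected(stage, (20, 20, 30, 30)))  # True
```

Even this coarse model is enough to drive the discovery reasoning described earlier: an event that does not \texttt{occur-at} the region of interest can be discarded as context.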
\subsection{Real-World Knowledge}
Modeling real world knowledge is critical in context based systems. Examples of knowledge are: an academic conference has at least one keynote talk; sodium reacts with atmospheric oxygen at room temperature; water expands when it freezes. By explicitly modeling such networks of knowledge, we provide the opportunity to influence the extraction of context from different sources. For example, if an object is associated with a keynote talk, data about the co-occurring conference is definitely going to be relevant context. Effectively, knowledge is the set of relations between real-world entities that have been documented so far. The primary motivation behind knowledge bases in context-aware systems is to transfer human experience to the system. In a context discovery system, such knowledge bases play a pivotal role in reasoning about which relations need to be discovered. Given partially known information about an entity, knowledge indicates what general relations it shares with other entities, and provides insight about which data sources must be searched and which can be ignored. In the above example, given the partial information about keynotes, and the knowledge that keynotes occur within conferences, we can progress to discovering related conferences, and at the same time ignore all data sources relevant to social networks, sports games and weather information.
\subsection{Source Agnosticism}
Context is obtained from data sources or sensors. Therefore, it is imperative to separate the sources from context. Context must be modeled irrespective of the nature of sources or their query abilities \cite{yerneni1999computing}. This provides the following advantages. First, context is now entirely defined in terms of objects, their possible inter-relationships, and the real world knowledge that decides how objects are related to each other, irrespective of what data can actually be obtained. Second, with the growing number of data sources and personal and public sensors, this allows system engineers to plug-and-play different sources with ease, without disturbing the model and discovery algorithms.
This design requirement also brings a problem: how does a context-aware system know what data sources are at its disposal, and how does it interact with them? The short answer is to utilize data integration technology \cite{doan2005semantic, halevy2001answering, lenzerini2002data}. Data integration techniques provide a uniform query interface to a multitude of autonomous data sources, irrespective of their native query capabilities or data storage formats. This might not be the most ideal approach in the long run, as these technologies were built to support RDBMS-like analytics operations, and we might benefit by adding more ``context operations'' into the integration layers, thereby allowing more fruitful grounds for query optimization. But for our current needs, it is advisable to reuse the ideas presented in these techniques to realize a fully working context-based system. The later chapters will describe in detail the data integration framework used to obtain data for the photo application.
There are many tools which partially support encoding such models. For example, the ontology specification OWL 2 supports the specification of various entities, their class inheritance structure (\texttt{is-a}), and their inter-relationships (which can have their own hierarchy structure). But encoding temporal and spatial semantics is a relatively recent capability \cite{hobbs2006time} in such frameworks, and is not yet ready for prime time in applications like real-world context discovery. In our work, relations expressed in OWL are simply labels with domain and range restrictions and sometimes cardinality constraints; their semantics are entirely declared and maintained within the discovery engine.
\section{Context for Personal Photos}
In this section, we list the specific objects and relationships used to construct context networks, such as the one shown in figure \ref{fig:context-network-large}, for tagging personal photos. Once such a context network is constructed, the search space is derived from it. The types used in the contextual domain include, but are not limited to, the following:
\begin{figure}[h]
\centering
\includegraphics[width=0.85\textwidth]{media/chapter2/context-network-large.png}
\caption{A context network representing real-world entities and their relationships.}
\label{fig:context-network-large}
\end{figure}
\begin{itemize}
\item \textbf{Events}: events such as real-world conferences, parties, trips or weddings, and their structure (for example, the sessions, talks and keynotes occurring within a particular conference). We also model the photo as the media facet of a \texttt{photo-capture-event}, which signifies the moment when a person took a photo with his/her camera. Events in our framework are modeled according to the modeling primitives described in \cite{gupta2011managing, westermann2007toward}.
\item \textbf{Objects}: entities model the user, the people in his/her social graph, and the people with whom he/she corresponds using email and other messaging services, as well as the different organizations which can participate in the events mentioned above. For example, a company Acme Corporation where persons X and Y are colleagues working in the same team.
\item \textbf{Geographical Objects}: places play an important role. A place can be referenced by a geographic position, for example (33.643036, -117.841911), or by an address, for example ``Donald Bren Hall, University of California, Irvine''.
\end{itemize}
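The three object types above can be rendered as a minimal data model. The sketch below is written in Python under assumed names (\texttt{Interval}, \texttt{Place}, and so on); it is an illustration, not the framework's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Minimal sketch of the contextual types; class and field names are
# illustrative assumptions.

@dataclass
class Interval:
    start: float  # e.g., POSIX timestamps
    end: float

@dataclass
class Place:
    name: str
    position: Optional[Tuple[float, float]] = None  # (latitude, longitude)

@dataclass
class Person:
    name: str

@dataclass
class Event:
    name: str
    location: Optional[Place] = None
    duration: Optional[Interval] = None
    subevents: List["Event"] = field(default_factory=list)
    participants: List[Person] = field(default_factory=list)

# A conference containing a photo-capture-event as a subevent.
conf = Event("ACM Multimedia", Place("Convention Center"), Interval(0, 100))
capture = Event("photo-capture-event", conf.location, Interval(42, 42),
                participants=[Person("X")])
conf.subevents.append(capture)
```

The nested \texttt{subevents} and \texttt{participants} fields anticipate the relationships defined next, which constrain how such objects may be linked.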
In order to relate the above objects, we use the following types of relationships, along with their constraints and semantics:
\begin{itemize}
\item \textbf{Spatiotemporal Relations}: events occur during specific time intervals and at some location. We use the relations \texttt{occurs-at} and \texttt{occurs-during} to model these properties \cite{gupta2011managing}. They are formulated recursively as follows: if \texttt{E} is an event and \texttt{I} is a time interval, \texttt{E} occurs during the interval \texttt{I} if it also occurs during all subintervals of \texttt{I}. Similarly, if \texttt{R} is a region, \texttt{E} occurs at \texttt{R} if it occurs at all subspaces of \texttt{R}.
\item \textbf{Subevent Relation}: an event spatiotemporally contains all its subevents. The subevent relationship is an irreflexive, asymmetric and transitive relationship between a pair of events \cite{gupta2011managing}. For example, a talk event is a subevent of the conference during which it happens. A subevent relation between two events \texttt{S} and \texttt{C} entails the following spatio-temporal constraint: if \texttt{C} is the subevent of \texttt{S}, the location of occurrence of \texttt{C}, \texttt{C.location}, is completely contained within \texttt{S.location}, and its interval of occurrence, \texttt{C.duration}, is completely contained within \texttt{S.duration}.
\item \textbf{Participation Relation}: the relation \texttt{participant-of} relates an event and an object. If a person performs a functional role in an event, s/he is said to be a participant of that event. Note that this relation constrains the spatio-temporal boundary within which the entity can be present during the interval of the event. If a person \texttt{P} participates in an event \texttt{E}, the location of \texttt{P}, \texttt{P.location}, must be contained within the region of \texttt{E}'s occurrence, \texttt{E.location}, during the interval of its occurrence, \texttt{E.duration}.
\item \textbf{Social Relation (knows)}: this relation is defined in the FOAF ontology \cite{brickley2010foaf} as follows: ``\texttt{knows} relates a Person object to another Person object that he or she knows''. It is used to model relations between people in social networking sources such as Facebook or Twitter.
\end{itemize}
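The spatio-temporal constraints entailed by the subevent relation can be sketched as simple containment checks. In the Python sketch below, intervals are \texttt{(start, end)} pairs and regions are approximated by bounding boxes; both representations are assumptions made for illustration, not the framework's actual spatial model:

```python
def contains_interval(outer, inner):
    """True if time interval inner = (start, end) lies within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def contains_region(outer, inner):
    """True if bounding box inner = (min_lat, min_lon, max_lat, max_lon)
    lies within outer; real regions would need proper spatial containment."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1] and
            inner[2] <= outer[2] and inner[3] <= outer[3])

def valid_subevent(parent, child):
    """Check the constraint: a subevent's duration and location must be
    contained in its parent event's duration and location."""
    return (contains_interval(parent["duration"], child["duration"]) and
            contains_region(parent["location"], child["location"]))

# A talk held inside a conference's time span and venue area.
conference = {"duration": (0, 100), "location": (33.0, -118.0, 34.0, -117.0)}
talk = {"duration": (10, 11), "location": (33.6, -117.9, 33.7, -117.8)}
```

The participation constraint admits the same shape of check, with the person's location tested against \texttt{E.location} during \texttt{E.duration}.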
The above classes of contextual data can be obtained from a variety of data sources, ranging from mobile phone call logs and email conversations to Facebook messages and listings of public events at upcoming.com. We assign each of our data sources to one of the following classes:
\begin{itemize}
\item \textbf{Metadata}: photos contain very valuable metadata in the form of EXIF tags. Prior research in multimedia systems has used them extensively to annotate photos with tags such as \texttt{landscape, portrait, indoor, outdoor} with very high accuracy \cite{boutell2004photo, sinha2008concept}. In our work, we use the time and location tags from EXIF to populate the attributes of \texttt{photo-capture-event}s. Although we do not use them in our current work, the growing importance of sensors embedded in mobile phones warrants their mention. Sensors such as gyroscopes, inclinometers, accelerometers and ambient light sensors can provide interesting cues about the environment \cite{patterson2005assisted, siewiorek2003sensay}.
\item \textbf{Personal Data}: includes all sources which provide details about the particular user whose photo is to be tagged. Some examples of personal data sources are Google Calendar, email, and the user's Facebook profile and social graph. Details include common personal attributes such as name, date of birth, place of residence and place of work, the various events the user is attending (personal calendar), and personal preferences (type of food, music concerts, activity preferences).
\item \textbf{Social Data}: includes all sources which provide contextual information about a user's friends and colleagues. For example, LinkedIn, Facebook and DBLP are commonly used websites with different types of social graphs. This information includes the personal information of the user's friends, past or future activities in which they participate together, their common friends (communities in social networks \cite{backstrom2006group, krawczyk2009communities}) and common preferences.
\item \textbf{Public Data}: includes all sources which provide information about public organizations (for example, restaurants, points of interest or football stadiums) or public events (for example, conferences, fairs, concerts or sports games). Such sources include Yelp, Upcoming, DBLP and Factual.
\end{itemize}
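As an illustration of the metadata class, the EXIF time and GPS tags of a photo can be turned into the attributes of a \texttt{photo-capture-event}. The sketch below assumes the tags have already been extracted into a dictionary; the tag values shown are hypothetical, chosen to match the example coordinate given earlier:

```python
from datetime import datetime

def gps_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF GPS degree/minute/second values to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if ref in ("S", "W") else value

# Hypothetical EXIF tags, as an extraction library might return them.
exif = {
    "DateTimeOriginal": "2011:06:25 14:03:11",
    "GPSLatitude": (33, 38, 34.93),   "GPSLatitudeRef": "N",
    "GPSLongitude": (117, 50, 30.88), "GPSLongitudeRef": "W",
}

# Populate the photo-capture-event's attributes from the tags.
capture_event = {
    "time": datetime.strptime(exif["DateTimeOriginal"], "%Y:%m:%d %H:%M:%S"),
    "location": (gps_to_decimal(*exif["GPSLatitude"], exif["GPSLatitudeRef"]),
                 gps_to_decimal(*exif["GPSLongitude"], exif["GPSLongitudeRef"])),
}
# location works out to approximately (33.643036, -117.841911)
```

Note the EXIF datetime format uses colons as date separators, which is why the parse pattern differs from ISO 8601.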
\begin{figure}[h]
\centering
\includegraphics[width=0.85\textwidth]{media/chapter2/personal-social-public-data-sources.png}
\caption{Personal, social and public data sources which can contribute relevant context.}
\label{fig:personal-social-public-sources}
\end{figure}
Figure \ref{fig:personal-social-public-sources} shows some commonly available personal, social and public sources. Social and public data sources are enormous, containing information about billions of events and entities. Using them directly would lead to scalability problems similar to those faced by face recognition and verification techniques. But by using personal data, we can identify which parts of the social and public sources are most relevant. For example, if a photo was taken at Staples Center, Los Angeles, CA, an indoor stadium, we need only consider public events, such as concerts or sports games, in that area. Thus, the role of personal information is twofold. \textbf{First}, it provides contextual information regarding the photo. \textbf{Second}, it acts as a bridge to social and public data sources, through which we discover interesting people connected to the user who might be present at the event and, therefore, in the photo.
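This pruning step can be sketched as a filter over candidate public events, keeping only those whose time span covers the photo's capture time and whose venue lies near the capture location. In the Python sketch below, the event records, coordinates and 2 km radius are illustrative assumptions:

```python
import math

def haversine_km(p, q):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def prune_public_events(events, photo_loc, photo_time, radius_km=2.0):
    """Keep events that are ongoing at photo_time and near photo_loc."""
    return [e for e in events
            if e["start"] <= photo_time <= e["end"]
            and haversine_km(e["loc"], photo_loc) <= radius_km]

# Illustrative candidates; the coordinates are approximate.
candidates = [
    {"name": "basketball game", "loc": (34.0430, -118.2673), "start": 0, "end": 3},
    {"name": "distant concert", "loc": (37.7749, -122.4194), "start": 0, "end": 3},
]
nearby = prune_public_events(candidates, photo_loc=(34.0430, -118.2673),
                             photo_time=1)
```

A real system would push such predicates down to the public source's query interface rather than filtering locally, but the selectivity gained is the same.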
At this point we should revisit the \textbf{dynamic linking} property of a discovery algorithm. Given a stream of photos taken during a time interval, the source which contributed interesting context for one photo might not be equally useful for the next. This is because sources tend to focus on a specific set of event or relationship types, and the two photos might be captured at different events, or contain persons with whom the user maintains relations through different sources. For example, of two photos taken at a conference, the first might contain a user's friends, and the second the advisers of those friends. The friends might interact with the user through a social network, but their advisers might not; a source like DBLP can reveal the relations between the advisers and the friends. We say that the temporal relevance of these context sources is \textbf{\textit{low}}. This requirement plays an important role in the design of our framework: sources are not hardwired to a photo, but instead need to be discovered gradually.
In summary, we have seen the different definitions of the term context, and the different implications each has for solving problems. This chapter presented a relation-centric view of context and its implications for modeling context when building systems that solve problems using context-based techniques. We also outlined the different types of contextual information used in our specific application of tagging faces in photos. Chapter 4 will present a technique to construct context networks similar to the one shown in figure \ref{fig:context-network-large} using metadata, personal, social and public sources as shown in figure \ref{fig:personal-social-public-sources}. The next chapter will present a short survey of techniques relevant to our problem, and highlight the important ideas which helped develop the algorithms presented in this dissertation.
% Our justification for the use of context in a personal face tagging application begins with the observation: \textit{For a given user, the correctness of face tags for a photograph containing people she has never met is undefined}. This observation prepares us to understand what context is, and how contextual reasoning assists in tagging photos. The description of any problem domain requires a set of abstract data types, and a model of how these types are related to each other. We define contextual types as those which are semantically different from these data types, but can be directly or indirectly related to them via an extended model which encapsulates the original one. Contextual reasoning assists in the following two ways. \textbf{First}, contextual data restricts the number of people who might appear in the photographs. We can also argue that all the personal data of a user (her profile on Facebook, LinkedIn, email exchanges, phone call logs) provides a reasonable estimate of all these people who might appear in her photos. \textbf{Second}, by reasoning on abstractions in the contextual domain, we can infer conclusions on the original problem. We exploit this property to develop our algorithm in the later sections. Though context based search space pruning can be applied to a variety of recognition problems, we focus on tagging people in personal photos for concreteness, where, the image and person tag form the abstractions in the problem domain, and events and event based relations constitute the contextual domain.