There are millions of concepts in the Probase semantic network. We first perform clustering on all the concepts to construct the concept clusters. Concretely, we adopt a *K*-Medoids clustering algorithm [18] to group the concepts into 5,000 clusters. We adopt concept clusters instead of concepts themselves in the graph for noise reduction and computation efficiency. In the rest of this section, we use concept to denote concept cluster. We build a large knowledge graph offline, which is comprised of three kinds of sub-graphs, including (1) instance to concept graph, (2) concept to concept graph and (3) non-instance term to concept graph. Figure 6 shows a piece of the graph around the term *watch*.
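The clustering step above can be sketched with a toy Voronoi-iteration K-Medoids; this is a minimal illustration, not the production algorithm of [18], and the item list, distance function, and parameter names are hypothetical:

```python
import random

def k_medoids(items, dist, k, iters=20, seed=0):
    """Toy K-Medoids: alternate assignment and medoid-update steps.

    items: list of objects to cluster (concepts, in the paper's setting)
    dist:  pairwise distance function between two items
    k:     number of clusters (5,000 in the paper; tiny here)
    """
    rng = random.Random(seed)
    medoids = rng.sample(items, k)
    for _ in range(iters):
        # assignment step: each item joins its nearest medoid
        clusters = {m: [] for m in medoids}
        for it in items:
            clusters[min(medoids, key=lambda m: dist(it, m))].append(it)
        # update step: new medoid = member minimizing total in-cluster distance
        new = [min(ms, key=lambda c: sum(dist(c, x) for x in ms))
               for ms in clusters.values() if ms]
        if set(new) == set(medoids):
            break  # converged
        medoids = new
    # final cluster assignment
    assign = {it: min(medoids, key=lambda m: dist(it, m)) for it in items}
    return medoids, assign
```

In practice the distance would come from concept co-occurrence or embedding similarity; here any metric works.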

**Figure 6.** A subgraph of the heterogeneous semantic network around *watch*.

**Instance to concept graph.** We directly use the Probase semantic network as the instance to concept graph. \(P\left(c|e\right)\) serves as the typicality probability of concept (cluster) \(c\) for instance \(e\), which can be computed by

\[P\left(c|e\right)=\sum _{{c}^{*}\in c}{P}^{*}\left({c}^{*}|e\right) ,\]

where \({c}^{*}\) represents a concept belonging to cluster \(c\), and \({P}^{*}\left({c}^{*}|e\right)\) is the typicality of concept \({c}^{*}\) to instance \(e\).
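The aggregation of concept-level typicality into cluster-level typicality is a single sum per cluster; a minimal sketch (the example concepts, scores, and cluster ids are made up for illustration):

```python
from collections import defaultdict

def cluster_typicality(instance_typicality, cluster_of):
    """P(c|e) = sum of P*(c*|e) over all concepts c* in cluster c.

    instance_typicality: concept -> P*(c*|e) for one instance e
    cluster_of:          concept -> its cluster id
    """
    p = defaultdict(float)
    for concept, prob in instance_typicality.items():
        p[cluster_of[concept]] += prob
    return dict(p)

# Hypothetical typicality scores for an instance, with two clusters (0 and 1)
scores = {"language": 0.5, "programming language": 0.3, "animal": 0.2}
clusters = {"language": 0, "programming language": 0, "animal": 1}
print(cluster_typicality(scores, clusters))  # cluster 0 gets 0.8, cluster 1 gets 0.2
```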

**Concept to concept graph.** We assign a transition probability \(P\left({c}_{2}|{c}_{1}\right)\) to the edge between two concepts \({c}_{1}\) and \({c}_{2}\). The probability is derived by aggregating the co-occurrence counts between all (unambiguous) instances of the two concepts:

\[P\left({c}_{2}|{c}_{1}\right)=\frac{\sum _{{e}_{i}\in {c}_{1},{e}_{j}\in {c}_{2}}n\left({e}_{i},{e}_{j}\right)}{\sum _{c\in C}\sum _{{e}_{i}\in {c}_{1},{e}_{j}\in c}n\left({e}_{i},{e}_{j}\right)} ,\]

where \(C\) denotes the set of all concepts (clusters), and the denominator normalizes the distribution. In practice, we keep only the top 25 related concepts for each concept when constructing edges.
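The transition probabilities with top-*k* pruning can be sketched as follows; the co-occurrence counts and cluster labels in the example are invented for illustration:

```python
from collections import defaultdict

def transition_probs(cooccur, cluster_of, top_k=25):
    """P(c2|c1): aggregate instance co-occurrence counts n(ei, ej) by cluster,
    normalize over all target clusters, then keep the top_k targets per cluster.

    cooccur:    (instance_i, instance_j) -> co-occurrence count n(ei, ej)
    cluster_of: (unambiguous) instance -> its concept cluster
    """
    counts = defaultdict(lambda: defaultdict(float))
    for (ei, ej), n in cooccur.items():
        counts[cluster_of[ei]][cluster_of[ej]] += n
    probs = {}
    for c1, row in counts.items():
        total = sum(row.values())  # normalization over all clusters c in C
        top = sorted(row.items(), key=lambda kv: -kv[1])[:top_k]
        probs[c1] = {c2: n / total for c2, n in top}
    return probs

# Hypothetical counts: "apple" co-occurs with "banana" 4 times, "paris" once
cooccur = {("apple", "banana"): 4, ("apple", "paris"): 1}
cluster_of = {"apple": "fruit", "banana": "fruit", "paris": "city"}
print(transition_probs(cooccur, cluster_of))  # P(fruit|fruit)=0.8, P(city|fruit)=0.2
```

Note that pruning happens after normalization, so the kept edges need not sum to one.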

**Non-instance term to concept graph.** Some terms, e.g., verbs and adjectives, cannot be matched to any instance or concept. To better understand short text, we also mine lexical relationships between these non-instance terms and concepts. Specifically, we use the Stanford Parser [19] to obtain POS tags and dependency relationships between tokens in the text; the POS tags reveal the type (e.g., adjective, verb) of each token. Our goal is to obtain the following two distributions.

\(P(z|t)\) stands for the prior probability that a term \(t\) is of a particular type \(z\). For instance, the word *watch* is a verb with probability \(P\left(verb|watch\right)=0.8374\). We compute the probability as \(P\left(z|t\right)=\frac{n\left(t,z\right)}{n\left(t\right)}\), where \(n\left(t,z\right)\) is the frequency of term \(t\) being annotated as type \(z\) in the corpus, and \(n\left(t\right)\) is the total frequency of term \(t\).
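Estimating this prior is simple counting over a POS-tagged corpus; a minimal sketch, where the tiny tagged list and the tag names (Penn-style `VB`/`NN`) are illustrative stand-ins for real parser output:

```python
from collections import Counter

def type_prior(tagged_tokens):
    """Estimate P(z|t) = n(t, z) / n(t) from (term, type) pairs."""
    n_tz = Counter(tagged_tokens)                 # n(t, z)
    n_t = Counter(t for t, _ in tagged_tokens)    # n(t)
    return {(t, z): c / n_t[t] for (t, z), c in n_tz.items()}

# Hypothetical corpus: "watch" tagged as a verb 3 times, as a noun once
tags = [("watch", "VB"), ("watch", "VB"), ("watch", "VB"), ("watch", "NN")]
priors = type_prior(tags)
print(priors[("watch", "VB")])  # 0.75
```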

\(P(c|t,z)\) denotes the probability of a concept \(c\), given the term \(t\) of a specific type \(z\). For example, \(P\left(movie|watch,verb\right)\) depicts how likely the verb *watch* is lexically associated with the concept *movie*. Specifically, we detect co-occurrence relationships between instances, verbs and adjectives in Web documents parsed by the Stanford Parser. To obtain meaningful co-occurrence relationships, we require that the co-occurrence be embodied by a dependency, rather than mere appearance in the same sentence. We first obtain \(P\left(e|t,z\right)\), which denotes how likely a term \(t\) of type \(z\) co-occurs with instance \(e\):

\[P\left(e|t,z\right)=\frac{{n}_{z}\left(e,t\right)}{\sum _{{e}^{*}}{n}_{z}\left({e}^{*},t\right)} ,\]

where \({n}_{z}\left(e,t\right)\) is the frequency of term \(t\) and instance \(e\) having a dependency relationship when \(t\) is of type \(z\). Then, taking instances as the bridge, we calculate the relatedness between non-instance terms (adjectives/verbs) and concepts:

\[P\left(c|t,z\right)=\sum _{e\in c}P\left(c,e|t,z\right)=\sum _{e\in c}P\left(c|e\right)\times P\left(e|t,z\right) .\]

In the above equation, \(e\in c\) means that \(e\) is an instance of concept \(c\), and we assume that a concept \(c\) is conditionally independent of term \(t\) and type \(z\) given the instance \(e\).
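The bridging computation can be sketched directly from the equation; the instances, concepts, and probabilities in the example are hypothetical:

```python
def concept_given_term(p_c_given_e, p_e_given_tz):
    """P(c|t,z) = sum over e of P(c|e) * P(e|t,z).

    Relies on the assumption that c is conditionally independent
    of (t, z) given the instance e.

    p_c_given_e:  instance e -> {concept c: P(c|e)}
    p_e_given_tz: instance e -> P(e|t,z) for one fixed (t, z)
    """
    scores = {}
    for e, pe in p_e_given_tz.items():
        for c, pc in p_c_given_e.get(e, {}).items():
            scores[c] = scores.get(c, 0.0) + pc * pe
    return scores

# Hypothetical: the verb "watch" depends on "movie" 70% and "dog" 30% of the time
p_c_given_e = {"movie": {"entertainment": 0.9, "art": 0.1},
               "dog": {"animal": 1.0}}
p_e_given_tz = {"movie": 0.7, "dog": 0.3}
print(concept_given_term(p_c_given_e, p_e_given_tz))
```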

**4.2.2 Online Concept Inference**

We use the heterogeneous semantic graph built offline to annotate concepts for a query. First, we segment the query into a set of candidate terms, using Probase as the lexicon, and identify all occurrences of these terms in the query. Second, we retrieve from the heterogeneous semantic graph the subgraph centered on the query terms. Finally, we perform multiple random walks on the subgraph and take the concepts with the highest weights after convergence.
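One common way to realize such a walk is random walk with restart (personalized PageRank); the sketch below assumes that formulation, and the toy graph, restart rate, and iteration count are illustrative choices, not values from the paper:

```python
def random_walk_with_restart(trans, seeds, restart=0.15, iters=200):
    """Random walk with restart over the query-centered subgraph.

    trans: {node: {neighbor: P(neighbor|node)}}, each row summing to 1
    seeds: query-term nodes; restart mass is spread uniformly over them
    Returns node -> weight; the top-weighted concept nodes are the annotation.
    (Dangling nodes with no outgoing edges would leak mass; ignored here.)
    """
    nodes = set(trans) | {v for row in trans.values() for v in row}
    r = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    w = dict(r)
    for _ in range(iters):
        nxt = {n: restart * r[n] for n in nodes}
        for u, row in trans.items():
            for v, p in row.items():
                nxt[v] += (1 - restart) * w[u] * p
        w = nxt
    return w

# Toy subgraph around the ambiguous term "watch"
trans = {"watch": {"movie": 0.6, "device": 0.4},
         "movie": {"watch": 1.0},
         "device": {"watch": 1.0}}
weights = random_walk_with_restart(trans, seeds={"watch"})
print(weights)  # "movie" ends up weighted above "device"
```

The restart to the query terms keeps the walk anchored to the query, so concepts strongly connected to many query terms accumulate the most weight.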