<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>MetaOptimize - Latest Comments</title><link xmlns="http://www.w3.org/2005/Atom" rel="http://api.friendfeed.com/2008/03#sup" href="http://disqus.com/sup/all.sup#forumcomments-9c738287" type="application/json"/><link>http://metaoptimize.disqus.com/</link><description></description><atom:link href="http://metaoptimize.disqus.com/comments.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Tue, 29 Nov 2011 18:54:50 -0000</lastBuildDate><item><title>Re: KEA Keyphrase Extraction as an XML-RPC service (code release)</title><link>http://metaoptimize.com/blog/2010/08/18/kea-keyphrase-extraction-as-an-xml-rpc-service/#comment-375353162</link><description>Hey, it seems that when I set up the XML-RPC service, the KEA stopwords functionality no longer works (it works if I run KEA from the command line). Any idea why? Thanks!</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">MAX</dc:creator><pubDate>Tue, 29 Nov 2011 18:54:50 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-245130201</link><description>It would be nice if something like this could become a reality, but the sad truth is that Stack Exchange-like sites thrive precisely because there's a carrot dangled in front of people's faces: you will become more respected, and people will care more about what you say, if your 'karma' score or your 'reputation' score is higher.&lt;br&gt;&lt;br&gt;This has been proven time and time again in many fields. My favorite example is the story of Virginia Apgar (creator of the Apgar score): prior to publishing about the Apgar score, the rate of surviving live births was abysmally low, but as soon as doctors had something to measure themselves by, the survival rate suddenly sky-rocketed, if for no other reason than because they had something to brag about while playing golf.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Brian Vandenberg</dc:creator><pubDate>Thu, 07 Jul 2011 11:01:44 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-214730203</link><description>It disturbed me that the TED talk about "filter bubbles" immensely overemphasized the flaws of personalization on the web. Nevertheless, it's a tough problem.&lt;br&gt;One idea could reuse the Aardvark idea. In my personalized profile, the system could serve me questions that I could possibly answer or discussions I could join, based on my answering record so far. This means actively asking me to answer or join, for example by sending me an email once a week. This way it would be more engaging for professionals, experts and scientists, because they would be served provocative high-level questions and discussions, while the easier questions would be served to less experienced users, who then face a low entry barrier and find it more engaging. Occasionally the system has to explore in order to exploit more effectively (as in reinforcement learning): the experts would be served some newbie questions to test them, and the newbies would be asked to participate in some more serious discussions. Also, people have various behavior patterns. Some are interested in many topics but cover them superficially; others cover only one, but in full depth. It would be better to compute expertise with respect to a topic. 
Anyway, more mining of profile behaviors could help.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Oliver Mitevski</dc:creator><pubDate>Tue, 31 May 2011 08:08:26 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-214570027</link><description>Sorry, I thought these would make paragraphs :(</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Olivier Breuleux</dc:creator><pubDate>Mon, 30 May 2011 23:05:42 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-214569065</link><description>My immediate concern, too, was that personalization would lead to an "I only see what I like" kind of feed. I see now that this is not what you meant, but then I'm confused about what exactly you do mean.&lt;br&gt;&lt;br&gt;If personalization can show me discussions about topics I like, that would indeed be useful. However, I am not sure about filtering the type of discourse: from your post I understand that personalization might skew somebody's feed either towards a flame-war style of discussion or towards a more reasoned academic style. Since arguments from both sides of any issue are considerably worse in the former case than in the latter, I am afraid this might lead to an impoverishment of discourse, with facts, evidence and good arguments failing to reach a large segment of the population. To complicate matters even more, there is sometimes a divide where heated discussions lean towards one side and reasoned discussions lean towards the other.&lt;br&gt;&lt;br&gt;In my opinion, what a great discussion system needs first and foremost is a way to promote facts and sink misinformation. I would also like hate speech and ad hominem attacks to be penalized for everyone. While some people might enjoy hate speech, I do not think society benefits from giving them what they want. 
A great discussion system needs to improve people rather than simply fit their biases, and even though I know the latter is not what you intend, it seems very difficult to avoid. The global effects of a dynamic system can be very difficult to predict, and by trying to create a personalized experience, you might inadvertently end up distributing lower quality to fewer people. I suspect this is an attractor of personalization when it is applied to debate, though you might disagree.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Olivier Breuleux</dc:creator><pubDate>Mon, 30 May 2011 23:04:07 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-214359185</link><description>To quote my response to Mike Altarriba:&lt;br&gt;&lt;br&gt;"I believe this is the most common misconception about my proposal. I'm going to address this in an upcoming post. But the short response is: Personalization of discussion doesn't filter to give you a warm fuzzy feeling in your tummy that everything is right in the world. Personalization of discussion gives you stimulating discussion, which oftentimes means opposing viewpoints. 
It should also tie in discussions on adjacent topics, to give a broader perspective on the context, assuming you are the sort of person who doesn't like to have blinders on."</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Joseph Turian</dc:creator><pubDate>Mon, 30 May 2011 14:02:11 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-214349665</link><description>I don't know about this.&lt;br&gt;&lt;br&gt;Personalization of comment streams can also lead to an "I only see what I like" type of stream.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">gorbachev</dc:creator><pubDate>Mon, 30 May 2011 13:51:52 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-214343698</link><description>To respond to your objections:&lt;br&gt;&lt;br&gt;"audience matters": Agreed. Ideally, personalization would automatically find the right community for you and filter out the trolls.&lt;br&gt;&lt;br&gt;"text communication makes misunderstanding and anti-social behavior easy": People will quickly learn how to modulate their tone if the system buries their comments or if angry writing leads to stupid responses.&lt;br&gt;&lt;br&gt;"if individuals have their input filtered based on their own subject matter / viewpoint preferences, this will drive their world view such that their positions and beliefs will become even more polarized": Okay, so I believe this is the most common misconception about my proposal. I'm going to address this in an upcoming post. But the short response is: Personalization of discussion doesn't filter to give you a warm fuzzy feeling in your tummy that everything is right in the world. 
Personalization of discussion gives you stimulating discussion, which oftentimes means opposing viewpoints. It should also tie in discussions on adjacent topics, to give a broader perspective on the context, assuming you are the sort of person who doesn't like to have blinders on.&lt;br&gt;&lt;br&gt;"the fact that my choices mean that I don't see a particular poster or post doesn't change the fact that said poster / posts are still there, and still affecting the character of the online community." This is an interesting argument about ambient discussion. I'd like to explore it more. Can you give some examples? It seems to me that if something is indirectly stimulating discussion that is relevant to me, it should be shown to me.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Joseph Turian</dc:creator><pubDate>Mon, 30 May 2011 13:45:25 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-214322132</link><description>Mike, I think there is a mistake in your criticism of personalization. You point out that "the fact that my choices mean that *I* don't see a particular poster or post doesn't change the fact that said poster / posts are still there, and still affecting the character of the online community."&lt;br&gt;&lt;br&gt;This, however, assumes that there is such a thing as _the_ character of the online community. If ultra-personalization as proposed in this post is implemented, each person will experience one part of a continuum of different communities. 
So, ideally, only troll-interested people and troll-like people would ever see the trolls, and hence the trolls would harm no one.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Alexandre Passos</dc:creator><pubDate>Mon, 30 May 2011 13:19:53 -0000</pubDate></item><item><title>Re: Discussion 2.0: Personalization</title><link>http://metaoptimize.com/blog/2011/05/22/discussion-2-0-personalization/#comment-211536930</link><description>I've spent a lot of time participating in online discussions, going all the way back to the electronic bulletin board days of the early 1980s, and there are some things I've noticed:&lt;br&gt;&lt;br&gt;* audience matters - Usenet (the global text-based news and discussion system which preceded the World Wide Web) was, at one time, inhabited almost exclusively by academics, students, some businesspeople, and other professionals. Every fall, a new crop of students would start showing up, and would have to be acculturated to the 'netiquette' of the Usenet world. Then, in September 1993, AOL (a public ISP) added direct access to Usenet from their system... and the "Eternal September" began, so named because now Usenet received not a trickle of new individuals once a year, but a continuous flood... 
and the signal-to-noise ratio dropped precipitously, and stayed that way.&lt;br&gt;&lt;br&gt;* text communication makes misunderstanding and anti-social behavior *easy*, and circumvents the social regulatory mechanisms we have in face-to-face interaction.&lt;br&gt;&lt;br&gt;* if individuals have their input filtered based on their own subject matter / viewpoint preferences, this will drive their world view such that their positions and beliefs will become even more polarized, while simultaneously giving them the false sense that they hold views which are shared and supported by the majority.&lt;br&gt;&lt;br&gt;I agree that "The core value of a discussion system is to encourage stimulating and engaging discussion."&lt;br&gt;&lt;br&gt;I don't, however, think that personalization will help, because it does not address the effects of disruptive people or disruptive posts... the fact that my choices mean that *I* don't see a particular poster or post doesn't change the fact that said poster / posts are still there, and still affecting the character of the online community.&lt;br&gt;&lt;br&gt;And that, I think, points us to what is needed: a system which fosters a stimulating, engaging *community*, which will in turn foster stimulating and engaging *discussion*.&lt;br&gt;&lt;br&gt;To do that, we need a system which provides some substitute for the social checks and balances inherent in face-to-face community interaction, the checks and balances we've evolved over millennia of social and cultural evolution. 
I think the work being done on reputation economies and trust economies points us in a productive direction.&lt;br&gt;&lt;br&gt;Since I've already been long-winded enough, I'll stop here, but I'll be glad to continue this if there is interest.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mike Altarriba</dc:creator><pubDate>Wed, 25 May 2011 11:52:06 -0000</pubDate></item><item><title>Re: PyLucene 3.0 in 60 seconds — Tutorial sample code for the 3.0 API</title><link>http://metaoptimize.com/blog/2010/08/09/pylucene-3-0-in-60-seconds-tutorial-sample-code-for-the-3-0-api/#comment-179261614</link><description>Thanks for the examples. It took me a while to figure out how to build PyLucene on Ubuntu, but after that these examples worked perfectly.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Cerin</dc:creator><pubDate>Wed, 06 Apr 2011 14:47:17 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-163995757</link><description>I need a database to find semantically related verbs.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Thilagaranim</dc:creator><pubDate>Fri, 11 Mar 2011 03:37:59 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-154416226</link><description>I want information about how to identify semantically similar words.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Abiya Veni</dc:creator><pubDate>Tue, 22 Feb 2011 22:54:58 -0000</pubDate></item><item><title>Re: Code maintainability, and the joy of outsourcing</title><link>http://metaoptimize.com/blog/2010/03/11/code-maintainability-and-the-joy-of-outsourcing/#comment-132880088</link><description>This early detection leads to a more streamlined and efficient software development process and ensures the output of clean, highly maintainable code.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">PMP Online</dc:creator><pubDate>Sat, 22 Jan 2011 08:05:58 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-130727774</link><description>I used a very simple LLR-type approach, described at &lt;a href="https://probreasoning.wordpress.com/2011/01/16/hello-world/" rel="nofollow"&gt;https://probreasoning.wordpres...&lt;/a&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">probreasoning</dc:creator><pubDate>Tue, 18 Jan 2011 14:01:48 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-126363187</link><description>So, did anything ever happen with this? I checked out both lists mentioned above, but found nothing. 
No new blog posts pertaining to this challenge either.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Griffin</dc:creator><pubDate>Sat, 08 Jan 2011 21:18:21 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-117138865</link><description>I found your challenge too late to participate, but I can still answer it:&lt;br&gt;1 - creating good features is probably the hardest part, and several possibilities are open for such a large-scale problem: local embedding with thematic dimensions (provided you can gather a few thematic collections), HOOI for the PCA, or you could use my (still undisclosed) method whose results are described here: &lt;a href="http://blog.guillaume-pitel.fr/index.php?post/2010/07/My-neighbours-are-nicer-than-yours-%3A%29" rel="nofollow"&gt;http://blog.guillaume-pitel.fr...&lt;/a&gt;&lt;br&gt;2 - finding the top-K nearest neighbours can (I think) easily be done with LSH or KD-trees.&lt;br&gt;&lt;br&gt;As for the speed of feature extraction, I did this experiment once: with a 500K-word vocabulary and 20 * 40M documents (they were word windows), my method took approx. 30 min on a quad core with a GeForce 280 (I was also experimenting with GPGPU computing).</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Guillaume Pitel</dc:creator><pubDate>Tue, 22 Dec 2010 13:08:06 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-116416988</link><description>Woo, very interesting. I never thought of this as a solvable problem, but I will look at it and give it a good go. 
&lt;a href="http://tinyurl.com/stop-snoring-remedies" rel="nofollow"&gt;http://tinyurl.com/stop-snorin...&lt;/a&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">anthony</dc:creator><pubDate>Tue, 21 Dec 2010 16:39:17 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-100318310</link><description>"the" is a content word when it is missing. In other words, a text without "the" is not a typical document at all. Removing so-called "non-content" words from all documents takes some good techniques off the table.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">bista</dc:creator><pubDate>Sun, 21 Nov 2010 16:50:50 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-96125900</link><description>&lt;i&gt;The data isn't clean.&lt;/i&gt;&lt;br&gt;&lt;br&gt;Welcome to the real world.&lt;br&gt;&lt;br&gt;&lt;i&gt;I'd love to throw out the numbers (at least the low ones) and the extra characters like dashes.&lt;/i&gt;&lt;br&gt;&lt;br&gt;You can do that if you like.&lt;br&gt;&lt;br&gt;&lt;i&gt;Leaving the goal of "semantically defined" vague doesn't really give us much to shoot at.&lt;/i&gt;&lt;br&gt;&lt;br&gt;Part of this task is to see how people define the problem. Part of the exercise is learning through evaluation and looking at people's outputs. Yeah, it's less well-defined and less academic that way. 
I find that more interesting.&lt;br&gt;&lt;br&gt;I think it will also be interesting to see if there is a mismatch between people's interpretations of what "semantically related" means and which methods produce which interpretation.&lt;br&gt;&lt;br&gt;&lt;i&gt;Is the goal really to find semantic relationships based on YOUR data, or just to find semantic relationships based on web-mined data?&lt;/i&gt;&lt;br&gt;&lt;br&gt;The goal is to find semantic relationships over a particular vocabulary. You can use the data set that generated that vocabulary, and/or you can use auxiliary data.&lt;br&gt;&lt;br&gt;But I apologize to the extent that you don't like the setup. This is my first time running a challenge, and I'm trying to learn for next time.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Joseph Turian</dc:creator><pubDate>Thu, 11 Nov 2010 00:53:57 -0000</pubDate></item><item><title>Re: Hacker News, automagically organized [unofficial] - MetaOptimize</title><link>http://metaoptimize.com/projects/autotag/hackernews/#comment-95725080</link><description>I would love to see this used to visualize which topics get comments like "this isn't HN" yet still get many upvotes.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">natep</dc:creator><pubDate>Tue, 09 Nov 2010 22:51:23 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-95264200</link><description>I like this problem, but I think it would be better with the following adjustments:&lt;br&gt;&lt;br&gt;1) The data isn't clean. I'd love to throw out the numbers (at least the low ones) and the extra characters like dashes. 
If the goal is to find semantic meaning, you're adding more noise than value by including them.&lt;br&gt;&lt;br&gt;2) Leaving the goal of "semantically defined" vague doesn't really give us much to shoot at. I went for co-occurrence and produced reasonable results. Another person interpreted it as "similar" and produced something different. What's the goal here? Any good algorithm needs a spec.&lt;br&gt;&lt;br&gt;3) Is the goal really to find semantic relationships based on YOUR data, or just to find semantic relationships based on web-mined data? The data that you zipped up isn't terrible, but it's surprisingly noisy. We can find better data on Twitter, on Wikipedia, or in most other crawls that I've seen.&lt;br&gt;&lt;br&gt;The goal here is good. The setup is not.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mikes</dc:creator><pubDate>Mon, 08 Nov 2010 23:06:44 -0000</pubDate></item><item><title>Re: Code maintainability, and the joy of outsourcing</title><link>http://metaoptimize.com/blog/2010/03/11/code-maintainability-and-the-joy-of-outsourcing/#comment-95254660</link><description>I believe outsourcing is necessary in today’s business environment to save on cost. It has become necessary to cut corners everywhere to remain competitive. Outsourcing can also help reduce the costs incurred, making the process cost-effective. It also allows your company to focus on its core competencies and develop its in-house processes, which in turn reduces lead time and brings about celerity in the market. 
&lt;br&gt;&lt;br&gt;Ray</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">KPO</dc:creator><pubDate>Mon, 08 Nov 2010 22:26:21 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-95220141</link><description>I would be very excited if you tried an LLR approach. Ted Dunning was suggesting that style of approach to me too.</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Joseph Turian</dc:creator><pubDate>Mon, 08 Nov 2010 20:10:58 -0000</pubDate></item><item><title>Re: NLP Challenge: Find semantically related terms over a large vocabulary (&amp;gt;1M)?</title><link>http://metaoptimize.com/blog/2010/11/05/nlp-challenge-find-semantically-related-terms-over-a-large-vocabulary-1m/#comment-94768090</link><description>Your preprocessing of the training dataset suggests that you will tend to find "collocations" rather than "semantically related" terms. Moreover, the unique operation on the terms (removing duplicates) makes the dataset less useful, because co-occurrence frequency counts are critical for building a reasonable similarity or association measure.
&lt;br&gt;
&lt;br&gt;I see the biggest challenge for this task as efficiency. Since you are only looking for "semantically related" terms, a first-order affinity measure like PMI or LLR might work better, and would definitely be faster, than distributional similarity approaches. LLR is preferable to PMI when dealing with rare words with a frequency of less than 5.
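&lt;br&gt;
&lt;br&gt;A minimal Python sketch of the two first-order measures mentioned above, assuming each word pair has already been reduced to a 2x2 co-occurrence table (k11 = windows containing both words, k12 = only the first, k21 = only the second, k22 = neither). The table layout, function names and toy counts are illustrative assumptions, not code from this comment; LLR is computed with the entropy form of Dunning's G^2 statistic.

```python
import math


def xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def unnormalized_entropy(*counts):
    """N * H(p) in nats, where p is the distribution given by counts summing to N."""
    return xlogx(sum(counts)) - sum(xlogx(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Dunning's G^2 log-likelihood ratio for a 2x2 co-occurrence table (hypothetical helper)."""
    row = unnormalized_entropy(k11 + k12, k21 + k22)
    col = unnormalized_entropy(k11 + k21, k12 + k22)
    mat = unnormalized_entropy(k11, k12, k21, k22)
    # row + col - mat is N times the mutual information, so it is never
    # negative; max() only guards against floating-point round-off.
    return 2.0 * max(0.0, row + col - mat)


def pmi(k11, k12, k21, k22):
    """Pointwise mutual information ln(p(a,b) / (p(a) p(b))). Requires k11 > 0."""
    n = k11 + k12 + k21 + k22
    return math.log(k11 * n / ((k11 + k12) * (k11 + k21)))


if __name__ == "__main__":
    n = 1_000_000
    # Two words that each occur exactly once, in the same window.
    rare = (1, 0, 0, n - 1)
    # Two words that each occur 1,000 times and co-occur 100 times.
    frequent = (100, 900, 900, n - 1900)
    for name, table in (("rare", rare), ("frequent", frequent)):
        print(name, "PMI =", round(pmi(*table), 2), "LLR =", round(llr(*table), 1))
```

With these toy counts, PMI scores the pair of singleton words higher than the frequent pair, while LLR ranks the frequent pair far higher, which is the rare-word behavior described above.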
&lt;br&gt;
&lt;br&gt;The paper "DIRT – Discovery of Inference Rules from Text" addresses a similar problem, using an approximation to speed up the comparison. The vocabulary size in their case is 220,000, about 5 times smaller than your vocabulary. However, that work was done eight years ago, on a much slower machine than today's.
&lt;br&gt;
&lt;br&gt;Lushan Han
&lt;br&gt;
&lt;br&gt;PhD student from UMBC</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lushan Han</dc:creator><pubDate>Sun, 07 Nov 2010 10:38:27 -0000</pubDate></item></channel></rss>
