<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Inovia Blog</title>
	<atom:link href="http://blog.inovia.fr/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.inovia.fr</link>
	<description></description>
	<lastBuildDate>Tue, 03 Jul 2012 13:41:13 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Big Data Project Manager (M/F)</title>
		<link>http://blog.inovia.fr/chef-de-projet-big-data-hf/</link>
		<comments>http://blog.inovia.fr/chef-de-projet-big-data-hf/#comments</comments>
		<pubDate>Tue, 03 Jul 2012 13:08:35 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=239</guid>
		<description><![CDATA[To support its growth, Inovia is currently looking for a Big Data project manager. To apply, please send a CV to contact@inovia.fr. As Big Data Project Manager, you assist the Division Director with the coordination and development &#8230; <a href="http://blog.inovia.fr/chef-de-projet-big-data-hf/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>To support its growth, Inovia is currently looking for a Big Data project manager.</h2>
<p><strong>To apply, please send a CV to</strong> <a href="mailto:contact@inovia.fr?Subject=Candidature">contact@inovia.fr</a></p>
<p>As <strong>Big Data Project Manager</strong>, you assist the Division Director with the coordination and development of <strong>high-volume</strong> data projects. In this position, you will gain rich experience at a leading French web company and develop your data-oriented project management skills.</p>
<p>&nbsp;</p>
<h3>JOB DESCRIPTION</h3>
<ul>
<li>Lead a team of 2 developers and 1 data analyst.</li>
<li>Write detailed functional and technical specifications.</li>
<li>Train the team in Inovia best practices.</li>
<li>Produce the team's estimates and schedules.</li>
<li>Contribute to the know-how of the 7-person division.</li>
<li>Report project progress to the division director.</li>
<li>Formalize study tasks for the engineers.</li>
<li>Monitor the project's Quality / Risk / Cost / Schedule balance.</li>
</ul>
<p><strong>The planned assignment focuses on a set of high-volume analytics data studies: from 1 million to 60 billion rows.</strong></p>
<p>&nbsp;</p>
<h3>PROFILE</h3>
<ul>
<li>Engineering degree from a Grande Ecole, IT specialization.</li>
<li>Fluent English is essential.</li>
<li>Personal or professional experience in web development and project management.</li>
<li>Complete mastery of PHP development and SQL. Basic knowledge of MDX appreciated.</li>
<li>Hadoop skills are a plus.</li>
<li>You are curious about technological innovations.</li>
<li>Analytical and organized mind, attentive to detail.</li>
<li>You enjoy learning and excelling.</li>
<li>Broad general knowledge.</li>
</ul>
<p>&nbsp;</p>
<h3>TERMS</h3>
<ul>
<li>Public transport pass 100% covered.</li>
<li>Salary: 39k gross minimum (depending on experience).</li>
<li>Ongoing training in CMMI management methods.</li>
<li>One-week annual ski seminar, fully paid for.</li>
<li>Located in Paris, 9th arrondissement, in high-quality offices.</li>
<li>Well-oiled, very well-tooled production pipeline.</li>
<li>High-caliber, internationally recognized team.</li>
</ul>
<p><strong>We give you everything you need to succeed. Do you want the experience and the career path you have always hoped for?</strong></p>
<p>To apply, please send a CV to <a href="mailto:contact@inovia.fr?Subject=Candidature">contact@inovia.fr</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/chef-de-projet-big-data-hf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Critical Application Project Manager (M/F)</title>
		<link>http://blog.inovia.fr/chef-de-projet-application-critique-hf/</link>
		<comments>http://blog.inovia.fr/chef-de-projet-application-critique-hf/#comments</comments>
		<pubDate>Tue, 03 Jul 2012 13:00:34 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=232</guid>
		<description><![CDATA[To support its growth, Inovia is currently looking for a project manager. To apply, please send a CV to contact@inovia.fr. As Project Manager, you assist the Division Director with the coordination and development of projects with a strong &#8230; <a href="http://blog.inovia.fr/chef-de-projet-application-critique-hf/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>To support its growth, Inovia is currently looking for a project manager.</h2>
<p><strong>To apply, please send a CV to <a href="mailto:contact@inovia.fr?Subject=Candidature">contact@inovia.fr</a></strong></p>
<p>As Project Manager, you assist the Division Director with the coordination and development of projects with a strong Mobile and Backoffice component. In this position, you will gain rich experience at a leading French web company and develop your web project management skills.</p>
<p>&nbsp;</p>
<h3>JOB DESCRIPTION</h3>
<ul>
<li>Write detailed functional and technical specifications</li>
<li>Lead a team of 4 people</li>
<li>Train the team in Inovia best practices</li>
<li>Produce the team's estimates and schedules</li>
<li>Contribute to the know-how of the 8-person division</li>
<li>Report project progress to the division director</li>
<li>Formalize development tasks for the engineers</li>
<li>Monitor the project's Quality / Risk / Cost / Schedule balance</li>
</ul>
<p><strong>The planned assignment focuses on building a highly available, highly critical and highly interconnected platform. The platform is web and mobile.</strong></p>
<h3>PROFILE</h3>
<ul>
<li>Engineering degree from a Grande Ecole, IT specialization</li>
<li>Fluent English is essential</li>
<li>You have personal or professional experience in web development and project management</li>
<li>You master PHP development and JavaScript</li>
<li>You are curious about technological innovations</li>
<li>Analytical and organized mind, attentive to detail</li>
<li>You enjoy learning and excelling</li>
<li>Broad general knowledge</li>
</ul>
<h3>TERMS</h3>
<ul>
<li>Public transport pass 100% covered</li>
<li>Salary: 38k gross minimum (depending on experience)</li>
<li>Ongoing training in CMMI management methods</li>
<li>One-week annual ski seminar, fully paid for</li>
<li>Located in Paris, 9th arrondissement, in high-quality offices</li>
<li>Well-oiled, very well-tooled production pipeline</li>
<li>High-caliber team of 20 experts</li>
</ul>
<p><strong>We give you everything you need to succeed. Do you want the experience and the career path you have always hoped for?</strong></p>
<p>To apply, please send a CV to <a href="mailto:contact@inovia.fr?Subject=Candidature">contact@inovia.fr</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/chef-de-projet-application-critique-hf/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meet us @Horizons Informatiques, 17th November 2011</title>
		<link>http://blog.inovia.fr/meet-us-horizons-informatiques-17th-november-2011/</link>
		<comments>http://blog.inovia.fr/meet-us-horizons-informatiques-17th-november-2011/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 16:13:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=217</guid>
		<description><![CDATA[Pierre, Eric and Bat from Inovia will be present @Horizons Informatiques on the 17th of November 2011. We will live tweet throughout the event. You can find more details about the event here. Free web tech testing and nice job &#8230; <a href="http://blog.inovia.fr/meet-us-horizons-informatiques-17th-november-2011/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Pierre, Eric and Bat from Inovia will be present @Horizons Informatiques on the 17th of November 2011. We will <a href="http://twitter.com/#!/inoviateam">live tweet</a> throughout the event. You can find more details about the event <a href="http://forum.iiens.net/">here</a>.</p>
<h3>Free web tech testing and nice job offers</h3>
<p>Want to test your knowledge of web technologies? Then come and take the test! You will be asked a few questions. The best of you may get one of the 5 positions currently open @Inovia. Good money and a good environment in one of the most skilled teams in Paris!</p>
<p>Check out the past <a title="Lead developer position in Zend Framework" href="http://blog.inovia.fr/lead-developer-position-in-zend-framework/">Zend Framework position described</a> on our website to see the advantages we have to offer.</p>
<h3>Discussions and debates</h3>
<p>Come and talk with us about your vision of cloud computing, big data or x-commerce. We will have a few drinks and candies, and we will demo some of our products and projects.</p>
<p>Here is the list of the different conferences held:</p>
<ul>
<li>14.00 : <em>Travel shopping by <a href="http://www.amadeus.com/fr/fr.html">Amadeus</a></em></li>
<li>15.15 : <em>Smartphone development by <a href="http://www.aubay.com/">Aubay</a></em></li>
<li>16.30 : <em>BI 2.0, new challenges by <a href="http://www.softcomputing.com/">Soft Computing</a></em></li>
</ul>
<p><span style="font-size: small;"><span class="Apple-style-span" style="line-height: 24px;">Here is a map to access the site: </span></span><a href="http://forum.iiens.net/images/plan2010.png">Map for Horizons Informatiques</a></p>
<p>Sounds great? So see you @Horizons Informatiques!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/meet-us-horizons-informatiques-17th-november-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inovia Labs now open!</title>
		<link>http://blog.inovia.fr/inovia-labs-now-opened/</link>
		<comments>http://blog.inovia.fr/inovia-labs-now-opened/#comments</comments>
		<pubDate>Thu, 15 Sep 2011 16:00:15 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Announcements]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=213</guid>
		<description><![CDATA[We are pleased to share some of our best Zend Framework components. 100% open source, 100% high quality, 100% free! This is only the beginning, with more to come soon. Check this out! &#160; &#160; PHP Analytics A Zend Framework &#8230; <a href="http://blog.inovia.fr/inovia-labs-now-opened/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>We are pleased to share some of our best Zend Framework components. 100% open source, 100% high quality, 100% free! This is only the beginning, with more to come soon.<br />
<a href="http://labs.inovia.fr" alt="inovia labs"> Check this out!</a><br />
&nbsp;<br />
&nbsp;</p>
<div class="embed">
<h2><a style="text-decoration: none; text-transform: none;" title="PHP Analytics" href="http://labs.inovia.fr/phpanalytics/">PHP Analytics</a></h2>
<h3>A Zend Framework lib to build analytics.</h3>
<p>Need to add analytics and business intelligence to a web app? PHP Analytics is the way to go. Use out-of-the-box components like pivot or cross tables, or build your own advanced MDX indicators. All Zend Framework compliant.</p>
<p>&nbsp;</p>
<h2><a style="text-decoration: none; text-transform: none;" title="PG Bench" href="http://labs.inovia.fr/pgbench/">PG Bench</a></h2>
<h3>A collection of scripts to benchmark PostgreSQL.</h3>
<p>PostgreSQL is a powerful database, but you need to be an expert to configure it. With PG Bench, you can test different standard situations and check how your PostgreSQL instance is performing.</p>
<p>We currently support TPC-H and SSB benchmarks.</p>
<h2><a style="text-decoration: none; text-transform: none;" title="Transl8" href="http://labs.inovia.fr/transl8/">Transl8</a></h2>
<h3>An in-page translator for your web apps.</h3>
<p>An easy-to-use in-app translator for your Zend Framework applications.</p>
<p>For everyone experiencing headaches at translating webapps, we built Transl8, so that even non-technical users can translate applications seamlessly, directly from the application itself.</p>
</div>
<div class="embed"><p><a href="http://labs.inovia.fr/phpanalytics/"><img class="alignnone size-medium wp-image-151" title="phpanalytics" src="http://labs.inovia.fr/files/2011/09/phpanalytics-400-300x268.png?3de2ce" alt="" width="300" height="268" /></a></p>
<p><a href="http://labs.inovia.fr/pgbench/"><img class="alignnone size-medium wp-image-75" title="pgbench" src="http://labs.inovia.fr/files/2011/09/pgbench-300x275.png?3de2ce" alt="" width="300" height="275" /></a><a href="http://labs.inovia.fr/transl8/"><img class="alignnone size-medium wp-image-76" title="transl8" src="http://labs.inovia.fr/files/2011/09/transl8-300x256.png?3de2ce" alt="" width="300" height="256" /></a></p></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/inovia-labs-now-opened/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Lead developer position in Zend Framework</title>
		<link>http://blog.inovia.fr/lead-developer-position-in-zend-framework/</link>
		<comments>http://blog.inovia.fr/lead-developer-position-in-zend-framework/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 12:30:09 +0000</pubDate>
		<dc:creator>inovia</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Zend]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=186</guid>
		<description><![CDATA[Inovia has opened a new position in Paris and is looking for new talent. During the recruitment phase, you will have two job interviews: one by phone, the other in our office. During the first interview on the phone, we&#8217;ll &#8230; <a href="http://blog.inovia.fr/lead-developer-position-in-zend-framework/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>Inovia has opened a new position in Paris and is looking for new talent.</h2>
<p>During the recruitment phase, you will have two job interviews: one by phone, the other in our office. During the first interview on the phone, we&#8217;ll ask you some technical questions to evaluate your technical level.<br />
<a href="mailto:contact@inovia.fr">So get a job interview and evaluate yourself for free!</a></p>
<h2>Job description</h2>
<p>This job is the perfect experience for young <strong>talented</strong> developers.<br />
Your first responsibility will be to work with the rest of the team on our <strong>open source</strong> libraries and integrate them into our clients&#8217; applications.<br />
Then the position can progress to operational <strong>management</strong> (3 developers) within a year.</p>
<h2>General qualifications</h2>
<ul>
<li>Master&#8217;s or engineering degree minimum</li>
<li>Immediately operational in PHP5 / SQL / HTML / CSS</li>
<li>Good theoretical knowledge</li>
<li>Fluent in English and French</li>
</ul>
<h2>Required Technical Skill Set</h2>
<ul>
<li> PHP 5 OO</li>
<li>Design patterns and software architecture</li>
<li>HTML / CSS / JS</li>
<li>PgSQL / MySQL</li>
<li>Eclipse</li>
</ul>
<h2>Major pluses</h2>
<ul>
<li>Zend Framework</li>
<li>ExtJS, JQuery</li>
<li>CI, Unit testing</li>
<li> Magento</li>
</ul>
<h2>Context</h2>
<p>The team is welcoming and skilled, the office is spacious. You will have a lot of <strong>freedom</strong> and <strong>support</strong> in your daily job.<br />
We keep track of state-of-the-art technologies and like <strong>innovative</strong> approaches.<br />
<strong>Position opened in Paris, France. </strong><br />
<strong>Salary: 37.5k€ minimum. </strong></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/lead-developer-position-in-zend-framework/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Search engine vs Database in BI part 2: structure is value</title>
		<link>http://blog.inovia.fr/search-engine-vs-database-in-bi-part-2-structure-is-value/</link>
		<comments>http://blog.inovia.fr/search-engine-vs-database-in-bi-part-2-structure-is-value/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 12:30:49 +0000</pubDate>
		<dc:creator>inovia</dc:creator>
				<category><![CDATA[BI]]></category>
		<category><![CDATA[Databases]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=110</guid>
		<description><![CDATA[In the first part, we discussed the differences between structured and unstructured approaches. Now we will try to see how unstructured data tools can help us to structure data. Adding structure is adding value As we saw earlier, the more &#8230; <a href="http://blog.inovia.fr/search-engine-vs-database-in-bi-part-2-structure-is-value/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>In the first part, we discussed the differences between structured and unstructured approaches. Now we will try to see how unstructured data tools can help us to structure data.</h2>
<h3>Adding structure is adding value</h3>
<div class="wp-caption alignright" style="width: 310px"><a href="http://blog.inovia.fr/files/2011/09/structure.jpg"><img class="  " src="http://blog.inovia.fr/files/2011/09/structure-300x274.jpg" alt="The eiffel tower" width="300" height="274" /></a><p class="wp-caption-text">Without structure, the most visited attraction would have been only a bunch of metal.</p></div>
<p>As we saw <a href="http://blog.inovia.fr/auto-partitioning-on-postgresql-part-1/">earlier</a>, the more structured the data are, the more potential they have. It is only a matter of hiding the complexity from each group of users, which is not always simple. So let&#8217;s stop thinking about the data and take a look at the application and the users.</p>
<p>If we need to evaluate the value of a BI app, one approach I often use is <strong>R.E.C</strong>.: <em>Reference, Empowerment, Comfort</em>. To empower users, we must have a data model as close as possible to how people see their job. The more meaningful, relevant structures we can provide, the more value the app will have. So this is definitely a functional question.</p>
<p>So now that we know we need to provide the most meaningful data structures to users to maximize the value out of our application, these questions remain:</p>
<ul>
<li>When data are unstructured at the origin, what can we do to structure them in a meaningful way?</li>
<li>Should we use a database or a search engine for this?</li>
</ul>
<p>&nbsp;</p>
<h3>The example: a list of recipes</h3>
<div>
<p>A search engine is much better at processing natural language than a database. It is also very good at speeding up full-text search. In general, search engines are great at extracting information from unstructured data. But should the data warehouse also be unstructured for BI purposes?</p>
<p>We will discuss where the value of data is located, and see if the functionalities of current search engines are sufficient. A concrete illustration of partially unstructured data is a list of recipes in full text. Let’s say you have this info about a recipe :</p>
<div id="attachment_126" class="wp-caption alignleft" style="width: 310px"><a href="http://blog.inovia.fr/files/2011/09/recipes.jpg"><img class="size-medium wp-image-126 " src="http://blog.inovia.fr/files/2011/09/recipes-300x225.jpg" alt="recipes" width="300" height="225" /></a><p class="wp-caption-text">Recipes can be really unstructured ! courtesy of Pirate Johnny (c)</p></div>
</div>
<p><span style="font-family: monospace">title : Burger</span></p>
<p><code> course category : entree<br />
complexity to prepare: 1 out of 5<br />
time to prepare: 10 min<br />
description: “a delicious homemade burger with a juicy steak, toasted bread, crispy lettuce. Can be served with ketchup and love.”<br />
Instructions : “Toast the bread first. Then cook in a frying pan the meat. Salt at the end of the cooking.”<br />
source: “allrecipes.com”</code></p>
<p>&nbsp;</p>
<div>Now you want to be able to run statistics over a large set of recipes. The part of this recipe that carries the essential facts is the description, which is unstructured.</div>
<p>&nbsp;</p>
<h3>Extracting business related terms: the thesaurus</h3>
<p>From a data perspective, one can see a recipe as a collection, an assemblage of ingredients. A “<em>burger</em>” is composed of “<em>bread</em>”, “<em>steak</em>”, “<em>cheese</em>”, “<em>lettuce</em>” and “<em>ketchup</em>” for instance.<br />
The simplest way to model this situation is to use a <strong>tag</strong> mechanism. So we use a <em><strong>Full Text Indexer</strong></em> to process the description and extract the tags. Of course, we need a reference list of what is an ingredient and what is not (in our burger, “<em>love</em>” is not an ingredient, even if some French people would say so). This list of recognizable words is called a <strong><em>thesaurus</em></strong>.<br />
With the help of our Full Text Indexer and the reference thesaurus, each recipe now has a list of ingredient tags associated with it. Now I am able to count the ten most used ingredients, or the ingredients most often used with steak&#8230;<br />
But is using a Full Text Indexer with a simple thesaurus sufficient?</p>
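<p>To make the tag mechanism concrete, here is a minimal Python sketch. It is not the Full Text Indexer itself: a plain word match stands in for it, and the <code>THESAURUS</code> set and the function names are assumptions made up for this illustration.</p>

```python
import re
from collections import Counter

# Hypothetical thesaurus: the set of words recognized as ingredients.
THESAURUS = {"bread", "steak", "cheese", "lettuce", "ketchup"}

def extract_tags(description, thesaurus=THESAURUS):
    """Return the set of thesaurus terms found in a free-text description."""
    words = re.findall(r"[a-z]+", description.lower())
    return {word for word in words if word in thesaurus}

def most_used(descriptions, n=10):
    """Count how many recipes mention each ingredient and return the top n."""
    counts = Counter()
    for description in descriptions:
        counts.update(extract_tags(description))
    return counts.most_common(n)

burger = ("a delicious homemade burger with a juicy steak, toasted bread, "
          "crispy lettuce. Can be served with ketchup and love.")
print(sorted(extract_tags(burger)))  # ['bread', 'ketchup', 'lettuce', 'steak']
```

<p>Note that “love” is dropped simply because it is absent from the thesaurus; a real indexer would also handle plurals and stemming, which this sketch ignores.</p>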
<h3>Adding structure to the maelström of tags</h3>
<p>A common situation is a synonym or a related term: a “<em>steak</em>” is in fact a “<em>beef steak</em>”. When we want to build indicators for all the <em>beef</em> recipes, we need to count a recipe with <em>steak</em> as a recipe containing beef. The tags are connected to one another, and we must model this in our thesaurus.</p>
<p>What really enhances the data is the organization of the tags. This organization requires <strong>hierarchies</strong> (<em>steak</em> is a specialization of <em>beef</em>, which is a specialization of <em>meat</em>), <strong>segmentation</strong> (<em>alcohol-free</em>, <em>alcohol</em>), <strong>collections</strong> (<em>jalapeño</em>, <em>tacos</em>, <em>burritos</em> are in the <em>mexican</em> family).<br />
These structures can’t be handled right now by any search engine I know of, but on the other hand only a few database engines are able to manage a <em>thesaurus</em> imho. So here, a search engine combined with a database really makes sense: the search engine for the Full Text Indexer capability, and the database for the structure aspect.</p>
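<p>As a rough illustration of the hierarchy idea, the Python sketch below rolls a tag up to its broader terms so that a recipe tagged <em>steak</em> is counted as a <em>beef</em> recipe. The <code>BROADER</code> and <code>SYNONYMS</code> dictionaries and the function names are hypothetical; in practice these relations would live in database tables.</p>

```python
# Hypothetical thesaurus structure: each term points to its broader term.
BROADER = {"beef steak": "beef", "beef": "meat"}
# Synonyms are mapped to a canonical term first.
SYNONYMS = {"steak": "beef steak"}

def expand_tag(tag):
    """Return the tag's canonical form followed by all of its ancestors."""
    term = SYNONYMS.get(tag, tag)
    chain = [term]
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain

def recipes_with(term, recipes):
    """Names of the recipes having a tag equal to, or narrower than, term."""
    return sorted(name for name, tags in recipes.items()
                  if any(term in expand_tag(tag) for tag in tags))

recipes = {"burger": {"steak", "bread"}, "salad": {"lettuce"}}
print(recipes_with("beef", recipes))  # ['burger']
```

<p>The search engine only has to produce the flat tags; the roll-up to <em>beef</em> or <em>meat</em> comes entirely from the structure stored alongside them.</p>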
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/search-engine-vs-database-in-bi-part-2-structure-is-value/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Auto partitioning in PostgreSQL – Part 2</title>
		<link>http://blog.inovia.fr/auto-partitioning-in-postgresql-%e2%80%93-part-2/</link>
		<comments>http://blog.inovia.fr/auto-partitioning-in-postgresql-%e2%80%93-part-2/#comments</comments>
		<pubDate>Thu, 21 Jul 2011 15:24:21 +0000</pubDate>
		<dc:creator>inovia</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[db]]></category>
		<category><![CDATA[postgres]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=136</guid>
		<description><![CDATA[Now that the partitions are created, we will write a maintenance function that helps us run queries on every partition. &#160; We saw in the first part how to dynamically create the needed partitions. Just add this function in &#8230; <a href="http://blog.inovia.fr/auto-partitioning-in-postgresql-%e2%80%93-part-2/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>
Now that the partitions are created, we will write a maintenance function that helps us run queries on every partition.</h2>
<p>&nbsp;<br />
We <a href="http://blog.inovia.fr/auto-partitioning-on-postgresql-part-1/">saw in the first part</a> how to dynamically create the needed partitions.<br />
Just add this function to PostgreSQL (via psql for instance):</p>
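<pre class="brush: sql">-- Run a SQL statement once per partition of a master table.
-- $1: master table name; $2: statement containing the &lt;PARTITION&gt; placeholder.
CREATE OR REPLACE FUNCTION run_on_partitions(text, text) RETURNS INTEGER AS $$
DECLARE
    partition RECORD;
    tablename TEXT = $1;
    sql TEXT = $2;
    sqlReplaced TEXT;
BEGIN
    -- Partition names start with the master table name and end with p.
    tablename := tablename || &#039;%p&#039;;
    -- relkind = &#039;r&#039; selects ordinary tables (&#039;t&#039; would select TOAST tables).
    FOR partition IN SELECT relname::text AS rel FROM pg_class WHERE relname::text LIKE tablename AND relkind = &#039;r&#039; ORDER BY relname LOOP
        sqlReplaced := replace(sql, &#039;&lt;PARTITION&gt;&#039;, partition.rel);
        RAISE NOTICE &#039;Executing: %&#039;, sqlReplaced;
        EXECUTE sqlReplaced;
    END LOOP;
    RETURN 1;
END;
$$ LANGUAGE plpgsql;</pre>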
<p>&nbsp;<br />
This function is really useful. To call it, simply do:</p>
<pre class="brush: sql">SELECT run_on_partitions(&#039;tablename&#039;,&#039;CREATE INDEX &lt;PARTITION&gt;_idx ON &lt;PARTITION&gt; USING btree(name)&#039;);</pre>
<p>&nbsp;<br />
The tag &lt;PARTITION&gt; will be replaced by the name of the partition derived from the master table ‘tablename’. We can now create indexes, primary keys on each table&#8230;without having to run the query against each table by hand.<br />
Let’s create a full text search index on these items:</p>
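<pre class="brush: sql">-- An expression index needs the two-argument form of to_tsvector;
-- the &#039;english&#039; configuration is an assumption, use the one matching your data.
SELECT run_on_partitions(&#039;tablename&#039;,&#039;CREATE INDEX &lt;PARTITION&gt;_fts_idx ON &lt;PARTITION&gt; USING gin (to_tsvector(&#039;&#039;english&#039;&#039;, name::text || &#039;&#039; &#039;&#039;::text || description));&#039;);</pre>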
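<p>For readers who want to see what the function expands to before running it, here is a small Python sketch of the same substitution logic. It only builds the statements and never touches a database; the <code>is_partition</code> and <code>expand</code> helpers are hypothetical names, and the prefix/suffix check only approximates the SQL <code>LIKE</code> pattern (it ignores that <code>_</code> is a wildcard in <code>LIKE</code>).</p>

```python
# The placeholder token, written the way it appears in this page's HTML source.
PLACEHOLDER = "&lt;PARTITION&gt;"

def is_partition(relname, master):
    """Approximate the LIKE pattern master || '%p' used to find partitions."""
    return (relname.startswith(master) and relname.endswith("p")
            and relname != master)

def expand(template, master, relnames):
    """Return one statement per partition, in name order."""
    partitions = sorted(r for r in relnames if is_partition(r, master))
    return [template.replace(PLACEHOLDER, p) for p in partitions]

template = f"CREATE INDEX {PLACEHOLDER}_idx ON {PLACEHOLDER} USING btree(name)"
tables = ["tablename_2011_02p", "other", "tablename_2011_01p"]
for statement in expand(template, "tablename", tables):
    print(statement)
```

<p>Printing the statements first is a cheap way to check the template before letting the PL/pgSQL function execute it on every partition.</p>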
<p> &nbsp;<br />
The search time has now been greatly improved.<br />
&nbsp;<br />
Yeepa!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/auto-partitioning-in-postgresql-%e2%80%93-part-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>It is summer time !</title>
		<link>http://blog.inovia.fr/it-is-summer-time/</link>
		<comments>http://blog.inovia.fr/it-is-summer-time/#comments</comments>
		<pubDate>Thu, 30 Jun 2011 14:35:24 +0000</pubDate>
		<dc:creator>inovia</dc:creator>
				<category><![CDATA[Project Management]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=89</guid>
		<description><![CDATA[Right now the weather in Paris is pretty hot. People tend to take long lunch breaks. Our employees decided to turn this situation into a team building event. Every day, two different people cook summer recipes for everyone. It &#8230; <a href="http://blog.inovia.fr/it-is-summer-time/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Right now the weather in Paris is pretty hot. People tend to take long lunch breaks. Our employees decided to turn this situation into a team building event. Every day, two different people cook summer recipes for everyone.</p>
<p><a href="http://blog.inovia.fr/files/2011/09/photo-300x225.jpg"><img class="size-medium wp-image-90 alignright" src="http://blog.inovia.fr/files/2011/09/photo-300x225.jpg" alt="" width="300" height="225" /></a>It has been going on for a week or so, and I can tell you as a manager that this is great because:</p>
<p>&nbsp;</p>
<ul>
<li>You save time: no need to go out, choose where and what to eat, and end up queuing.</li>
<li>Once a week, you will have to spend 45 min shopping and cooking for everybody, but cooking is definitely a more enjoyable activity than waiting to get your order. Inovia people like creative activities, and cooking is considered one in France.</li>
<li>You eat healthy food, so you are in better shape for a great afternoon of work.</li>
<li>You save money, because cooking for 15 people or so definitely costs less than buying ready-made food. Count 3.5 euros per person, all included.</li>
<li>You share a meal with your coworkers. This is the coolest part!</li>
</ul>
<p>Luckily our office in Paris has an amazing kitchen where any food delight can come true.</p>
<p>We wish you a happy summer time !</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/it-is-summer-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Auto partitioning on PostgreSQL &#8211; Part 1</title>
		<link>http://blog.inovia.fr/auto-partitioning-on-postgresql-part-1/</link>
		<comments>http://blog.inovia.fr/auto-partitioning-on-postgresql-part-1/#comments</comments>
		<pubDate>Mon, 27 Jun 2011 13:15:57 +0000</pubDate>
		<dc:creator>inovia</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[db]]></category>
		<category><![CDATA[postgres]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=63</guid>
		<description><![CDATA[We will share a simple approach to migrate existing tables to partitioned versions of them with PostgreSQL. The partitions will be created on demand with just a few lines of code. It is useful during off-line migration, and can be &#8230; <a href="http://blog.inovia.fr/auto-partitioning-on-postgresql-part-1/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>We will share a simple approach to migrate existing tables to partitioned versions of them with PostgreSQL.<br />
The partitions will be created on demand with just a few lines of code. It is useful during off-line migration, and can be used in production with applications that do not perform many INSERTs.</h2>
<h3>Approach</h3>
<p>Let’s say you want to partition a table based on time period.<br />
In Postgres, we handle partitioning with inheritance. You will have a master table, used to group the different partitions, and one table per partition. I will let you read the excellent Postgres documentation about this: <a rel="no-follow" href="http://www.postgresql.org/docs/current/static/ddl-partitioning.html" target="_blank">http://www.postgresql.org/docs/current/static/ddl-partitioning.html</a></p>
<p>As you can see, there is a lot of tedious work to create partitions, tables, indexes&#8230; Here we offer a solution that is not optimal in terms of insert performance, but is definitely more practical in terms of risk and setup time than the method presented in the manual.</p>
<p>I will show you how to do it in a simple case and reduce the manual work involved. From this, you’ll be able to build your own solution.<br />
We have a large fact table called tablename. This table has a column called period, which holds values like 2011_01, meaning January 2011.<br />
We will create one partition per period. So we will create the tables:<br />
- tablename_2010_01p<br />
- tablename_2010_02p<br />
- tablename_2010_03p<br />
&#8230;</p>
<p>The p at the end helps us detecting that this object is a partition. I admit there is a large space to improve it.</p>
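<p>For comparison, this is roughly what creating a single partition by hand involves. It is a minimal sketch on a throwaway table: demo_fact and the index name are hypothetical and only there so that the snippet can be run on its own.</p>
<pre class="brush: sql">

-- Hypothetical throwaway master table, standing in for the real one
CREATE TABLE demo_fact (
id integer NOT NULL,
period character varying(10) NOT NULL
);

-- One child table per period; the CHECK constraint states which rows it may hold
CREATE TABLE demo_fact_2010_01p (
CHECK (period = '2010_01')
) INHERITS (demo_fact);

-- Indexes are not inherited, so every partition needs its own
CREATE INDEX demo_fact_2010_01p_id_idx ON demo_fact_2010_01p (id);

-- Rows stored in the child are visible through the master
INSERT INTO demo_fact_2010_01p (id, period) VALUES (1, '2010_01');
SELECT count(*) FROM demo_fact;      -- counts the master and all its children
SELECT count(*) FROM ONLY demo_fact; -- counts the master table alone
</pre>
<p>Repeat this for every period and you get the tedious work mentioned above.</p>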
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Create the master table</h3>
<pre class="brush: sql">

-- Create the master table tablename_with_partition.
-- Use exactly the same schema as the non partitioned table.
CREATE TABLE tablename_with_partition
(
-- nextval() expects a sequence; tablename_id_seq is an assumed name, use the sequence of your original table
id integer NOT NULL DEFAULT nextval(&#039;tablename_id_seq&#039;::regclass),
&quot;name&quot; character varying(150) NOT NULL,
description text NOT NULL,
period character varying(10) NOT NULL,
CONSTRAINT tablename_with_partition_pkey PRIMARY KEY (id)
) WITH ( OIDS=FALSE);
</pre>
<p>&nbsp;</p>
<h3>Auto partition creation during insert</h3>
<pre class="brush: sql">

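-- Trigger function that routes each inserted row to its partition,
-- creating the partition first if it does not exist yet:
CREATE OR REPLACE FUNCTION create_partition_and_insert()
RETURNS trigger AS
$BODY$
DECLARE
-- text instead of VARCHAR(25): master table name + period can exceed 25 characters
partition_name text;
BEGIN
-- TG_TABLE_NAME replaces the deprecated TG_RELNAME
partition_name := TG_TABLE_NAME || &#039;_&#039; || NEW.period || &#039;p&#039;;
IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname=partition_name) THEN
RAISE NOTICE &#039;A partition has been created %&#039;,partition_name;
EXECUTE &#039;CREATE TABLE &#039; || quote_ident(partition_name) || &#039; (check (period = &#039; || quote_literal(NEW.period) || &#039;)) INHERITS (&#039; || quote_ident(TG_TABLE_NAME) || &#039;);&#039;;
END IF;
-- Cast the row literal back to the master row type and expand it into columns
EXECUTE &#039;INSERT INTO &#039; || quote_ident(partition_name) || &#039; SELECT(&#039; || TG_TABLE_NAME || &#039; &#039; || quote_literal(NEW) || &#039;).*;&#039;;
-- Returning NULL cancels the insert into the master table itself
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;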
</pre>
<p>This function inserts each row into the correct partition, creating the partition first if it doesn’t exist. This approach is not the most efficient, but with fewer than 20M records it works fine.<br />
Here we attach the function defined above to the insert operation on tablename_with_partition:</p>
<pre class="brush: sql">

CREATE TRIGGER tablename_insert_trigger
BEFORE INSERT ON tablename_with_partition
FOR EACH ROW EXECUTE PROCEDURE create_partition_and_insert();
</pre>
<p>Notice that I didn’t name the trigger “tablename_with_partition_insert_trigger”. That is because I plan to swap the partitioned master table with the old non partitioned table later on.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Migrate the data</h3>
<pre class="brush: sql">

-- Copy data from the non partitioned to the partitioned version:
INSERT INTO tablename_with_partition SELECT * FROM tablename;
</pre>
<p>It can take a while, since every row goes through the trigger, but it migrates the data and creates the partitions in a single statement.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Testing everything is fine</h3>
<p>Let’s check that everything is ok:<br />
- “SELECT count(*) FROM tablename_with_partition_2010_01p;” returns the rows of a single partition.<br />
- “SELECT count(*) FROM tablename_with_partition;” returns the sum of the rows of all the partitions.<br />
- “SELECT count(*) FROM ONLY tablename_with_partition;” returns 0. This is expected: no row was inserted in the master table itself.</p>
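<p>To go one step further, the system catalog can list the partitions attached to the master table. This is only a sketch, and it assumes that the table name tablename_with_partition is unique across schemas:</p>
<pre class="brush: sql">

-- List every child table that inherits from the master table
SELECT child.relname AS partition_name
FROM pg_inherits
JOIN pg_class parent ON parent.oid = pg_inherits.inhparent
JOIN pg_class child ON child.oid = pg_inherits.inhrelid
WHERE parent.relname = 'tablename_with_partition'
ORDER BY child.relname;
</pre>
<p>Each period present in the migrated data should show up as one row.</p>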
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>What’s next?</h3>
<p>If everything is fine, we are almost done. We still need to build the indexes and replace tablename with tablename_with_partition.<br />
We’ll cover this in the next article by creating a function that can execute an arbitrary SQL command on every partition.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/auto-partitioning-on-postgresql-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Search engine vs Database in BI part 1: data structure</title>
		<link>http://blog.inovia.fr/search-engine-vs-database-in-bi-part1-data-structure/</link>
		<comments>http://blog.inovia.fr/search-engine-vs-database-in-bi-part1-data-structure/#comments</comments>
		<pubDate>Mon, 27 Jun 2011 09:24:13 +0000</pubDate>
		<dc:creator>inovia</dc:creator>
				<category><![CDATA[BI]]></category>

		<guid isPermaLink="false">http://blog.inovia.fr/?p=59</guid>
		<description><![CDATA[BI apps face a real data deluge these days. Engineers need new approaches to deal with it. Is a search engine, well known for handling humongous data sets, suited to a BI context? Business intelligence is the science &#8230; <a href="http://blog.inovia.fr/search-engine-vs-database-in-bi-part1-data-structure/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h2>BI apps face a real data deluge these days. Engineers need new approaches to deal with it. Is a search engine, well known for handling humongous data sets, suited to a BI context?</h2>
<p>Business intelligence (BI for short) is the science of gathering and structuring data to help make decisions. One major technical challenge is that BI generally involves large volumes of data, which causes trouble for the classic database approach. Search engines (SEs) are a newer approach to handling data. Is a search engine a go-to solution for BI usage?</p>
<p>SEs are particularly good at dealing with large amounts of data. This quality is based on smart architectural solutions to common problems. In essence, an SE:</p>
<ul>
<li>crawls raw data</li>
<li>indexes the data (classifies and interprets it)</li>
<li>provides a way to look for data related to a theme (search)</li>
</ul>
<p>To illustrate, we&#8217;ll take as an example a BI program that targets employees&#8217; emails. All the company&#8217;s communications are classified, and one can then search for emails and compute indicators on this data.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Structured vs. unstructured data: how ambiguous your data is</h3>
<p>The classification process in BI has to understand the meaning of each email. Suppose I write down this information:</p>
<p>&nbsp;</p>
<p><strong>baptiste.manson@inovia.fr pierre.cornic@inovia.fr I&#8217;m not here now.</strong></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>We can read several different facts into it:</p>
<ul>
<li>Baptiste is not here and notified Pierre;</li>
<li>Baptiste is not here and notified Pierre just now;</li>
<li>Baptiste and Pierre are not here.</li>
</ul>
<p>We can say that the data is ambiguous.</p>
<p>Now suppose the data we receive is:</p>
<p>&nbsp;</p>
<p><strong>baptiste.manson@inovia.fr to pierre.cornic@inovia.fr said &#8220;I&#8217;m not here right now&#8221;.</strong></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>This time we understand without ambiguity which fact the piece of information relates to:</p>
<ul>
<li>Baptiste said to Pierre that he wasn&#8217;t here when he sent the email.</li>
</ul>
<p>This is due to the structural elements that we added: baptiste.manson@inovia.fr <strong>to</strong> pierre.cornic@inovia.fr <strong>said</strong> <strong>&#8220;</strong>I&#8217;m not here right now<strong>&#8221;</strong>.</p>
<p>To be able to understand what is going on in the company, we need the data to be structured.</p>
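<p>As a sketch of what &#8220;structured&#8221; means here, the email above could be stored with one column per structural element. The table name, the column names and the sending date are hypothetical, chosen only for this illustration:</p>
<pre class="brush: sql">

-- Hypothetical table: each structural element of an email gets its own column
CREATE TABLE demo_email (
id serial PRIMARY KEY,
sender text NOT NULL,
recipient text NOT NULL,
sent_at timestamp NOT NULL,
body text NOT NULL
);

-- The sending date is made up for the example
INSERT INTO demo_email (sender, recipient, sent_at, body)
VALUES ('baptiste.manson@inovia.fr', 'pierre.cornic@inovia.fr',
'2011-06-27 09:00:00', 'I''m not here right now');

-- Who wrote to Pierre, and when? The answer no longer depends on interpretation
SELECT sender, sent_at FROM demo_email
WHERE recipient = 'pierre.cornic@inovia.fr';
</pre>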
<p>Search engines are particularly known for dealing with unstructured data. They perform well when a user wants to compare an unstructured question to unstructured documents, like books or websites. But finding an exact match can be complicated, even with the most famous one, Google.<br />
For instance, I wanted to know what Obama said to Sarkozy last week, so I typed: &#8220;obama said to sarkozy last week&#8221;&#8230; The first result was a quote from Sarkozy to Obama. The next links were about joint declarations by the two men. Only the fourth link was about a quote from Obama to Sarkozy.</p>
<p>The factual details described in my search and in the websites are not correctly interpreted.<br />
Another issue is that “last week” is badly interpreted: I wanted declarations made within the last week, not articles containing the words “last week”. The root of the issue is the unstructured nature of the data I was looking for.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>The business item concept</h3>
<p>We saw that structured data is essential in a BI context to remove ambiguities. SEs are not particularly designed for dealing with structured data, even though a solution exists: one should deal with &#8220;business items&#8221;. This concept, proposed by Exalead, is in practice really hard to maintain. Check <a href="http://blog.exalead.com/2009/10/05/database-vs-search-engines-the-space-locality-bottleneck/" target="_blank">this article</a> for further info. We&#8217;ll see why in the next sections.</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Consistency of the data: what conforms</h3>
<p>The first section was about data structure. Recent SEs support data structure to some extent by introducing the concept of &#8220;business item&#8221;. This concept brings clever solutions to performance-related technical issues, but it also introduces functional limitations.</p>
<p>The first obvious limitation concerns consistency. Consistency means making sure that every single piece of data respects a standard: for instance, that an email has a sender, and that this sender exists in our user reference table. With an SE, these conformity checks must be implemented entirely by the application using it. As there is no centralized description of the whole structure with its constraints, every developer must carefully check the details. To be fair, large database-based BI applications must also drop some of the conformity checks for performance reasons.</p>
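<p>On the database side, such a rule can be declared once, in the schema. The following is a minimal sketch with hypothetical table names (demo_user, demo_message), not the schema of a real application:</p>
<pre class="brush: sql">

-- Hypothetical user reference table
CREATE TABLE demo_user (
address text PRIMARY KEY
);

-- Every message must have a sender, and that sender must be a known user
CREATE TABLE demo_message (
id serial PRIMARY KEY,
sender text NOT NULL REFERENCES demo_user (address),
body text NOT NULL
);

INSERT INTO demo_user (address) VALUES ('baptiste.manson@inovia.fr');

-- Accepted: the sender exists in demo_user
INSERT INTO demo_message (sender, body)
VALUES ('baptiste.manson@inovia.fr', 'I''m not here right now');

-- An unknown sender is rejected by the database itself with a
-- foreign key violation. To try it, uncomment the next line:
-- INSERT INTO demo_message (sender, body) VALUES ('unknown@example.com', 'hello');
</pre>
<p>The check then applies to every application writing to the table, without each developer having to reimplement it.</p>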
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>Managing change: how the data repository can live on after its design</h3>
<p>The second problem is that it is common in BI to go back to already stored data and discover a new way to use it, or a new interesting indicator to build: for example, splitting the mail addresses in two, one part for the name and another for the domain.</p>
<p>Usually, data inside an SE is dead data: it cannot be reshaped once stored. A database, on the contrary, can easily update data that is already stored.<br />
Sure, Solr and Exalead provide such functionality, but only as a limited upsert (cancel and replace). It means that to update data, one needs to extract every business item, change it and push it back into the repository. It is like not having the “search and replace” function in Excel. This limitation causes headaches for engineers when the functional requirements change.<br />
This can explain why some people who run a Proof of Concept around an SE as the main data repository are disappointed later on, because they can&#8217;t easily manage changes.</p>
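<p>For comparison, here is how the address-splitting example could look in a database. It is a sketch on a hypothetical table (demo_address); the column names are assumptions made for the illustration:</p>
<pre class="brush: sql">

-- Hypothetical table holding already stored addresses
CREATE TABLE demo_address (
address text PRIMARY KEY
);
INSERT INTO demo_address (address)
VALUES ('baptiste.manson@inovia.fr'), ('pierre.cornic@inovia.fr');

-- The new requirement arrives later: add the two columns...
ALTER TABLE demo_address ADD COLUMN local_part text;
ALTER TABLE demo_address ADD COLUMN domain text;

-- ...and fill them for every existing row in one statement
UPDATE demo_address
SET local_part = split_part(address, '@', 1),
domain = split_part(address, '@', 2);
</pre>
<p>No extraction and re-import is needed: the stored rows are changed in place.</p>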
<p>&nbsp;</p>
<p>&nbsp;</p>
<h3>SE or not SE?</h3>
<p>One quality of an SE is its speed at indexing large volumes of unstructured data. Its drawbacks are that it stores data in an ambiguous way and does not let users change the structure later on.</p>
<p>This is why at Inovia we decided to let the SE work only as an extension of a database. In our guidelines, the SE shouldn&#8217;t be the main repository of data.</p>
<p>We have a database that allows us to finely structure data, and a search engine for the unstructured data, where the database is no longer relevant. Read <a href="http://gilbane.com/search_blog/2009/02/native_database_search_vs_comm.html" target="_blank">this article</a> for further info about the features of SEs over databases.</p>
<p>Postgres has a mechanism like this called Full Text Search. We will discuss in another post a concrete business case that successfully mixed the two approaches, database and search engine.</p>
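<p>To give an idea of what Full Text Search looks like, here is a minimal sketch on a hypothetical table (demo_note) using the built-in english configuration. It is an illustration only, not the business case mentioned above:</p>
<pre class="brush: sql">

-- Hypothetical table with free-text bodies
CREATE TABLE demo_note (
id serial PRIMARY KEY,
body text NOT NULL
);
INSERT INTO demo_note (body) VALUES
('I am not here right now'),
('The quarterly report is attached');

-- Match the rows whose text contains the word report
SELECT id, body FROM demo_note
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'report');

-- An expression index can speed up this search on larger tables
CREATE INDEX demo_note_body_fts_idx
ON demo_note USING gin (to_tsvector('english', body));
</pre>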
]]></content:encoded>
			<wfw:commentRss>http://blog.inovia.fr/search-engine-vs-database-in-bi-part1-data-structure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
