Movie provider XML configuration file


Post by Dolu » Mon Dec 21, 2009 3:31 pm

TViXiE gives you a simple way to develop your own movie provider based on your favorite website.

Providers are located in the "\Providers" folder within the TViXiE main directory.
There are two kinds of providers: DLL and XML. In this post I'll try to explain how to make your own XML configuration file.

Beware! Prerequisite:
First of all, you need to be comfortable with regular expressions. If you are not, many websites offer tutorials on them.

I suggest downloading the two programs below before going further:
- Expresso 3.0 (http://www.ultrapico.com) - Registration is free!
- XML Notepad 2007 (from Microsoft, but any other XML editor should also do the job)

Because the configuration is an XML file, some characters of the regular expressions are stored as XML entities (< as &lt;, > as &gt;, & as &amp;), so the raw file does not show the exact regular expression. An XML editor displays the unescaped values, which makes the expressions easier to understand and modify.
For instance, in the XML configuration file you could find:
Durée\s:\s(?&lt;1&gt;.*?)&amp;nbsp;

The real regular expression is:
Durée\s:\s(?<1>.*?)&nbsp;
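If you want to check what the escaped text in the file really means, any XML parser will undo the escaping for you. Below is a minimal Python sketch using only the standard library; the `Runtime:` expression is a hypothetical English-site example, and none of this is TViXiE code:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical entry as it is stored in the XML file (entities escaped).
IN_FILE = r"<RegExp_Runtime>Runtime:\s(?&lt;1&gt;.*?)&amp;nbsp;</RegExp_Runtime>"

# The parser turns &lt; &gt; &amp; back into < > &: this is the real regexp.
real_regexp = ET.fromstring(IN_FILE).text
print(real_regexp)  # Runtime:\s(?<1>.*?)&nbsp;

# The other way round: escape() converts & < > so a tested regexp can be pasted into the file.
print(escape(real_regexp))  # Runtime:\s(?&lt;1&gt;.*?)&amp;nbsp;
```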

The whole configuration file must be enclosed in <MyMPConfig>...</MyMPConfig> tags.
Below is a sample with "Allocine.fr.ini":
<?xml version="1.0" encoding="utf-16"?>
<MyMPConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Identification of the movie provider (shown in Tvixie) -->

  <Name>Allocine.fr</Name>
  <HomePage>http://www.allocine.fr</HomePage>
  <Language>French</Language>
  <Author>Dolu</Author>
  <Version>0.1</Version>

  <!-- Data retrieval support (true or false, if not present then false) -->

  <SupportsCastAndDirector>true</SupportsCastAndDirector>
  <SupportsCover>true</SupportsCover>
  <SupportsPlot>true</SupportsPlot>
  <SupportsRating>false</SupportsRating>

  <!-- List of urls to look for movies (you can set only one or add as many as you want) -->
  <!-- They are used in the order they appear in the file. -->
  <!-- Enabled = true or false (to temporarily disable an url without removing it from the configuration file) -->
  <!-- Url = {0} will be replaced by the title of the movie  -->
  <!--       some websites let you cycle through result pages, then you can use {1} for the page number -->
  <!-- RegExp = regular expression used to parse the result list, must return 'Link' and 'Title' within one regexp -->
  <!-- MoviePlotLink = will try to get the plot of the movie from those urls (in that particular order), and will stop as soon as it gets one -->
  <!--                 must contain at least one value. -->

  <!-- MaxResult = will stop movie retrieval with this url after it finds this number of movies -->
  <!-- NextPage = only used with page cycling (parameter {1} of the search Url), pages are read as long as this text is found -->

  <SearchUrls>
    <Search>
      <Enabled>true</Enabled>
      <Url>http://www.allocine.fr/recherche/default.html?motcle={0}&amp;rub=1&amp;page={1}</Url>
      <RegExp>&lt;h.&gt;&lt;a\shref="(?&lt;Link&gt;/film/fichefilm_gen_cfilm=.*?)"\sclass="link1"&gt;(?&lt;Title&gt;.*?)&lt;/a&gt;&lt;/h.&gt;(?&lt;Original&gt;.*?)(?:&lt;/td|&lt;div)</RegExp>
      <MoviePlotLink>
        <string>http://www.allocine.fr/{0}</string>
      </MoviePlotLink>
      <MaxResult>10</MaxResult>
      <NextPage>"films suivants&lt;/a&gt;"</NextPage>
    </Search>
  </SearchUrls>

  <!-- MaxResult = global limit on the number of movie results -->

  <MaxResult>10</MaxResult>

  <!-- Regular expressions to find Title, Original Title, Year and Runtime -->

  <RegExp_Title>title&gt;(?&lt;1&gt;.*?)\s-\sAlloCiné</RegExp_Title>
  <RegExp_OriginalTitle>Titre\soriginal\s:\s&lt;i&gt;(?&lt;1&gt;.*?)&lt;/i&gt;</RegExp_OriginalTitle>
  <RegExp_Year>de\sproduction\s:\s(?&lt;1&gt;.*?)&lt;/h</RegExp_Year>
  <RegExp_Runtime>Durée\s:\s(?&lt;1&gt;.*?)&amp;nbsp;</RegExp_Runtime>

  <!-- Regular expression to look for the plot -->
  <!-- the options below replace special characters (LF, CR, TB) with spaces before the regexp is applied -->

  <ReplaceLFbySpaceBeforeRegExpPlot>true</ReplaceLFbySpaceBeforeRegExpPlot>
  <ReplaceCRbySpaceBeforeRegExpPlot>true</ReplaceCRbySpaceBeforeRegExpPlot>
  <ReplaceTBbySpaceBeforeRegExpPlot>true</ReplaceTBbySpaceBeforeRegExpPlot>

  <RegExp_Plot>&lt;td\svalign="top"\sstyle="padding:10\s0\s0\s0"&gt;&lt;div\salign="justify"&gt;&lt;h.&gt;(?&lt;1&gt;(?:.|[\t\r\v\n\f])*?)&lt;/h</RegExp_Plot>

  <!-- Regular expressions to look for genres, directors and actors -->

  <RegExp_Genres>Genre\s:\s(?&lt;1&gt;.*?)&lt;/h</RegExp_Genres>
  <RegExp_Directors>Réalisé\spar\s(?&lt;1&gt;.*?)&lt;/h</RegExp_Directors>
  <RegExp_Actors>Avec\s(?&lt;1&gt;.*?)&amp;nbsp;</RegExp_Actors>

  <!-- Regular expressions to look for the cover -->
  <!-- RegExp_CoverLinkFront = regexp used on the movie page -->
  <!-- RegExp_CoverUrl = used to look for a "gallery" url on the movie page -->
  <!-- RegExp_HowGotoCoverUrl = template used to build and follow the "gallery" link ({0} = value captured by RegExp_CoverUrl) -->
  <!-- RegExp_CoverLinkFront2 = regexp to extract the cover link from the "gallery" page -->

  <RegExp_CoverLinkFront>&lt;td\svalign="top"\swidth="..."&gt;(?:[\s\t\n\r\v\f])*?&lt;img\ssrc="(?&lt;1&gt;.*?)"\s</RegExp_CoverLinkFront>
  <RegExp_CoverUrl>galerievignette_gen_cfilm=(?&lt;1&gt;.*?).html"\sclass="link5"</RegExp_CoverUrl>
  <RegExp_HowGotoCoverUrl>http://www.allocine.fr/film/galerievignette_gen_cfilm={0}.html</RegExp_HowGotoCoverUrl>
  <RegExp_CoverLinkFront2>class='photo'\ssrc='(.*?)'\stitle</RegExp_CoverLinkFront2>

  <!-- Regular expressions to find the rating, tried in that order; it stops at the first non-empty result -->

  <RegExp_Rating />

  <!-- Encoding used to send Title, and to read HTML pages -->

  <TitleEncoding>Default</TitleEncoding>
  <ContentEncoding>Default</ContentEncoding>

</MyMPConfig>
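The three Replace...BeforeRegExpPlot options matter because `.` in a regular expression does not match a line break by default, so a plot spread over several lines would not be captured. A rough Python illustration of what these options appear to do, assuming LF, CR and TB mean line feed, carriage return and tab (the function name and HTML snippet are hypothetical, and TViXiE's own implementation may differ):

```python
import re

def extract_plot(html, plot_regex, replace_lf=True, replace_cr=True, replace_tab=True):
    """Replace line feeds, carriage returns and tabs by spaces, then apply the plot regexp."""
    if replace_lf:
        html = html.replace("\n", " ")
    if replace_cr:
        html = html.replace("\r", " ")
    if replace_tab:
        html = html.replace("\t", " ")
    match = re.search(plot_regex, html)
    return match.group(1) if match else None

# Hypothetical movie page fragment whose plot spans two lines
page = '<div align="justify"><h4>A banker is\nsent to prison.</h4>'
regex = r'<div\salign="justify"><h.>(.*?)</h'

print(extract_plot(page, regex))                    # A banker is sent to prison.
print(extract_plot(page, regex, replace_lf=False))  # None
```

Without the line feed replacement, the same regexp finds nothing because the plot text contains a line break.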

This thread will only contain releases of the DLL. Configuration files will be posted in other threads, following the current release method.
If you find it useful, want to report a bug or need a new feature, please use this thread.
When reporting a bug, please don't forget to mention any useful information (version, configuration file, ...).

I decided to start a "how to" to help people write their own movie provider configuration file.
I'll add and complete steps from time to time, when I can.


How to create your own configuration file?

1 - Create the configuration file


Choose a name for the new movie provider you'll be working on. I usually use the domain name of the website I'm parsing. For this tutorial, let's assume I want to parse the website "MyDomain.com".

Go to the "...\Tvixie\Providers\" folder and create an empty text file named "MyDomain.com.xml".

2 - Simple configuration file


With Notepad (or any other text editor), copy and paste the code below:
<?xml version="1.0" encoding="utf-16"?>
<MyMPConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Identification of the movie provider (shown in Tvixie) -->

  <Name>
  </Name>
  <HomePage>
  </HomePage>
  <Language>
  </Language>
  <Author>
  </Author>
  <Version>
  </Version>

  <!-- Data retrieval support (true or false, if not present then false) -->

  <SupportsCastAndDirector>
  </SupportsCastAndDirector>
  <SupportsCover>
  </SupportsCover>
  <SupportsPlot>
  </SupportsPlot>
  <SupportsRating>
  </SupportsRating>

</MyMPConfig>


Inside the Name tags, put the name you want to give to your movie provider:
<Name>MyDomain.com</Name>
Do the same for the "HomePage" and "Language" tags:
<HomePage>http://www.mydomain.com</HomePage>
<Language>English</Language>
(all of this will appear in the TViXiE movie provider box)
Then fill in the "Author" and "Version" tags:
<Author>Your name</Author>
<Version>1.5</Version>

Depending on the data you choose to manage, set "true" or "false" in the following tags:
<SupportsCastAndDirector>false</SupportsCastAndDirector>
<SupportsCover>true</SupportsCover>
<SupportsPlot>true</SupportsPlot>
<SupportsRating>false</SupportsRating>
(here I decided to support only cover and plot)

At this point, if you run TViXiE, you should see your movie provider in the list, but it won't do anything yet.
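The "if not present then false" rule from the sample's comments can be illustrated with a small Python reader. This is a sketch, not TViXiE's loader; it assumes a flag is on only when the tag's text is "true" (ignoring case and surrounding spaces):

```python
import xml.etree.ElementTree as ET

SUPPORT_TAGS = ("SupportsCastAndDirector", "SupportsCover", "SupportsPlot", "SupportsRating")

def read_support_flags(xml_text):
    """Return a dict of Supports* flags; a missing or empty tag counts as false."""
    root = ET.fromstring(xml_text)
    flags = {}
    for tag in SUPPORT_TAGS:
        value = root.findtext(tag)  # None when the tag is absent
        flags[tag] = (value or "").strip().lower() == "true"
    return flags

# SupportsCastAndDirector is deliberately left out of this example
config = """<MyMPConfig>
  <Name>MyDomain.com</Name>
  <SupportsCover>true</SupportsCover>
  <SupportsPlot>true</SupportsPlot>
  <SupportsRating>false</SupportsRating>
</MyMPConfig>"""

print(read_support_flags(config))
```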

3 - Set up the search


Now we need to determine the URL used for the first search, the one that returns the list of matching movies.
The website you're working on should have a search box. Enter any movie title and run a search.
Your browser should show a URL like this:
http://www.mydomain.com/search.php?name=movie+title&param1=dummy
Your URL may be more complex, but the important thing is that you can see the title you searched for.
In the configuration file, replace the movie title in the URL with {0}. MyMovieProvider will insert the movie title at this place.
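Here is a Python sketch of that substitution. It assumes the title is URL-encoded with spaces as "+" (as in the URL above) and that the TitleEncoding setting chooses the byte encoding used for that; `build_search_url` is a hypothetical name:

```python
from urllib.parse import quote_plus

def build_search_url(url_template, title, page=1, title_encoding="utf-8"):
    """Fill {0} with the URL-encoded title and {1} with the page number."""
    # str.format silently ignores the page argument when the template has no {1}.
    return url_template.format(quote_plus(title, encoding=title_encoding), page)

template = "http://www.mydomain.com/search.php?name={0}&param1=dummy"
print(build_search_url(template, "movie title"))
print(build_search_url(template, "Amélie", title_encoding="latin-1"))
```

The second call shows why the title encoding matters: "é" becomes %E9 in latin-1 but %C3%A9 in UTF-8, and the website only understands one of them.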

While you're on the result list, ask your browser to display the source code of the current page. We'll need it to build the regular expression.
The regexp needs to return two values, the movie title and the link to the movie page (in any order), in the named groups "Title" and "Link".
There are many possibilities at this step. Use Expresso to find one that works.
Let's assume mine looks like this:
/title_(?:exact|substring|popular)/.*?link=/(?<Link>title/tt[^<]*?)/';">(?<Title>[^<].*?)</a>
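The expressions use the .NET syntax that Expresso tests, where named groups are written (?<Name>...). To try one outside Expresso, for example in Python, the groups must be rewritten as (?P<Name>...). A rough sketch with a hypothetical result page; the `dotnet_to_python` helper only handles group syntax, not every difference between the two regex engines:

```python
import re

def dotnet_to_python(pattern):
    """Translate .NET group syntax to Python's re (rough sketch, not a full converter)."""
    # (?<1>...) numbered group -> plain group (assumes it is the first capturing group)
    pattern = re.sub(r"\(\?<(\d+)>", "(", pattern)
    # (?<Name>...) -> (?P<Name>...); lookbehinds (?<= and (?<! are left untouched
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)

RESULT_REGEX = r"""/title_(?:exact|substring|popular)/.*?link=/(?<Link>title/tt[^<]*?)/';">(?<Title>[^<].*?)</a>"""

# Hypothetical result-page source, one result per line
SAMPLE_HTML = """
<a href="/r/title_popular/b.gif?link=/title/tt0111161/';">The Shawshank Redemption</a>
<a href="/r/title_exact/b.gif?link=/title/tt0068646/';">The Godfather</a>
"""

pattern = re.compile(dotnet_to_python(RESULT_REGEX))
results = [(m.group("Title"), m.group("Link")) for m in pattern.finditer(SAMPLE_HTML)]
for title, link in results:
    print(title, "->", link)
```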
Depending on the movie link format (absolute or relative URL), we'll build the MoviePlotLink tag.
For a relative link, it could look like:
<MoviePlotLink>
<string>http://www.mydomain.com/film/{0}</string>
</MoviePlotLink>
or, for an absolute link:
<MoviePlotLink>
<string>{0}</string>
</MoviePlotLink>
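The MoviePlotLink templates are tried in order until one gives a plot, as the MoviePlotLink comment in the Allocine sample says. A Python sketch of that loop, with a dictionary of hypothetical pages standing in for real downloads; `fetch_plot` and the /plot/ URL are made up for the example:

```python
import re

def fetch_plot(link, plot_link_templates, fetch, plot_regex):
    """Try each MoviePlotLink template in file order; stop at the first page where the plot regexp matches."""
    for template in plot_link_templates:
        html = fetch(template.format(link))  # {0} is replaced by the Link captured from the result list
        match = re.search(plot_regex, html)
        if match:
            return match.group(1)
    return None

# Hypothetical pages standing in for real HTTP requests
PAGES = {
    "http://www.mydomain.com/film/title/tt0111161": "<p>no plot here</p>",
    "http://www.mydomain.com/plot/title/tt0111161": "<p class='plot'>Two imprisoned men bond.</p>",
}
templates = ["http://www.mydomain.com/film/{0}", "http://www.mydomain.com/plot/{0}"]
PLOT_REGEX = r"<p class='plot'>(.*?)</p>"

print(fetch_plot("title/tt0111161", templates, lambda url: PAGES.get(url, ""), PLOT_REGEX))
```

With the `<string>{0}</string>` form, the captured Link must already be a full URL, since nothing is added in front of it.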

Finally, you can set the maximum number of results you want to get in the MaxResult tag.
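MaxResult works together with NextPage when the search URL uses {1} for the page number. A Python sketch of that loop as the sample's comments describe it; the fake pages, the page numbering starting at 1 and the `max_pages` safety limit are assumptions of this example, not TViXiE settings:

```python
import re
from urllib.parse import quote_plus

def search(fetch, url_template, title, result_regex, max_result, next_page="", max_pages=20):
    """Collect (title, link) pairs page by page until MaxResult is reached or NextPage text disappears."""
    results = []
    for page in range(1, max_pages + 1):  # assumption: {1} starts at 1; max_pages is a hypothetical guard
        html = fetch(url_template.format(quote_plus(title), page))
        for m in re.finditer(result_regex, html):
            results.append((m.group("Title"), m.group("Link")))
            if len(results) >= max_result:  # MaxResult reached: stop using this url
                return results
        if not next_page or next_page not in html:  # no NextPage text: this was the last page
            break
    return results

# Hypothetical result pages keyed by url
PAGES = {
    "http://www.mydomain.com/search?q=alien&page=1":
        '<a href="/film/1">Alien</a> <a href="/film/2">Aliens</a> <a href="?page=2">next results</a>',
    "http://www.mydomain.com/search?q=alien&page=2":
        '<a href="/film/3">Alien 3</a>',
}
TEMPLATE = "http://www.mydomain.com/search?q={0}&page={1}"
REGEX = r'<a href="(?P<Link>/film/\d+)">(?P<Title>[^<]+)</a>'

def fetch(url):
    return PAGES.get(url, "")

print(search(fetch, TEMPLATE, "alien", REGEX, max_result=10, next_page="next results"))
```

The global MaxResult tag of the Allocine sample applies on top of this and limits the total number of results.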

That's it, your configuration file should now also contain this:
...
  <SearchUrls>
    <Search>
      <Enabled>true</Enabled>
      <Url>http://www.mydomain.com/search.php?name={0}&amp;param1=dummy</Url>
      <RegExp>/title_(?:exact|substring|popular)/.*?link=/(?&lt;Link&gt;title/tt[^&lt;]*?)/';"&gt;(?&lt;Title&gt;[^&lt;].*?)&lt;/a&gt;</RegExp>
      <MoviePlotLink>
        <string>http://www.mydomain.com/film/{0}</string>
      </MoviePlotLink>
      <MaxResult>10</MaxResult>
      <NextPage></NextPage>
    </Search>
  </SearchUrls>

...
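When the file is read, the XML escaping is undone, which is why the & in the Url has to be written &amp; in the file: a bare & makes the file invalid XML. A Python sketch that lists the enabled Search entries in file order (assuming a missing Enabled tag means disabled); it only illustrates the structure and is not how TViXiE loads providers:

```python
import xml.etree.ElementTree as ET

def enabled_searches(xml_text):
    """Return the <Search> entries in file order, skipping those not explicitly enabled."""
    root = ET.fromstring(xml_text)
    searches = []
    for search in root.findall("SearchUrls/Search"):
        if (search.findtext("Enabled") or "").strip().lower() != "true":
            continue  # Enabled=false: kept in the file but not used
        max_result = search.findtext("MaxResult")
        searches.append({
            "url": search.findtext("Url"),        # &amp; is already turned back into &
            "regexp": search.findtext("RegExp"),  # &lt; / &gt; are already turned back into < / >
            "plot_links": [s.text for s in search.findall("MoviePlotLink/string")],
            "max_result": int(max_result) if max_result else None,
            "next_page": search.findtext("NextPage") or "",
        })
    return searches

CONFIG = """<MyMPConfig>
  <SearchUrls>
    <Search>
      <Enabled>false</Enabled>
      <Url>http://old.mydomain.com/find?t={0}</Url>
    </Search>
    <Search>
      <Enabled>true</Enabled>
      <Url>http://www.mydomain.com/search.php?name={0}&amp;param1=dummy</Url>
      <RegExp>link=/(?&lt;Link&gt;title/tt[^&lt;]*?)/</RegExp>
      <MoviePlotLink>
        <string>http://www.mydomain.com/film/{0}</string>
      </MoviePlotLink>
      <MaxResult>10</MaxResult>
      <NextPage></NextPage>
    </Search>
  </SearchUrls>
</MyMPConfig>"""

for entry in enabled_searches(CONFIG):
    print(entry["url"], entry["regexp"])
```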


How to set up the encoding type?


You can specify which encoding to use for the title request (TitleEncoding) and for reading the HTML pages (ContentEncoding):
...
  <TitleEncoding>windows-1254</TitleEncoding>
  <ContentEncoding>windows-1254</ContentEncoding>
...
 

Supported encodings are: Default, utf-8 and windows-XXXX (for instance windows-1254).
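In Python terms, ContentEncoding is the codec used to turn the downloaded bytes into text (and TitleEncoding plays the same role for the title sent in the search URL). A sketch of what a wrong choice looks like; `decode_page` is hypothetical, and mapping "Default" to the system code page is an assumption based on how .NET's default encoding behaves on Windows:

```python
import locale

def decode_page(raw, content_encoding="Default"):
    """Decode downloaded page bytes using a ContentEncoding value from the configuration file."""
    if content_encoding.lower() == "default":
        # Assumption: "Default" means the system's default (ANSI) code page.
        codec = locale.getpreferredencoding(False)
    else:
        codec = content_encoding  # Python accepts names such as "utf-8" or "windows-1254"
    return raw.decode(codec, errors="replace")

print(decode_page("Durée".encode("windows-1254"), "windows-1254"))  # Durée
print(decode_page("Durée".encode("utf-8"), "windows-1252"))         # DurÃ©e
```

Decoding UTF-8 bytes as windows-1252 turns "é" into "Ã©", a typical sign that ContentEncoding does not match the website.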

How to add debug feature to your configuration file?


You can use the "Debug" tag (I usually put it at the end of the file):
...
  <Debug>true</Debug>
</MyMPConfig>

Many XML configuration files are already in use with TViXiE. You can look at them for help when writing your own.

Dolu ;)
TViXiE needs you! Do you find it useful? Please consider donating to the project at http://www.tvixie.com/
Dolu
Team Member
 
Posts: 478
Joined: Wed Sep 10, 2008 8:13 pm
Location: Paris, France
