
Thread: Wikipedia to complement or standalone - questions

  1. #1
    Registered Dan Morgan's Avatar
    Join Date
    Feb 2004
    Location
    UK
    Posts
    283

    Wikipedia to complement or standalone - questions

    Hi guys,

    I have been looking around at the whole Wikipedia thing and I have some questions. Obviously Wikipedia content is a good way to promote an existing site or a standalone one, but is there any way to automate pulling the content in - like some script which can grab a node (like with AWS) and branch out from there?

    Thanks,

    Dan

  2. #2
    Registered Member incka's Avatar
    Join Date
    Aug 2003
    Location
    Wakefield, UK, EU
    Posts
    3,801
    I'm confused about what you are talking about, but to get Wikipedia you'll really need to SSH into your server and wget the files.
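    If you would rather script it than call wget by hand, something along these lines would do the fetch once you are on the server - just a rough sketch, and the dump URL is a placeholder, so check where the Wikipedia database dumps actually live first:

    Code:
    # Rough sketch: pull a Wikipedia dump file down to the server's disk.
    # The URL is a placeholder - check the real location of the current dump.
    import urllib.request

    DUMP_URL = "http://example.org/wikipedia/en_cur_table.sql.bz2"  # placeholder
    LOCAL_FILE = "en_cur_table.sql.bz2"

    def fetch_dump(url, dest):
        """Stream the dump to disk in chunks so a ~400 MB file never
        has to sit in memory all at once."""
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            while True:
                chunk = response.read(1024 * 1024)  # 1 MB at a time
                if not chunk:
                    break
                out.write(chunk)

    fetch_dump(DUMP_URL, LOCAL_FILE)
    print("Saved dump to " + LOCAL_FILE)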

  3. #3
    Registered Dan Morgan's Avatar
    Join Date
    Feb 2004
    Location
    UK
    Posts
    283
    Hmm, it appears I totally missed this - sorry.

    Basically, I have seen lots of sites with wiki content. However, they are in the same style as Wikipedia in terms of layout, which suggests to me they are using the wiki CMS.

    However, these pages are not all the same as the live wikipedia.org pages, which suggests they are not feeds but are served from a database of their own.

    So how is this generally done, especially for sites which concentrate on a niche area and so only want a small portion of the overall wiki content? What would be the best way of adding, say, 500 pages of wiki content to an existing site or a standalone site?
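    Purely hypothetical, but this is roughly the sort of thing I am imagining - a script that takes a short list of titles and pulls a local copy of each page. The URL pattern and titles here are just guesses, and grabbing pages in bulk would obviously need checking against Wikipedia's terms:

    Code:
    # Hypothetical sketch: given a list of page titles, save a local copy
    # of each one.  The base URL and the titles are placeholders.
    import urllib.request

    BASE_URL = "http://en.wikipedia.org/wiki/"         # assumed URL pattern
    TITLES = ["Leeds", "Wakefield", "Yorkshire"]       # would be ~500 in practice

    def fetch_title(title):
        url = BASE_URL + title.replace(" ", "_")
        with urllib.request.urlopen(url) as response:
            return response.read()

    for title in TITLES:
        filename = title.replace(" ", "_") + ".html"
        with open(filename, "wb") as out:
            out.write(fetch_title(title))
        print("Saved " + filename)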

    Ta,

    Dan

  4. #4

  5. #5
    Registered Member incka's Avatar
    Join Date
    Aug 2003
    Location
    Wakefield, UK, EU
    Posts
    3,801
    The dumps are too big to download to your hard drive before they update again.

  6. #6
    Registered Dan Morgan's Avatar
    Join Date
    Feb 2004
    Location
    UK
    Posts
    283
    How often do they update? According to that page, the English current version is 370-odd megabytes, which would take about 50 minutes for me to download.

    [edit]
    Okay, a circa 400 MB file is sitting on my desktop, but I cannot get an executable from http://sources.redhat.com/bzip2/ to work on XP. Can anyone else?

  7. #7
    Registered tomek's Avatar
    Join Date
    Jun 2004
    Posts
    102
    Quote Originally Posted by Dan Morgan
    Okay, a circa 400 MB file is sitting on my desktop, but I cannot get an executable from http://sources.redhat.com/bzip2/ to work on XP. Can anyone else?
    Well, it says: "This executable was built on a Windows 2000 SP 2 machine; I have no idea if it actually works on 95/98/ME/NT/XP."

    I run linux so I can't answer your question...
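    If there happens to be a Python install on the XP box, though, its built-in bz2 module should be able to unpack the file without any separate executable - something like this (I can't try it on Windows myself, so treat it as a sketch; the file names are whatever you called the download):

    Code:
    # Decompress the downloaded dump in chunks so the ~400 MB file
    # never has to fit in memory at once.
    import bz2
    import shutil

    SOURCE = "en_cur_table.sql.bz2"   # adjust to whatever the file is called
    TARGET = "en_cur_table.sql"

    with bz2.BZ2File(SOURCE, "rb") as src, open(TARGET, "wb") as dst:
        shutil.copyfileobj(src, dst, length=1024 * 1024)

    print("Wrote " + TARGET)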

  8. #8
    Registered Dan Morgan's Avatar
    Join Date
    Feb 2004
    Location
    UK
    Posts
    283
    Quote Originally Posted by tomek
    Well, it says: "This executable was built on a Windows 2000 SP 2 machine; I have no idea if it actually works on 95/98/ME/NT/XP."

    I run linux so I can't answer your question...
    Yeah, I set the exe to Win 2k compatibility mode and still no joy.

    [edit]

    Zipzag saves the day. Extracting now...

  9. #9
    Registered tomek's Avatar
    Join Date
    Jun 2004
    Posts
    102
    On which page do you plan to use the Wikipedia data, and what do you want to do with it?

  10. #10
    Registered Dan Morgan's Avatar
    Join Date
    Feb 2004
    Location
    UK
    Posts
    283
    Quote Originally Posted by tomek
    On which page do you plan to use the Wikipedia data, and what do you want to do with it?
    To be honest I have no real plans, just wanted to have a play around and see what was possible.

    MySQL balked at the file after about 3 minutes of processing, so I am just trying to circumvent that. Plus, I do not think it will be possible to just extract the small part of the database that is relevant.

    While I would like a dynamic feed, I am not sure it would be the best port of call at this point.
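    For what it is worth, this is roughly what I am trying at the moment - a rough cut that copies across only the INSERT statements mentioning the titles I care about, so MySQL gets a much smaller file to chew on. I am guessing at the dump layout (one statement per line, titles in single quotes), so treat it as a sketch:

    Code:
    # Rough filter over the decompressed dump: keep table definitions and
    # only those INSERT statements that mention one of the wanted titles.
    # Assumption: the dump is a plain .sql file with one statement per line;
    # a single INSERT can pack in many rows, so this is only a coarse cut.

    WANTED = ["Leeds", "Wakefield", "Yorkshire"]   # placeholder titles
    SOURCE = "en_cur_table.sql"
    TARGET = "en_cur_subset.sql"

    with open(SOURCE, "r", encoding="utf-8", errors="replace") as src, \
         open(TARGET, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.startswith("INSERT"):
                dst.write(line)        # CREATE TABLE, comments, etc.
            elif any("'" + t + "'" in line for t in WANTED):
                dst.write(line)

    print("Wrote filtered dump to " + TARGET)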

    Dan

  11. #11
    Registered Member incka's Avatar
    Join Date
    Aug 2003
    Location
    Wakefield, UK, EU
    Posts
    3,801
    You are not getting this - CUR means UPDATES. OLD means the thing before the updates.

  12. #12
    Registered Dan Morgan's Avatar
    Join Date
    Feb 2004
    Location
    UK
    Posts
    283
    Quote Originally Posted by incka
    CUR means UPDATES. OLD means the thing before the updates.
    You think?

    I don't think you are getting what I am getting.

  13. #13
    Registered tomek's Avatar
    Join Date
    Jun 2004
    Posts
    102
    Doesn't OLD contain the articles' histories, i.e. the old versions of them?

  14. #14
    Registered Member incka's Avatar
    Join Date
    Aug 2003
    Location
    Wakefield, UK, EU
    Posts
    3,801
    Wikipedia has over 1 million articles.

    1 MILLION ARTICLES.

    400 MB = 1,000,000 articles?

    400 MB = 419,430,400 letters.

    1,000,000 articles at 1,000 words an article is 1,000,000,000 words, and 419,430,400 letters does not come anywhere near that.

    Maybe CUR really is all 1,000,000 articles, but that works out at only about 400 letters per article, which seems small to me...

  15. #15
    Chronic Entrepreneur
    Join Date
    Nov 2003
    Location
    Tulsa, Oklahoma, USA
    Posts
    1,112
    There are 1 million articles if you count all the languages available in Wikipedia. Dan said that the English current version is 370-odd MB. The English version has "only" 363,559 articles.
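    A quick back-of-the-envelope check with those two figures:

    Code:
    # Rough text-per-article figure for the English "cur" dump,
    # using the numbers quoted in this thread.
    dump_bytes = 370 * 1024 * 1024   # the 370-odd MB Dan mentioned
    articles = 363559                # English article count

    print(dump_bytes // articles)    # roughly 1,067 bytes per article

    So it is closer to 1 KB per article (markup and table overhead included), which looks a lot more plausible than 400 letters spread across a million articles.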
