User talk:Bluemoose/DataBaseSearchTool
Hey Bluemoose. It might be useful for the not-so-technical among us to give some instructions for the following problem I encountered: first I downloaded the .NET Framework, then your program. When I typed my first inquiry into your program I got the message:
- "Please open up an "Articles" XML data-dump file from the file menu See the About menu for where to download this file".
So I followed these instructions, but when I clicked on "open XML dump" I got a new screen with the file name "current or articles XML file" filled in, and my only options were "open" or "cancel". When I clicked on "open" I got the error message: "current or articles XML file does not exist". I have no idea what to do from here. Fuhghettaboutit 01:06, 28 January 2006 (UTC)
The file you want is here, the one called pages-articles.xml.bz2. I haven't linked to that page directly in the program because when a newer dump is available it will be on a different page. Thanks. Martin 09:46, 28 January 2006 (UTC)
- Sorry Bluemoose, but I'm still bewildered. I understand now to some extent--the dump contains all the text on Wikipedia as of a certain date and is re-dumped periodically as Wikipedia changes(?)--but I still don't know how to access that file with your program. When I try to access the XML file, it looks on my computer--must I download the 997 MB file to my computer in order to do this? I need to be spoonfed here. Thanks for any help. Fuhghettaboutit 15:35, 28 January 2006 (UTC)
- OK: download the 997 MB file to your computer, extract it (it is a .bz2 file, which works just like a normal .zip file) with a program like WinZip or WinRAR, then start up the database search tool and "open" the extracted file, which will be called enwiki-20060125-pages-articles.xml. Then you are ready to start searching. Hope that helps. Martin 16:33, 28 January 2006 (UTC)
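If anyone would rather do the extraction in code (for example, to script it), a minimal C# sketch follows. It assumes the third-party SharpZipLib library for its BZip2InputStream class, since .NET itself has no built-in bzip2 support, and the file paths are only examples:

    using System;
    using System.IO;
    using ICSharpCode.SharpZipLib.BZip2;

    class DecompressDump
    {
        static void Main()
        {
            // Example paths; point these at wherever you saved the dump.
            string source = "enwiki-20060125-pages-articles.xml.bz2";
            string target = "enwiki-20060125-pages-articles.xml";

            using (Stream input = new BZip2InputStream(File.OpenRead(source)))
            using (Stream output = File.Create(target))
            {
                // Copy in small chunks so the ~1 GB file is never held in memory.
                byte[] buffer = new byte[64 * 1024];
                int n;
                while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
                    output.Write(buffer, 0, n);
            }
        }
    }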
- It does indeed. Thank you. In fact, that's the conclusion I had sort of reached above, but I was having trouble swallowing the fact that I needed to download almost 1,000 MB to my computer first. Thanks again. Fuhghettaboutit 16:53, 28 January 2006 (UTC)
- I am more than happy to do any searches for you; just let me know. Martin 17:14, 28 January 2006 (UTC)
Thank you
thank you, Thank you, THANK YOU. This software is incredibly useful for my work on the Ancient Egypt project!
- That Guy, From That Show! (talk) 2006-02-22 03:06Z
Thanks, some technical questions
Hi Bluemoose, thanks for your great tool. I'm working on a tool that processes a full XML dump (with history, ~190 GB) and wondered whether C#'s XML features can handle such a vast amount of data in one file. I'd appreciate it if you could fill me in on some minor details of your SW. It would be a shame to buy a new hard disk only to find that there's no way to parse the file ;)
MMF (Sorry, no account --> no signing)
- Well, you certainly won't want to load the whole thing into memory ;-), but yes, I can't see why the XML features would not be able to handle a file of any size. Not sure what "SW" means, but just ask about any other details. Martin 14:56, 28 February 2006 (UTC)
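For what it's worth, here is a minimal sketch of the kind of forward-only streaming read described above, using the standard .NET XmlTextReader. Only the current node is ever held in memory, which is why the file size itself should not matter; the element name and file name are assumptions based on the standard dump format:

    using System;
    using System.Xml;

    class DumpScanner
    {
        static void Main()
        {
            // XmlTextReader is a forward-only pull reader: it holds only the
            // current node in memory, so the size of the file is not a limit.
            XmlTextReader reader = new XmlTextReader("enwiki-20060125-pages-articles.xml");
            int pages = 0;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    pages++;
                    if (pages <= 10)
                        Console.WriteLine(reader.ReadString()); // first few page titles
                }
            }
            reader.Close();
            Console.WriteLine(pages + " pages seen");
        }
    }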
- Thanks for your answer. I assume you are using the native C# XML API with XmlTextReader etc.? What I'm interested in is some exchange of experiences about the maximum file size .NET's XML API can handle (Microsoft states 2 GB max per file). Or are you using SAX.NET for parsing? (With JAXP (Java SAX) from Sun I should be able to parse 190 GB of XML.) And SW stands for software; I'll try to be more precise--I'm not writing in my native language (obviously ;) ). The problem: I need the full history of articles for statistical purposes, and parsing the online Wikipedia in HTML seems a bit tough to handle. But 190 GB of XML in one file--OMG :D Maybe you could write a short statement about the classes you used (standard .NET or some kind of SAX for .NET) and the maximum file size you have successfully tested with your tool. Thanks in advance
MMF
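For the statistics use case, the same streaming approach should extend to the full-history dump, since the reader, much like SAX, never materializes the document. Below is a hedged sketch that counts <revision> elements per <page>; the element names assume the standard dump schema, and the file name is just an example:

    using System;
    using System.Xml;

    class HistoryStats
    {
        static void Main()
        {
            // Forward-only read: memory use stays flat even on a ~190 GB file.
            XmlTextReader reader = new XmlTextReader("enwiki-pages-meta-history.xml");
            string title = null;
            int revisions = 0;

            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    title = reader.ReadString(); // the page's title text
                    revisions = 0;
                }
                else if (reader.NodeType == XmlNodeType.Element && reader.Name == "revision")
                {
                    revisions++;
                }
                else if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "page")
                {
                    Console.WriteLine(title + ": " + revisions + " revisions");
                }
            }
            reader.Close();
        }
    }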