
View Full Version : is this even possible?



Chris
11-19-2005, 04:58 PM
I'm not sure this is possible. But I thought that possibly with ajax & js it might be.

I'm trying to think of a better way to take a raw text file of a book and divide it into chapters for my literature site. Automating it doesn't work that well since different books use different ways to mark new chapters.

anyways....

Can anyone think of a way to display the text on a page and have a user divide it up just by clicking on where a new chapter starts?

I see it as putting in the equivalent of a pagebreak, and then upon submitting, each page could be filed into a different DB row.

Mike Hunt
11-19-2005, 06:23 PM
A wiki would probably work.

Emancipator
11-19-2005, 07:18 PM
I see what you are trying to say, Chris, and a wiki, in my opinion, is not what you're looking for. You want to be able to use xmlhttp to have a user go down a page of content and flag chapter breaks, and then automatically have it update. I would suggest it is more than doable without any trouble.

Chris
11-19-2005, 09:34 PM
How would you go about it Emancipator?

Mike Hunt
11-19-2005, 09:50 PM
A wiki would too totally work.

Chris
11-20-2005, 06:25 AM
A wiki is a system whereby anyone can edit a document.

I'm trying to turn a browser into a GUI for inserting pagebreaks.

Kings
11-20-2005, 12:25 PM
I agree with Emancipator, it's definitely possible. All you need is some (advanced) JS and Ajax to make it all work.

Emancipator
11-20-2005, 01:03 PM
Kings, I think with a simple onclick and xmlhttp (aka Ajax) it could be done. Glad we both agree :P Makes me feel smurter.

Chris
11-20-2005, 02:18 PM
onclick what though? Cut here, copy all above text, paste into variable?

Kings
11-20-2005, 03:09 PM
Can you show an example of how your text looks, so we can help you better? Thanks.

Emancipator
11-20-2005, 05:38 PM
All I would suggest is that you have a whack of text.... and the "staffer" clicks everywhere they want to break a chapter, or whatever kind of break you want. When you click the code inserts a unique marker.. When they then hit SUBMIT it takes the data and dumps it into the database with the new breaks. I would even go further to create a chapter abstract automatically and keywords as well.

Not rocket science, but certainly something that your average idiot won't be able to do. :)

Chris
11-20-2005, 06:03 PM
So javascript can automatically insert a unique marker into a website?

What I don't get is how the location of the click is translated.

Can JS calculate line numbers, or is it limited to mere coordinates (which would be of no help due to scrolling)? Would I need to overlay a grid above the text that represented line numbers so that the click could be appropriately calculated?

It's this part of the equation, the user simply clicking and the website knowing exactly where in the text they clicked, that I don't get.

Bryan
11-20-2005, 06:20 PM
Chris,

If you must do it manually, the first step would be to automatically split the text into paragraphs. Depending on formatting, this might be where two newline characters are next to each other.

Then a web app could render each paragraph into a p tag, and add an onclick handler to that. The onclick handler would transmit the number corresponding to the paragraph back to the server, and mark it as a page break.

The whole system could be expanded as much as necessary... for example, to delete sections with no actual content, to insert chapter names (if they go into separate database fields), or to add page breaks within a chapter.

-Bryan
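
To make the idea concrete, here is a minimal sketch of the paragraph step, assuming blank lines separate paragraphs. The function names and the markBreak() click handler are hypothetical, made up for illustration; markBreak() would be whatever client-side function sends the paragraph number to the server.

```javascript
// Hypothetical helpers for illustration; none of these names come from the thread.
function splitParagraphs(text) {
  // Normalise line endings, then split wherever one or more blank lines occur.
  return text
    .replace(/\r\n?/g, "\n")
    .split(/\n\s*\n/)
    .map(function (p) { return p.trim(); })
    .filter(function (p) { return p.length > 0; });
}

function escapeHtml(s) {
  // Escape characters that would otherwise be read as markup.
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderParagraphs(paragraphs) {
  // Each <p> carries its own index, so the click handler knows which one was hit.
  // markBreak() is an assumed function that would report the index to the server.
  return paragraphs
    .map(function (p, i) {
      return '<p id="para-' + i + '" onclick="markBreak(' + i + ')">' +
        escapeHtml(p) + "</p>";
    })
    .join("\n");
}
```

Because the index is baked into each tag, the click never has to be translated from screen coordinates, so scrolling is not a problem.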

Chris
11-20-2005, 06:26 PM
Thanks... so when dividing up the page into paragraphs there'd need to be some sort of label on each <p> tag that corresponds to a number? Or can JS access that info with something like document.page.text.paragraph.2 (it's been a long time since I did any of my own JS)?

Then the server-side code would take that number and end the chapter after the close of that paragraph (by loading the text in PHP, finding the Xth </p> tag, and doing a substr()), assuming the person was instructed to click on the last paragraph in a chapter rather than in the break itself.
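
On the labelling question: in the browser, document.getElementsByTagName("p")[2] returns the third paragraph, so a label is not strictly required, though an id on each tag is easier to work with. The grouping step itself could look roughly like the sketch below. It assumes the text is already an array of paragraphs and that the clicked numbers are zero-based indexes of the last paragraph in each chapter; groupIntoChapters is a made-up name, and it is written in JavaScript rather than PHP purely for illustration.

```javascript
// Hypothetical helper: groups paragraphs into chapters.
// lastParagraphIndexes holds the zero-based index of the final paragraph of each chapter.
function groupIntoChapters(paragraphs, lastParagraphIndexes) {
  var ends = {};
  lastParagraphIndexes.forEach(function (i) { ends[i] = true; });

  var chapters = [];
  var current = [];
  paragraphs.forEach(function (p, i) {
    current.push(p);
    if (ends[i]) {
      // This paragraph was clicked, so the chapter closes here.
      chapters.push(current);
      current = [];
    }
  });
  // Anything after the last marked break becomes the final chapter.
  if (current.length > 0) chapters.push(current);
  return chapters;
}
```

Working on an array of paragraphs avoids counting </p> tags with substr(); each inner array could then be joined and stored as one DB row.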

The New Guy
11-20-2005, 07:09 PM
I am confused. You want the entire document to be loaded, then allow users to pick which part of the document is a new chapter. Then have the whole document be broken up that way. To which each user can break it up in his own way? Is that correct?

Chris
11-20-2005, 07:15 PM
yes.

Basically I'd like to farm out this data entry and I'd like the system to be as easy to use as possible since I'd be paying hourly. So point-and-click functionality would be great.

Westech
11-20-2005, 09:23 PM
If all of your books are originally in text files then it may be easier to do this outside of a web browser. It's a relatively trivial task to write a program to parse through a text file and divide it up based on whatever the chapter separator is.

If the chapter separator pattern differs for each file, you or your hired help could quickly look at each text file and tell the program what unique text pattern separates chapters (for example "------" or "<newline>Chapter <one or more numbers><newline>") and then let the program separate the file into chapters and even output a text file in MySQL dump format that you could import into a MySQL database.

I'll bet that you could get a program like this done for a few hundred dollars on one of the freelancer sites.
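
As a rough sketch of that approach, assuming the per-book pattern is supplied as a regular expression: splitBySeparator and its keepSeparator flag are hypothetical names, and the MySQL dump step is left out.

```javascript
// Hypothetical helper: splits a book on a per-book separator pattern.
// keepSeparator = true keeps the matched line (e.g. "Chapter 1") at the top of its chapter;
// false drops it (useful for rule lines such as "------").
function splitBySeparator(text, separator, keepSeparator) {
  // Rebuild the pattern with g and m flags so ^ and $ match at line boundaries.
  var re = new RegExp(separator.source, "gm");
  var chapters = [];
  var start = 0;
  var match;
  while ((match = re.exec(text)) !== null) {
    if (match[0].length === 0) { re.lastIndex++; continue; } // guard against empty matches
    chapters.push(text.slice(start, match.index));
    start = keepSeparator ? match.index : match.index + match[0].length;
  }
  chapters.push(text.slice(start));
  // Drop empty pieces, e.g. when the file starts with a separator.
  return chapters
    .map(function (c) { return c.trim(); })
    .filter(function (c) { return c.length > 0; });
}
```

For example, splitBySeparator(book, /^Chapter \d+$/, true) would handle the "Chapter <number>" case, and splitBySeparator(book, /^-{6}$/, false) the "------" case.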

chromate
11-21-2005, 04:22 AM
Yeah, I think the "KISS" principle applies :) Why not just send the text file to the freelancer and get them to enter "*chapter*" where they think the chapter breaks should be? They won't have any problem with this and I can't see it making any difference to the amount of time it takes them. Then just parse the document into the database using these chapter breaks.
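
The parsing side of this is small. A minimal sketch, assuming the marker is the literal string "*chapter*" and that each chapter ends up as one row; toChapterRows and the row shape are made up for illustration, and the actual database insert is omitted.

```javascript
// Hypothetical helper: turns a marked-up book into numbered rows ready for insertion.
function toChapterRows(text, marker) {
  return text
    .split(marker) // a plain string split, no regex needed
    .map(function (body) { return body.trim(); })
    .filter(function (body) { return body.length > 0; }) // skip the empty piece before the first marker
    .map(function (body, i) { return { chapter: i + 1, body: body }; });
}
```

Calling toChapterRows(bookText, "*chapter*") gives one object per chapter, numbered from 1 in the order the markers appear.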

Emancipator
11-21-2005, 07:05 AM
Westech and Chromate, your ideas are good as well... The only thing is that you then have to trust whoever does it to enter it into the DB correctly. Whereas with the original method Chris was looking into, they just have to be able to CLICK and the code does the rest: inserts the chapter, the database entry, etc.

The New Guy
11-21-2005, 08:06 AM
I am late for school, so this is extremely rushed, but I think this will do what you want. All the user has to do is highlight the chapter title (or any unique text he feels defines the chapter) and it will add a delimiter to the beginning of said text. The script will then save the file and when viewed it will break the book into chapters by the delimiter (I used *). The JS only works in IE because I am rushing.



<script>
// IE-only: relies on the proprietary document.selection API.
function formatText() {
  var range = document.selection.createRange();
  var selectedText = range.text;

  if (selectedText != "") {
    // Prefix the highlighted text with the chapter delimiter.
    range.text = "*" + selectedText;
  }
}
</script>
<?php
// Script

$delimiter = '*';
$filename = "book.txt";
// Avoid an undefined-index notice when no action is given.
$action = isset($_GET['action']) ? $_GET['action'] : '';

if ($action == 'view') {
  // Read the whole book and split it on the delimiter.
  $contents = explode($delimiter, file_get_contents($filename));
  $chapters = count($contents) - 1;

  // Show Chapter Links
  for ($c = 1; $c <= $chapters; $c++) {
    echo '<a href="#'.$c.'">Chapter '.$c.'</a><br />';
  }
  // Show Book By Chapters (element 0 is any text before the first delimiter).
  // Escape the text so characters like < and & are not treated as markup.
  for ($c = 0; $c <= $chapters; $c++) {
    echo '<a name="'.$c.'"></a>'.nl2br(htmlspecialchars($contents[$c])).'<br /><br />';
  }
}
elseif ($action == 'edit' && isset($_POST['submit'])) {
  $book = $_POST['book'];

  // Let's make sure the file exists and is writable first.
  if (is_writable($filename)) {

    // Open in write mode ('w'): the file is truncated and then
    // replaced with the submitted text.
    if (!$handle = fopen($filename, 'w')) {
      echo "Cannot open file ($filename)";
      exit;
    }

    // Write the submitted book text to the opened file.
    if (fwrite($handle, $book) === FALSE) {
      echo "Cannot write to file ($filename)";
      exit;
    }

    echo "Success, wrote to file ($filename)";

    fclose($handle);

  } else {
    echo "The file $filename is not writable";
  }
}
elseif ($action == 'edit') {
  $contents = file_get_contents($filename);

  // Delimiter Form (escape the text so it cannot break out of the textarea)
  echo '<form name="form" action="chris.php?action=edit" method="post">';
  echo '<textarea name="book" cols="100" rows="20">'.htmlspecialchars($contents).'</textarea><br />';
  echo '<input type="button" value="New Chapter" onclick="formatText()" />';
  echo '<input name="submit" type="submit" value="Submit" />';
  echo '</form>';
}
?>

Chris
11-21-2005, 11:26 AM
Quoting Westech: "It's a relatively trivial task to write a program to parse through a text file and divide it up based on whatever the chapter separator is."

There are so many different ways a book can be divided up, though, that I think this would be too difficult.

If all books used Chapter ## it'd be easy, but some use Book instead of Chapter, or Section. Some use both Book and Chapter or Book and Section (As in Book 1 Chapter 2). Some use roman numerals. Poetry books use none of these things, just poem titles. Others don't use book, chapter, or section, just numbers or roman numerals by themselves.

By the time you write the code to cover every conceivable possibility, it would have been easier to code a more complicated point-and-click interface.
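
To give a feel for how quickly the pattern grows, here is a sketch that covers only the cases listed above (Book/Chapter/Section with digits or roman numerals, optionally combined, plus bare numbers or numerals). HEADING and findHeadingCandidates are made-up names. It only flags candidate lines for a person to confirm, and it would still miss poem titles entirely.

```javascript
// Hypothetical pattern: matches a whole line that looks like a heading.
// Covers "Chapter 12", "Book IV", "Book 1 Chapter 2", "Section ix", "12", "IV".
var HEADING = /^(?:(?:book|chapter|section)\s+(?:\d+|[ivxlcdm]+)(?:\s+(?:chapter|section)\s+(?:\d+|[ivxlcdm]+))?|\d+|[ivxlcdm]+)\.?$/i;

function findHeadingCandidates(text) {
  var hits = [];
  text.split(/\r?\n/).forEach(function (line, i) {
    var trimmed = line.trim();
    // Record the 1-based line number so a person can review each candidate.
    if (HEADING.test(trimmed)) hits.push({ line: i + 1, text: trimmed });
  });
  return hits;
}
```

Note the false positives this invites: any line consisting only of letters such as "mix" or "I" also counts as a roman numeral here.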

Chris
11-21-2005, 11:30 AM
I think Chromate's idea has merit. Instead of having them click I could just have them insert a unique code with ctrl-v so it'll be nearly as easy as clicking. Then I can parse it based on that code.

I wonder if an entire book would be too much for the explode() function to handle.

Emancipator
11-21-2005, 12:04 PM
Clicking is way easier than pasting and does exactly the same thing.

The New Guy
11-21-2005, 12:18 PM
Quoting Chris: "I wonder if an entire book would be too much for the explode() function to handle."

Well, that's where caching would come in. You could hash the last-modified date and compare it to a stored hash to see if the file has changed. If it has changed, get the whole book and parse it for the delimiter, then record where each delimiter occurs in a separate text file. So, when you view a chapter, instead of exploding the entire book, you would read the file which contains the delimiter locations and fseek to the correct area. You would only pull that chapter out of the file, rather than loading the whole book and exploding it.
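
A minimal sketch of the offset index, written for Node and working on an in-memory Buffer to keep it short. buildChapterIndex and readChapter are made-up names, and the change-detection step is left out. Against a real file, the same offsets could be passed to a positioned read such as fs.readSync so only that byte range is loaded.

```javascript
// Hypothetical helper: records the byte offset of every delimiter in the book.
function buildChapterIndex(buffer, delimiter) {
  var mark = Buffer.from(delimiter);
  var offsets = [];
  var pos = buffer.indexOf(mark);
  while (pos !== -1) {
    offsets.push(pos);
    pos = buffer.indexOf(mark, pos + mark.length);
  }
  return offsets;
}

// Hypothetical helper: returns chapter n (1-based) using only the stored offsets.
function readChapter(buffer, offsets, n, delimiterLength) {
  // The chapter starts just after its delimiter...
  var start = offsets[n - 1] + delimiterLength;
  // ...and ends at the next delimiter, or at the end of the book.
  var end = n < offsets.length ? offsets[n] : buffer.length;
  return buffer.subarray(start, end).toString("utf8");
}
```

The offsets array is what would be written to the separate file; it only needs rebuilding when the book changes.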

Chris
11-21-2005, 12:33 PM
Quoting Emancipator: "Clicking is way easier than pasting and does exactly the same thing."
Except you'd have to deal with accidental clicks and the like.

Maybe I should have a contest, who can build the best system.

Emancipator
11-21-2005, 01:08 PM
To each their own. Myself, I would go with the simplest method, and to me scrolling and clicking is way faster... but I've been known to be wrong in the past.

chromate
11-21-2005, 02:34 PM
Quoting Chris: "Except you'd have to deal with accidental clicks and the like."

Not only that, but if a freelancer is dealing with a whole book they're unlikely to want to do the whole thing in one sitting. So either there needs to be code there for them to save and return, or they have to keep their computer running so they don't lose their place. Notepad, or whatever, has all they need to navigate around the document and save it. They'll be familiar with the environment and get the job done faster as a result. A freelancer who can't paste in a simple delimiter is one you don't want to hire anyway!

Concentrate on the code to parse the text document into chapters using the simple delimiters. Do a quick test - explode() may handle it fine.

Personally I would forget about having to deal with creating the interface for the freelancer to put those delimiters in. Why re-invent the wheel? Just send them the text file and tell them to get on with it.

... Maybe I'm just too lazy ;) haha

BTW Chris, do you think Google Books is going to have any negative effects on your site?

Chris
11-21-2005, 02:49 PM
It only takes 3-5 minutes to do each book (longer on the so I don't think the "all in one sitting" issue will happen. It's also not really a freelancer I'd be hiring, but someone in India or wherever for simple data entry. I want to make things as easy and as quick as possible for them.

As for Google Books (and MSN's new project), yes... I worry. But I think my site is well positioned. All the clones out there might suffer, but I think I'll be fine because of the additional features I've added.

Emancipator
11-21-2005, 03:45 PM
Using the system I am thinking of, you would definitely be able to do it in a few minutes per book.