Convert subtitles .SRT files into other formats

I’ve been messing around lately with various HTML5 demos (like popcorn.js) that add interactivity to online audio and video. One great example is Happyworm’s Hyperaudio, which uses the transcript of an audio file to link to times in the audio, along with various other cool functionality. One frustrating part of using these libraries is getting subtitles or a transcript into a usable format. I’ve written the function below (which I’ve just rewritten in PHP from the Hyperaudio JavaScript “parseSRT” function) to make that easier. I was hoping to make some improvements before posting, but having no time, I just posted this (kind of ugly) script. Suggestions for improvements are always welcome… you can test converting your own SRT file on this Demo SRT Converter form.


<?php
// subtitles transform
// get an input .SRT file and output new format

function parseSRT($input) {
    //remove blank lines from input
    $input = removeEmptyLines($input);
    //create final output array
    $srt_array = array();
    //defaults, in case a text line shows up before a number or time range
    $line_array = array();
    $line_number = 0;
    $begin = 0;
    $end = 0;
    $total_time = 0;
    $lines = preg_split('/$\R?^/m', $input);
    foreach ($lines as $line) {

        $line = strip_tags(trim($line));
        //is the current line a sequential line number
        $is_line_num = is_numeric($line);
        if ($is_line_num) {
            $line_number = (int)$line;
            //set the array with line nums
            $srt_array[] = $line_number;

        } else {
            //not a numbered line
            //is the current line the SRT time range
            if (preg_match('/(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)/', $line, $match)) {
                //get the begin and end in milliseconds from HH:MM:SS,MMM
                $begin = (intval($match[1]) * 3600000) + (intval($match[2]) * 60000) + (intval($match[3]) * 1000) + intval($match[4]);
                $end = (intval($match[5]) * 3600000) + (intval($match[6]) * 60000) + (intval($match[7]) * 1000) + intval($match[8]);
                $total_time = $end - $begin;

            } else {
                //the text line
                //get the num chars in line (include whitespace)
                $line_length = strlen($line);
                //get the time per character (skip empty lines to avoid dividing by zero)
                if ($total_time && $line_length > 0) {
                    $time_per_char = round($total_time / $line_length, 0);

                    //split the line into words
                    $words = explode(" ", $line);
                    $num_words = count($words);

                    //get the amount of time for each word
                    $word_count = 0;
                    foreach ($words as $word) {
                        $word_length = strlen($word);
                        if ($word_count == 0) {
                            //set default on first word
                            $word_time = 0;
                        }
                        //$word_time is additive so we also need $current_word_time
                        //to track the current iteration and zero it on each loop
                        $current_word_time = 0;

                        $chars = str_split($word);
                        $char_count = 0;
                        foreach ($chars as $char) {
                            //try to improve the accuracy by giving
                            //vowels a weight of 1.25x and commas and periods a 1.5x weight
                            //and adding an extra "time per character" for spaces
                            if ($char_count == (count($chars) - 1)) {
                                //add a space to the last letter
                                $space = $time_per_char;
                            } else {
                                $space = 0;
                            }
                            if ($char == "," || $char == ".") {
                                $word_time = $word_time + ($time_per_char * 1.5) + $space;
                                $current_word_time = $current_word_time + ($time_per_char * 1.5) + $space;
                            } elseif (preg_match('/[aeiou]/i', $char)) {
                                $word_time = $word_time + ($time_per_char * 1.25) + $space;
                                $current_word_time = $current_word_time + ($time_per_char * 1.25) + $space;
                            } else {
                                $word_time = $word_time + $time_per_char + $space;
                                $current_word_time = $current_word_time + $time_per_char + $space;
                            }

                            //round
                            $word_time = round($word_time);
                            $current_word_time = round($current_word_time);

                            //get the beginning time of the word
                            if ($word_count == 0) {
                                //the first word
                                $word_begin = $begin;
                            } elseif ($word_count == $num_words - 1) {
                                //the last word: compare the running begin time with
                                //the end of the line minus this word's own time
                                $word_begin = ($begin + $word_time > $end - $current_word_time) ? $begin + $word_time : $end - $current_word_time;
                            } else {
                                $word_begin = $begin + $word_time;
                            }

                            //get the end time of the word
                            $word_end = $word_begin + $word_time;

                            $char_count++;
                        } //end foreach letter

                        //set the current lines array and add to the final output array
                        //(keys are quoted; preg_replace stands in for ereg_replace, which PHP 7 removed)
                        $word_array = array(
                            'word' => stripslashes($word),
                            'word_length' => $word_length,
                            'word_begin' => $word_begin,
                            'word_end' => $word_end,
                            'word_time' => $word_time,
                            'word_class' => preg_replace('/[^A-Za-z0-9]/', '', $word)
                        );
                        //add the word array to the line
                        $line_array[$word_count] = $word_array;
                        //increment word count
                        $word_count++;
                    } //end foreach word

                    //add the line to the final output
                    $srt_array[$line_number] = $line_array;
                    //reset line array
                    $line_array = array();

                }

            }
        }

    }
    return $srt_array;
}

function removeEmptyLines($string) {
    return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}

function outputSRTArray($srt_array, $format, $options = array()) {
    //treat a missing 'ptags' option as "off"
    $use_ptags = !empty($options['ptags']);
    switch ($format) {
        case "hyperaudio":
            foreach ($srt_array as $line) {

                if (is_array($line)) {
                    //add paragraph tag
                    if ($use_ptags) {
                        echo '<p>' . PHP_EOL;
                    }
                    foreach ($line as $key => $val) {
                        echo '<span m="' . $val['word_begin'] . '" oval="' . $val['word_class'] . '">' . $val['word'] . '</span>' . PHP_EOL;
                    }
                    //end paragraph tag
                    if ($use_ptags) {
                        echo '</p>' . PHP_EOL;
                    }
                }

            }
            break;
        case "json":
            echo json_encode($srt_array);
            break;
    }
    return true;
}

?>
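
The time-range step is the easiest part of the parser to check in isolation. Here is a minimal sketch of just that conversion. The helper name `srtTimeToMs` is my own for illustration and is not part of the script above or of Hyperaudio; it assumes a single timestamp in the usual `HH:MM:SS,mmm` form.

```php
<?php
// Hypothetical helper (not part of the converter above): turns one SRT
// timestamp such as "00:01:02,500" into a number of milliseconds.
function srtTimeToMs($timestamp) {
    // Expect hours, minutes, seconds, then a comma and milliseconds.
    if (!preg_match('/^(\d+):(\d+):(\d+),(\d+)$/', trim($timestamp), $m)) {
        return null; // not a valid SRT timestamp
    }
    return (intval($m[1]) * 3600000)   // hours   -> ms
         + (intval($m[2]) * 60000)     // minutes -> ms
         + (intval($m[3]) * 1000)      // seconds -> ms
         + intval($m[4]);              // milliseconds
}

echo srtTimeToMs('00:01:02,500') . PHP_EOL; // 62500
```

Subtracting the begin value from the end value gives the total time that parseSRT then spreads across the words of the line.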

Here’s an example of how the function could be used to provide subtitles for various uses. In this example, I’m grabbing an .SRT file from the crowdsourced subtitling site Universal Subtitles. Using this completely subtitled video, I’m able to take the SRT returned by the UniSubs API, run it through the above function and output the data as JSON.

<?php
//include the subtitles transform function
include('srt-converter.php');

// Universal Subtitles API Key
$api_key = '';

//REST API request to retrieve subtitles
$video_url = 'http://www.youtube.com/watch?v=_cUdFx-Y8yI';

$url = "http://www.universalsubtitles.org/api/1.0/subtitles/";
$url .= "?video_url=" . urlencode($video_url);
//$url .= "&language_id=en";
$url .= "&sformat=srt";
$fetched = file_get_contents($url);

// case error
if (!$fetched) {
    print("Error, no content could be fetched from " . $url);
} else {
    //SRT returned from UniSubs
    $data = $fetched;
    $format = 'json';
    //no extra output options are needed for JSON
    $options = array();
    //output json
    outputSRTArray(parseSRT($data), $format, $options);
}
?>

Ideally, the Universal Subtitles API could return JSON by itself (I’m not sure whether it already does), but this could easily be made into a dynamic script that provides subtitles in various formats as content for your HTML5 audio/video projects. (For example, just changing the header of the example above would give you a valid JSON file.)
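
As a rough sketch of that header change: the wrapper below sends a JSON content type before returning the encoded data. The function name `jsonResponse` and the sample array are made up for illustration; in the real script the array would come from parseSRT().

```php
<?php
// Hypothetical wrapper: sets a JSON Content-Type header when headers can
// still be sent, then returns the encoded body for the caller to echo.
function jsonResponse(array $data) {
    if (!headers_sent()) {
        header('Content-Type: application/json; charset=utf-8');
    }
    return json_encode($data);
}

// Stand-in for the array parseSRT() would return (assumed shape).
$sample = array(
    1 => array(
        array('word' => 'Hello', 'word_begin' => 1000),
    ),
);

echo jsonResponse($sample) . PHP_EOL;
```

The header has to go out before any other output, so nothing should be echoed ahead of this call in a real page.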

As always, comments and suggestions for improvement are appreciated. Here’s the Demo form to convert an SRT file into HTML.

Posted in Web Development

Giant Squid Wallpaper

This is an old ink + watercolors + Photoshop image I made quite some time ago. Still one of my favorites… in fact, I think this is the only drawing or illustration I’ve done that ever made any money. Some anonymous soul actually bought a print on ImageKind. 2 dollars that went straight into my pocket! :-) I’ve created a free wallpaper for download. Thanks!

Posted in Art

Access a custom PHP page, without getting the WordPress 404 Page Not Found

Not exactly rocket science here, but I thought I’d post this little something, which took me WAY too long to figure out today. :-)

Basically, I had a custom script in a subdirectory of a WordPress install which needed to be accessed in the browser directly. The .htaccess file in the root directory rewrites any request that doesn’t map to an existing file or directory to the WordPress index, which can result in a Page Not Found page when trying to access your script. To get around this, a condition needs to be added to the rewrite rules. The standard mod_rewrite rules look something like this:


<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

The following line can be added just before the rewrite rule to ignore a directory named “custom.”


RewriteCond %{REQUEST_URI} !^/(custom) [NC]

So… the complete block looks like this, and now my pretty PHP page in the “custom” directory can be accessed in the browser.


<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
##don't rewrite requests to the custom script
RewriteCond %{REQUEST_URI} !^/(custom) [NC]
RewriteRule . /index.php [L]
</IfModule>

Posted in Web Development

Learning to paint with acrylics could be one hobby too many.

One of my new year’s resolutions this year was to sit down and practice painting. The idea was to make at least 12 paintings in 2012. (I don’t know why, but everything seems like a good idea when numbers match up like that, 12 in ’12.) This is my first one. Not too thrilled about the outcome (it looked way better in my head), but at least I finished it, as it sat on the bookshelf unfinished for several weeks. :-)

I got the idea to start painting after doing the cover to my comic book with acrylics last year. For whatever reason, that… basically my first attempt at painting… was pretty easy, and I figured it would be something I could get into. To be honest, though, as a person whose medium of choice has always been ink or digital (Photoshop), I’m finding this to be a bit challenging. I wonder if others have had the same problem, or if longtime painters also feel like they’re working from the wrong side of the brain when they switch over to black and white line art or digital painting.

Anyway, I’ve still got 11 paintings to do. (hmm.. nothing “catchy” about that number.)

Posted in Art | 51 Comments

Siri predicts the end of the world

Siri will 2012 be the end of the world?

Had a pretty good New Year’s Eve last night. Not great… but good. Either way, I’m looking forward to all the “end of the world” topics in 2012, and as anyone who’s been paying attention to the GOP presidential race knows, there will be a lot of them. So, I wanted to start the year off with my own little comic… here it is.

Here’s the Google search trend to keep your eye on. Hope everyone has an excellent 2012!

Posted in Comics | 86 Comments

Creating a simple “Apple style” button with CSS rounded corners and drop shadow

CSS3 has several easy properties that are making life much easier for designers. In the past, applying some of these simple styles to an element required nested divs, multiple background images, and/or some JavaScript trickery. I remember working on a Drupal website some years ago, where the site’s owners were what I call… “Round Corner Evangelists.” This was the bulk of our “project meetings.” Basically, going through the site, and pointing at any element that wasn’t already “rounded” and saying, “maybe we could put some rounded corners on that.” “See that text inside the rounded box?.. maybe we could make a rounded box inside the rounded box!”

Whatever. It’s now pretty easy to round anything you like, although (like always) browser support is going to be hit or miss. Note that until browser support is universal, we’re going to be using vendor prefixes on properties to target specific browsers. The two most common are “-webkit-” for WebKit browsers like Safari and Chrome, and “-moz-” for Mozilla.

To create our “Apple style” button we’re going to apply “border-radius,” “box-shadow,” and a gradient background to the following HTML:


<div class="apple-style">

<a href="http://apple.com/itunes">iLink</a>

</div>

First, let’s set some default styles for our link, and then add a gradient:


.apple-style a {

    width: 80px;
    height: 40px;
    color: #333;
    text-decoration: none;

    font: 18px "Lucida Grande","Lucida Sans Unicode",Helvetica,Arial,Verdana,sans-serif;

    /* align "button" text */
    display: table-cell;
    vertical-align: middle;
    text-align: center;

    /* gradient: solid fallback first, then prefixed versions, then the standard syntax */
    background: #DDDDDD;
    background: -webkit-gradient(linear, 0% 0%, 0% 100%, from(#EFEFEF), to(#BBBBBB));
    background: -webkit-linear-gradient(top, #EFEFEF, #BBBBBB);
    background: -moz-linear-gradient(top, #EFEFEF, #BBBBBB);
    background: linear-gradient(to bottom, #EFEFEF, #BBBBBB);

}

Next, let’s add the styles for the rounded corners and drop shadow.


.apple-style a {

    /* the drop shadow */
    -moz-box-shadow: 0 0 5px #333333;
    -webkit-box-shadow: 0 0 5px #333333;
    box-shadow: 0 0 5px #333333;
    /* the rounded corners */
    -moz-border-radius: 25px;
    -webkit-border-radius: 25px;
    border-radius: 25px;

}

Now you should have a button that looks something like this (in supported browsers):

Apple Style Button

This was a pretty simple example, but I think it works for an introduction. Of course, you can get a lot more information about browser support and browser prefixes and other topics like “CSS Transitions” for hover effects. View a demo of this button.

Thanks, please leave any questions or comments in the Comment box, below. (you know… the textarea with square corners.)

Posted in Design | 89 Comments

If North Korea had the Internet…

North Koreans on Twitter

I drew this picture a while ago, but decided to post it as there’s a bunch of news out today on the “Dear Leader’s” 3-day funeral tour. There’s also (of course) a lot of Twitter noise. It’s kind of hilarious watching a supposedly solemn communist funeral, and at the same time reading thousands of mocking, snarky tweets. But I’d like to hear from the North Koreans directly. Is there even one North Korean on Twitter? Even one guy that the whole world could follow?

Had a good North Korea joke but I'm gonna save it for reunification.
@andykhouri
Andy Khouri

Posted in Comics | 44 Comments

Welcome to Graphic Silence

Welcome to the Graphic Silence Blog!

I’ve decided to set this blog up this year to finally showcase some of my silly little projects in one central repository (or tomb). Of course… I can’t promise any coherence to the content. I plan on posting some drawings and comics, some web development type stuff I do… maybe I can also talk some of my talented friends into posting some music or writing. So that’s the idea. A blog about nothing in particular. No mission. No focus. No $0.27 revenue from Google Adsense.

Sounds like a good idea, already.

I’ve got a Twitter and Facebook account for GS, if you want to get in touch. Also, check out a comic book I drew last year… hard copies are available for purchase on Amazon, but it can also be read for free online.

Again, thanks for stopping by…


#welcome-message {

    font-size: xxx-large;

}

Posted in Art, Comics, Design, Web Development | 27 Comments