Latest Posts

5/8/2008 3:35:58 PM
CSV Splitter
Author:

As usual, rather than download one of the many small programs I am certain already exist, I decided to reinvent the wheel to meet one of my clients' needs.

Its primary function is to split large CSV files into smaller ones, repeating the headers in each, for uploading to a catalog website that limits file upload size.

Built on .NET 2.0. The framework is not included, and no installation is required.
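For the curious, here is a rough sketch of the split-and-keep-headers idea in JavaScript. It is not the .NET tool itself: the function name splitCsv and the maxRows parameter are made up for illustration, and it assumes one record per line (no quoted fields containing line breaks).

```javascript
// Hypothetical sketch: split CSV text into chunks that each repeat the header row.
// Assumes one record per line (no quoted fields containing line breaks).
function splitCsv(csvText, maxRows) {
  if (!Number.isInteger(maxRows) || maxRows < 1) {
    throw new RangeError("maxRows must be a positive integer");
  }
  // Drop empty lines so a trailing newline does not create a blank record.
  const lines = csvText.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) return [];

  const header = lines[0];
  const rows = lines.slice(1);
  if (rows.length === 0) return [header + "\n"]; // header-only file

  const chunks = [];
  for (let i = 0; i < rows.length; i += maxRows) {
    // Every chunk starts with the header, followed by up to maxRows records.
    chunks.push([header, ...rows.slice(i, i + maxRows)].join("\n") + "\n");
  }
  return chunks;
}

const sample = "sku,name\n1,apple\n2,pear\n3,plum\n";
const parts = splitCsv(sample, 2);
console.log(parts.length); // 2
console.log(parts[1]);     // "sku,name\n3,plum\n"
```

Splitting by row count is just one option. Since the real constraint is upload size, splitting by byte count would follow the same pattern with a running size total instead of a row counter.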

Download

0 Comments

4/23/2008 9:10:37 PM
Competitive Data Scraping
Author:

Brute Research

In my work I spend a lot of time gathering data from non-API sources, which has the upside of being very rewarding when you pull it off successfully, but has the downside of forcing you into potentially uncharted and ethically challenging territory.

Size up the target

What's the take?

How much data am I going to be scraping? If I am about to scrape a measly thousand pages, this doesn't take a lot of thought: just get in there and get it. But if I am confronted with hundreds of thousands of pages to scrape, or even millions, the strategy becomes very different. You ever try storing a few hundred thousand files in a single directory? Don't.
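One common way around the single-directory problem is to spread files across nested subdirectories picked by a hash of the page URL. The sketch below is an illustration, not what I actually run: fnv1a, shardedPath and the file names are hypothetical, and the hash only chooses the directory, so the file name you pass in still has to be unique.

```javascript
// 32-bit FNV-1a style hash over the string's UTF-16 code units (not cryptographic).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the result as unsigned 32-bit
  }
  return hash;
}

// Hypothetical helper: choose a two-level directory (256 x 256 buckets) from the
// URL hash. fileName must already be unique; the hash only decides where it lives.
function shardedPath(pageUrl, fileName) {
  const hex = fnv1a(pageUrl).toString(16).padStart(8, "0");
  return [hex.slice(0, 2), hex.slice(2, 4), fileName].join("/");
}

console.log(shardedPath("http://example.com/catalog?page=1", "page-000001.html"));
```

With 65,536 buckets, a few hundred thousand files works out to only a handful per directory on average.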

What's the security like?

Where are the guards, and how might I be noticed? Some websites employ mechanisms to detect the likes of me coming and will block IP addresses automatically for a certain amount of time. Showing up on this radar tends to lead to blacklisting. Always have a healthy list of proxy servers or a map to all the local wireless spots in your area... your choice.

How much time should I take?

Can I get away with it all day, or should I wait until after dark when the customers are gone? A client has expectations, and you really have to be able to sit down and determine exactly what it will take to get the amount of information needed. When you are looking at scraping hundreds of thousands of website pages, you'll find that shaving off half a second per 10 pages, or some other minute improvement, will add up very quickly. I usually break such a project into phases like:

  1. Amount of time to get the data
  2. Amount of time to parse out the data
  3. Amount of time to structure the data in a user-friendly format.
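To put rough numbers on the phases above, here is a small back-of-the-envelope sketch. Every figure in it (the page count and the per-page costs) is an assumption for illustration, not a measurement from a real job, and the helper names are made up.

```javascript
// Hypothetical planning helper; every number below is an illustrative assumption.
const SECONDS_PER_HOUR = 3600;

// Hours needed for one phase, given a per-page cost in seconds.
function phaseHours(pages, secondsPerPage) {
  return (pages * secondsPerPage) / SECONDS_PER_HOUR;
}

// Hours saved by shaving secondsSaved off every batch of pagesPerBatch pages.
function hoursSaved(pages, pagesPerBatch, secondsSaved) {
  return ((pages / pagesPerBatch) * secondsSaved) / SECONDS_PER_HOUR;
}

const pages = 500000;
const plan = {
  fetch: phaseHours(pages, 1.5),  // phase 1: get the data
  parse: phaseHours(pages, 0.2),  // phase 2: parse out the data
  format: phaseHours(pages, 0.1), // phase 3: structure it for the client
};

for (const [phase, hours] of Object.entries(plan)) {
  console.log(phase + ": " + hours.toFixed(1) + " hours");
}
console.log("saved: " + hoursSaved(pages, 10, 0.5).toFixed(1) + " hours"); // saved: 6.9 hours
```

At 500,000 pages, half a second per 10 pages comes to roughly seven hours, which is why the minute improvements are worth chasing.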

Why?

Why would anyone want to do this? Aside from malicious uses of screen scraping, like RSS scraping for content generation or just plain plagiarism, there happen to be two very good reasons for a legitimate business owner.

First, some data is just not publicly available in a usable format. In this day and age it would seem that screen scraping is becoming less and less of a practiced art, but there are still some darkened corners of the Internet that just beg for someone to come along and use that secluded data in a more productive manner.

Second, in many instances cost is a big factor. There are certain organizations that will sell you their data or allow access to it on a subscription basis when that same data is simply sitting on a public website somewhere waiting to be plucked. On top of that, the data provided by the source often comes with caveats: distribution is tiered, meaning you get the data and find out that in order to get what you really wanted you have to pay more.

What to expect

Expect to need more hardware, more custom software and more coffee. Possibly some sort of counseling from time to time. Data scraping for profit can be painfully complex. The wear on you won't compare to the wear on your hardware, though.

0 Comments

4/17/2008 5:08:14 PM
Date Image Thing
Author:

I want to be trendy, too

For shiggles, I wanted to add a nice little trendy date format to my postings, but everything I found seemed a bit too wordy for me. It always looks like this:

<div class="post-date">
<span class="month">10</span>
<span class="day">04</span>
<span class="year">1977</span>
</div>

This site is dynamically generated, so it wouldn't take too much effort in order to implement such a structure, but I'm stubborn... so here goes

Overprocessed, Nerdy Post Date Image Generation


<%@ Import Namespace="System.Drawing" %>
<%@ Import Namespace="System.Drawing.Imaging" %>
<script language="VB" runat="server">
 Sub Page_Load(sender As Object, e As EventArgs)
  ' Convert the query string value to a date once, then pull out the parts
  Dim dt As Date = CDate(Request.QueryString("dt"))
  Dim strMonth As String = Left(MonthName(Month(dt)), 3).ToUpper()
  Dim strDay As String = Day(dt).ToString("00") ' zero-padded day
  Dim strYear As String = Year(dt).ToString()

  Dim baseMap As Bitmap = New Bitmap(95, 13)
  '13 cuts it off, which looks cool -- see emersian.com
  Dim myGraphic As Graphics = Graphics.FromImage(baseMap)
  Dim upBrush As SolidBrush = New SolidBrush(Color.Black)
  Dim downBrush As SolidBrush = New SolidBrush(Color.SteelBlue)
  Dim MonthFont As Font = New Font("tahoma", 11, FontStyle.Bold)

  ' Set the rendering hint before drawing; set afterwards it has no effect
  myGraphic.TextRenderingHint = System.Drawing.Text.TextRenderingHint.AntiAlias
  myGraphic.FillRectangle(Brushes.White, 0, 0, 100, 25)
  myGraphic.DrawString(strMonth, MonthFont, upBrush, 0, 0)
  myGraphic.DrawString(strDay, MonthFont, downBrush, 30, 0)
  myGraphic.DrawString(strYear, MonthFont, upBrush, 50, 0)

  Response.ContentType = "image/gif"
  baseMap.Save(Response.OutputStream, ImageFormat.Gif)

  ' Release the GDI+ resources
  MonthFont.Dispose()
  upBrush.Dispose()
  downBrush.Dispose()
  myGraphic.Dispose()
  baseMap.Dispose()
 End Sub
</script>

Usage

<img src="dt.aspx?dt=DATE STRING HERE" />
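Date strings tend to contain slashes, spaces and colons, so it is safer to URL-encode them before dropping them into the query string. A minimal sketch in JavaScript, where dateImageUrl and dateImageTag are made-up helper names and dt.aspx is the page from the usage line above:

```javascript
// Hypothetical helpers for building the image URL and tag for a post date.
function dateImageUrl(dateString) {
  // encodeURIComponent escapes "/", ":" and spaces so the value survives the query string.
  return "dt.aspx?dt=" + encodeURIComponent(dateString);
}

function dateImageTag(dateString) {
  return '<img src="' + dateImageUrl(dateString) + '" alt="' + dateString + '" />';
}

console.log(dateImageUrl("4/17/2008")); // dt.aspx?dt=4%2F17%2F2008
console.log(dateImageTag("4/17/2008 5:08:14 PM"));
```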

1 Comments

4/16/2008 12:22:40 AM
SwishMax Image Explorer
Author:

The Flash

Usage

The image to load is designated in the onLoad of the "holder" sprite:
img.loadMovie("your file here");

The Download

Download swish file

0 Comments

4/15/2008 11:31:43 PM
Entrepreneurial Wisdom from Gangster Movies
Author: Joe Maddalone

"Leave the gun. Take the cannoli."

The Godfather, Clemenza

Don't lose sight of what is actually important to your survival and what is not. So many of us get caught up in the clutter of various advertising scenarios and side projects that we can easily forget how we started and who our bread-and-butter customers really are. I recently reviewed the last few years of accounts and came to the realization that one of my most neglected avenues of income had added up to equal the payments of my largest client. Needless to say, I have reinvested efforts into it and am beginning to see real results.

"I'm a spoke on a wheel. I am, and so are you."

Donnie Brasco, Lefty

Somewhere I once read that in five years you will be the people you associate with, the books you read and the music you listen to. This sounds a bit harsh, but I have to admit that I have seen it firsthand and it's solid advice.

Surround yourself with the people you admire. Collaborate and invite critique. Seek out those who challenge and inspire you.

"Every dog has his day"

Scarface, Tony Montana

Getting dismayed is natural, but it's really those who keep an eye on the prize that prevail. Fundamentally, you can't lose if you don't play, but you can't win either. Keep doing what it is that you love, keep trying new things and your day will come.

"I look at this town, and I don't like what I see."

Cop Land, Sheriff Freddy Heflin

Many, many of the most successful businesses around, especially all these web startups, are founded on the idea of fixing an existing problem or adding a functionality that was needed. See a problem, fix it. Just look at 37signals: who knew that Ta-da List would turn into Basecamp, Highrise, etc.?

Darren Rowse, who lives in "The House that Google Built", started only two years ago. Sounds crazy, doesn't it?

"I'm going to give you an opportunity: get out of this."

Suicide Kings, Charlie Barret

Know your exit. Know your exit. Know your exit.

"Back home, they put me in jail for what I'm doing. Here, they give me awards."

Casino, Sam Rothstein

I am not saying break the rules, but be certain to push them all the way to their limits. No one ever got rich by not pushing the boundaries of either customers or industry. This is absolutely true of today's online businesses. Look at the most successful eBayers with thousands and thousands of items listed, or bloggers with hundreds of sites. Google isn't just sitting back and raking it in; they are constantly pushing the boundaries of what they can get away with. You should too.

"...you were a little out of line yourself."

Goodfellas, Jimmy Conway

A wrong is a wrong is a wrong. Don't be afraid to admit when you are wrong. As long as you learn from your wrong decisions, they are valuable decisions.

"You don't need proof when you have instinct."

Reservoir Dogs, Joe

Maybe a bit of both is really needed, but so many seem to fall into the pattern of playing it safe when what is needed is a little bit of courage.

Despite what people may think, the Internet is still very much like the Wild Wild West.

Got one?

I am certain I left out a ton of great lines from gangster movies that could be added here. Perhaps you know one?

1 Comments

4/15/2008 8:00:00 PM
Google Charts with Adsense
Author: Joe Maddalone

Google Adsense with Google Charts

Data visualization is such a help sometimes. I don't know why Google does not utilize the Charts API in their AdSense reporting, but having experimented a bit with the Charts API while developing this site, I decided to utilize it for some actually valuable research.
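For anyone who wants to try the same thing, a chart is really just an image URL. The sketch below assumes the Charts API's image URL format (cht for chart type, chs for size, chd for text-encoded data scaled to 0 to 100). The lineChartUrl helper and the earnings figures are made up for illustration and are not my actual report data.

```javascript
// Assumption: the Charts API image URL format, with
// cht = chart type, chs = size in pixels, chd = data in text encoding (0 to 100).
const CHART_BASE = "http://chart.apis.google.com/chart";

// Hypothetical helper: build a line chart URL from a list of non-negative numbers.
function lineChartUrl(values, width, height) {
  const max = Math.max(...values, 0);
  // Text encoding expects values between 0 and 100, so scale against the maximum.
  const scaled = values.map((v) => (max > 0 ? (v / max) * 100 : 0).toFixed(1));
  return CHART_BASE + "?cht=lc&chs=" + width + "x" + height + "&chd=t:" + scaled.join(",");
}

const dailyEarnings = [2, 4, 8]; // made-up numbers
console.log(lineChartUrl(dailyEarnings, 400, 150));
// http://chart.apis.google.com/chart?cht=lc&chs=400x150&chd=t:25.0,50.0,100.0
```

Drop the resulting URL into an img tag and the chart renders without any client-side script.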

And the most successful blog killer is?

In reviewing Jan 1 thru Apr 15 this year and last, I have come to the conclusion that server downtime is my biggest enemy.

It's not really my hosting company's fault though. I am just endlessly tweaking code and trying it out on the production server when I should be using a test environment first. Well, that changes today: no more tweaking without testing. Hunkering down and reinvesting myself into all this has been great so far. I can't believe how motivated I am.

I hope you folks will bear with the breakneck pace of posts lately. I'm just trying to keep up with my own mind.

Now if I could only get past this: "Starting Sep 13, 2007, only websites with over 100,000 daily page views across user pages will be eligible to participate in the AdSense API program." I could really have something.

2 Comments

4/15/2008 7:36:12 PM
Getting Your Blog on the Map
Author: Joe Maddalone

As I mentioned before, I have been out of the blogging/SEO mindset for about a year.

With just some initial modifications, such as better CSS, regular updates and getting involved in the communities I enjoy, this site specifically has seen a significant increase in traffic. So here are my methodologies at this point.

Get on the map

Manual submission seems to be a long-dead concept, but I still think it is absolutely the way to go, despite the whole Web 2.0 notion that between Twitter and RSS feeds the world will take notice. That said, I am also a fan of WebCEO for that quick-fix initial submission to search engines (and yes that is an affiliate link, but the Free version is still great!)

Regular updates = Traffic

Regular updates means just that. I recall Darren Rowse used to do his blog-a-thons where he blogged non-stop for 24 hours, and I commend his efforts, but outside of that, if you can only commit to posting once a week, just make sure you do it each week. Modern search engine spiders keep track of the regularity of updates and come to check at intervals determined by statistical data, so keep them fed and happy. Your readers don't want to come back to the same page each day either; if they know you'll be updating at a regular interval, they will return at those intervals.

Be aware of your visitors' interests

Keeping track of how users get to your site can give you a great deal of insight as to whether what you are writing works or not. Keep an eye on the search terms users entered to get to your site and follow suit.
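If your logs record referrer URLs, the search terms can be pulled out and tallied. This is only a sketch under one big assumption: that the search engine puts the query in a q parameter of the referrer. The function names and sample URLs are made up for illustration.

```javascript
// Assumption: the referring search engine passes the query as a "q" parameter.
function searchTermsFromReferrer(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (err) {
    return null; // not a parseable URL
  }
  const query = url.searchParams.get("q"); // decodes "+" and %XX escapes
  return query ? query.trim().toLowerCase() : null;
}

// Tally how often each search phrase brought a visitor in.
function countSearchTerms(referrers) {
  const counts = new Map();
  for (const referrer of referrers) {
    const terms = searchTermsFromReferrer(referrer);
    if (terms) counts.set(terms, (counts.get(terms) || 0) + 1);
  }
  return counts;
}

const sampleReferrers = [
  "http://www.google.com/search?q=csv+splitter",
  "http://www.google.com/search?q=CSV%20splitter",
  "http://example.com/some-blog-post",
];
console.log(countSearchTerms(sampleReferrers)); // one entry: "csv splitter" seen twice
```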

Interact

Read those comments. Answer the questions and be a part of your own community.

0 Comments

4/14/2008 5:49:16 PM
Virtual Map is pretty dang easy
Author: Joe Maddalone

Example

Microsoft Virtual Earth is this easy to implement

Code

<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <title></title>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
      <script type="text/javascript"
      src="http://dev.virtualearth.net/mapcontrol/v5/mapcontrol.js">
      </script>
      <script type="text/javascript">
      var map = null;

      function GetMap()
      {
         map = new VEMap('myMap');
         map.LoadMap();
      }
      </script>
   </head>
   <body onload="GetMap();">
      <div id='myMap'
      style="position:relative; width:500px; height:400px;"></div>
   </body>
</html>

0 Comments

