The Cleaner
Task To Be Addressed
The simple .URL file: your link back to a website you might wish to revisit in the future. These files reside in your Favorites directory (and in each of the subdirectories you have created). Their contents are inaccessible to most users, who lack a programmer's editor (like UltraEdit), and Windows Explorer offers no hints as to what is inside. Ideal for hiding tracking links. Mind you, not all .URL files are as bad as the one I am going to show you, but it sure looks like fertile ground to me. Let's have a peek inside one...
[DEFAULT]
BASEURL=http://sustainablesources.com/
[DOC#564]
BASEURL=http://www.youtube.com/embed/uGsmKY_RrmI
ORIGURL=http://www.youtube.com/embed/uGsmKY_RrmI
[DOC_google_ads_frame1]
BASEURL=http://googleads.g.doubleclick.net/pagead/ads?client=ca-pub-1859845853929504&output=html&h=600&slotname=1323859760&w=160&lmt=1336790857&ea=0&flash=11.2.202.235&url=http%3A%2F%2Fsustainablesources.com%2F&dt=1336790857526&shv=r20120502&jsv=r20110914&saldr=1&correlator=1336790857554&frm=20&adk=2902830490&ga_vid=1099170898.1336790858&ga_sid=1336790858&ga_hid=1957588677&ga_fc=0&u_tz=-420&u_his=245&u_java=1&u_h=1080&u_w=1920&u_ah=1050&u_aw=1920&u_cd=32&u_nplug=0&u_nmime=0&dff=tahoma&dfs=11&adx=-2&ady=-2&biw=932&bih=935&oid=2&docm=8&fu=0&ifi=1&dtd=50
ORIGURL=http://googleads.g.doubleclick.net/pagead/ads?client=ca-pub-1859845853929504&output=html&h=600&slotname=1323859760&w=160&lmt=1336790857&ea=0&flash=11.2.202.235&url=http%3A%2F%2Fsustainablesources.com%2F&dt=1336790857526&shv=r20120502&jsv=r20110914&saldr=1&correlator=1336790857554&frm=20&adk=2902830490&ga_vid=1099170898.1336790858&ga_sid=1336790858&ga_hid=1957588677&ga_fc=0&u_tz=-420&u_his=245&u_java=1&u_h=1080&u_w=1920&u_ah=1050&u_aw=1920&u_cd=32&u_nplug=0&u_nmime=0&dff=tahoma&dfs=11&adx=-2&ady=-2&biw=932&bih=935&oid=2&docm=8&fu=0&ifi=1&dtd=50
[DOC_I1_1336790857644]
BASEURL=https://plusone.google.com/_/+1/fastbutton?bsv=p&url=http%3A%2F%2Fsustainablesources.com%2F&size=standard&count=false&hl=en-US&jsh=m%3B%2F_%2Fapps-static%2F_%2Fjs%2Fgapi%2F__features__%2Frt%3Dj%2Fver%3DNec4xg3wDg8.en_US.%2Fsv%3D1%2Fam%3D!AuYF0E1N7E-Ine7KrA%2Fd%3D1%2Frs%3DAItRSTMUSSt3OSnDgL9qnPccCbWYHQBtyg
ORIGURL=https://plusone.google.com/_/+1/fastbutton?bsv=p&url=http%3A%2F%2Fsustainablesources.com%2F&size=standard&count=false&hl=en-US&jsh=m%3B%2F_%2Fapps-static%2F_%2Fjs%2Fgapi%2F__features__%2Frt%3Dj%2Fver%3DNec4xg3wDg8.en_US.%2Fsv%3D1%2Fam%3D!AuYF0E1N7E-Ine7KrA%2Fd%3D1%2Frs%3DAItRSTMUSSt3OSnDgL9qnPccCbWYHQBtyg#id=I1_1336790857644&parent=http%3A%2F%2Fsustainablesources.com&rpctoken=538808861&_methods=onPlusOne%2C_ready%2C_close%2C_open%2C_resizeMe%2C_renderstart
[InternetShortcut]
URL=http://sustainablesources.com/
IDList=
IconFile=http://sustainablesources.com/wp-content/themes/atahualpa3.7.10/images/favicon/favicon.ico
IconIndex=1
[{000214A0-0000-0000-C000-000000000046}]
Prop3=19,2
That was pretty bad, no? All your browser needs in order to navigate back to this site is listed under the bracketed text "[InternetShortcut]": the site's URL= address. Nothing more. But look what we have here: links to YouTube, Google Ads, DoubleClick (ugh), Google's Plus One, a link to an icon file, and then some cryptic code. Each time you click your link to return to the site, all of this unnecessary hoohah gets triggered without your knowledge. Well friends, this is not for me.
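To see at a glance which sections a shortcut carries besides [InternetShortcut], you can pull the bracketed labels out of the file's text. What follows is only a minimal sketch and is not part of the program presented later; the function name listSections is made up for illustration, and it assumes every section label sits alone on its own line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the CLEANSE program): return the
// bracketed section labels found in the text of a .URL file, in order.
std::vector<std::string> listSections(const std::string& text)
{
    std::vector<std::string> sections;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line))
    {
        // Tolerate Windows line endings (a trailing carriage return).
        if (!line.empty() && line.back() == '\r') { line.pop_back(); }
        // Assumption: a section label starts with '[' and ends with ']'.
        if (line.size() >= 2 && line.front() == '[' && line.back() == ']')
        {
            sections.push_back(line.substr(1, line.size() - 2));
        }
    }
    return sections;
}
```

Fed the example above, this would report DEFAULT, DOC#564, DOC_google_ads_frame1, DOC_I1_1336790857644, InternetShortcut and the GUID section. Only one of those is needed.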
Development
The idea is simple. Take the above example, eliminate the unnecessary and rewrite the .URL as follows:
[InternetShortcut]
URL=http://sustainablesources.com/
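The rewrite itself can be expressed in a few lines when working on the file's text as a string. This is a sketch under assumptions, not the program posted below: the name cleanShortcut is hypothetical, and like the real program it simply takes the first line that begins with URL= (BASEURL= and ORIGURL= lines do not match because the test is on the start of the line).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of the CLEANSE program): given the full
// text of a .URL file, return the pared-down text, or an empty string
// if no line beginning with "URL=" is found.
std::string cleanShortcut(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line))
    {
        // Tolerate Windows line endings (a trailing carriage return).
        if (!line.empty() && line.back() == '\r') { line.pop_back(); }
        // Match only lines that START with "URL=".
        if (line.compare(0, 4, "URL=") == 0)
        {
            return "[InternetShortcut]\n" + line + "\n";
        }
    }
    return "";
}
```

Returning an empty string when nothing matches lets the caller leave a file alone instead of overwriting it with a shortcut that points nowhere.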
I will post the commented source code in the implementation section. It is written in C++ so you will need to download a C++ development environment so that the program can be compiled to work on your machine. I use the Code::Blocks IDE (available at http://www.codeblocks.org). In addition, there are some limitations:
Implementation
// The CLEANSE program acts on the file created with the DOS "dir *.url /o:n /x /w" command.
//
// Volume in drive C is OS
// Volume Serial Number is 780A-AE63
//
// Directory of C:\Users\Scott\Programming\Beginning_Programming-CPP2\QuickPrep1
//
//11/22/1999**02:32*PM***************132*COMPUT~1.URL*Computers & Structures, Inc. Home Page.url
//11/22/1999**02:20*PM***************124*COSMOS~1.URL*COSMOS, the line of powerful FEA software and design analysis.url
//11/22/1999**02:24*PM***************287*DOWNLO~1.URL*Download FEMAP and mtabSTRESS FEA finite element analysis sof.url
//11/22/1999**02:01*PM***************116*ENGINE~1.URL*Engineering News-Record-enr.com homepage.url
//11/22/1999**02:11*PM***************120*LUSASF~1.URL*LUSAS Finite Element Analysis - Home Page.url
//11/22/1999**02:28*PM***************128*MATHTO~1.URL*Mathtools.net Scientific portal for MATLAB, MIDEVA, Excel, C,.url
//11/22/1999**01:41*PM***************126*RESEAR~1.URL*RESEARCH ENGINEERS.url
//11/22/1999**02:21*PM***************124*WELCOM~1.URL*Welcome to AutoFEA - Finite Element Analysis Software.url
//**************8 File(s) 1,157 bytes
//**************0 Dir(s) 1,336,025,403,392 bytes free
//***************************************^**********^****Note filename position.
//0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
//          1         2         3         4         5         6         7         8         9         X         1         2
//
// The contents of each .url file are reviewed, the URL= line is extracted, and the file rewritten with just the label
// [InternetShortcut] and the URL= text. Yes, more includes than necessary. Pare them down if you like.
//
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iosfwd>
#include <cstring>
#include <sstream>
#include <string>
using namespace std;
//
// Global constants
const int MaxArrayChars = 1000; // Length of "lines" as read from files.
//
// Replacement for the EOLN() function from Pascal. Checks for End Of LiNe.
bool bEOLN(char ch)
{
    if (ch == '\n') {
        return true; }
    else { return false; }
}
//
// Fill the caLine array with blanks. Array spaces 0..MaxArrayChars-1. Set length to zero.
// The number of cells is not passed in the parameter.
void ClearLine(char caLine[], int& nLineL)
{
    for (int n = 0; n < MaxArrayChars; n++) { caLine[n] = ' '; }
    nLineL = 0;
}
//
// Using the supplied file, retrieve one line of characters and return the
// number of characters found.
//
// Aside: The character array variable begins with "c" for character and "a" for array.
// The integer variable nLineL begins with "n" for integer and ends with "L" for length.
// nLineL returns the number of occupied spaces in the array -> 1 if [0] is filled,
// 7 if [0..6] are filled.
//
void ReadLine(istream& inputFile, char caLine[], int& nLineL)
{
    ClearLine(caLine,nLineL);
    inputFile.get(caLine[0]);
    // Stop at a failed read (end of file, unopened file), at end of line,
    // or when the array is full, so we never write past the last cell.
    while ((inputFile.good()) &&
           (!bEOLN(caLine[nLineL])) &&
           (nLineL < MaxArrayChars - 1))
    {
        nLineL++;
        inputFile.get(caLine[nLineL]);
    }
}
// Write a line to the output file.
void WriteLine(ostream& outputFile, char caLine[], int nLineL)
{
    if (nLineL != 0)
    {
        for (int m = 0; m < nLineL; m++) { outputFile.put(caLine[m]); }
        outputFile << endl;
    }
    else { outputFile << endl; }
}
//
// The .url file looks like:
//[DEFAULT]
//BASEURL=http://www.autofea.com/ (Several of these statements may be present.)
// (There may also be links to Google AdSense, Facebook, DoubleClick, etc.)
//
//[InternetShortcut] (Only one of these)
//URL=http://www.autofea.com/ (Only one of these)
//Modified=40F884897235BF01AC
//(There may be more crap down here related to favicon.ico)
//
// If the file ends before a URL= line turns up, nURLL comes back as zero.
//
void GetURL(istream& inputFile, char caURL[], int& nURLL)
{
    bool Flag = false;
    // Also stop when the file runs out; otherwise a file without a
    // URL= line would keep this loop spinning forever.
    while ((!Flag) && (inputFile.good()))
    {
        ReadLine(inputFile,caURL,nURLL);
        if ((caURL[0] == 'U') && (caURL[1] == 'R')
            && (caURL[2] == 'L')) { Flag = true; }
        else { ClearLine(caURL,nURLL); }
    }
}
//
// See above for filename position.
//
void ExtractFilename(char caLine[],char caFilename[],int& nFilenameL)
{
    nFilenameL = 0;
    for (int n = 39; n <=50; n++)
    {
        switch (caLine[n])
        {
            // Eliminate padding spaces in filename using ' ' -> break.
            case ' ': break;
            default :
                caFilename[nFilenameL] = caLine[n];
                nFilenameL++;
                break;
        }
    }
}
//
// Build a string out of an array of characters
//
void BuildString(char cFilename[], int nFilenameL, string& strFilename)
{
    string strFn(1,cFilename[0]);
    for (int n = 1; n < nFilenameL; n++) { strFn += cFilename[n]; }
    strFilename = strFn;
}
//
// Reminder area for previous functions
//
// bool bEOLN(char ch)
// void ClearLine(char caLine[], int& nLineL)
// void ReadLine(istream& inputFile, char caLine[], int& nLineL)
// void WriteLine(ostream& outputFile, char caLine[], int nLineL)
// void GetURL(istream& inputFile, char caURL[], int& nURLL)
// void ExtractFilename(char caLine[],char caFilename[],int& nFilenameL)
// void BuildString(char cFilename[], int nFilenameL, string& strFilename)
//
int main(int nNumberofArgs, char* pszArgs[])
{
    // Initialize main input and output files and check for good input.
    ifstream in_stream("dirlist.txt", ios_base::in);
    if (!in_stream)
    {
        cout << "Could not open file. Exiting.";
        exit(1);
    }
    //
    // End of initialization.
    //
    // Main program body.
    int nTempL = 0; // Temporary Line Length
    int nURL = 0; // URL length
    int nfilenameL = 0; // filename length (the same for both read and write)
    char caTempLine[MaxArrayChars]; // Temporary Line - an array of characters.
    char caULine[MaxArrayChars]; // Holds the .URL link text.
    char caTempFN[MaxArrayChars]; // Holds the .URL filename. Use standard array even though only 12 spaces required.
    string FNstr; // An assembled string from array characters.
    int nEndofList = 0; // Flag to indicate we are through processing directory lines.
    //
    // Dispose of first five lines in the "dirlist.txt" file.
    for (int n = 1; n <= 5; n++) { ReadLine(in_stream,caTempLine,nTempL); }
    //
    // If the file lacks .url entries, the first character is an "F" as in "File not found".
    // Otherwise it will be a number. A space in the first position signals the end of the
    // .url entries. When nEndofList is found, value increases and loop is terminated.
    //
    for (;;)
    {
        ClearLine(caTempLine,nTempL);
        ReadLine(in_stream,caTempLine,nTempL);
        if (nTempL != 0)
        {
            switch (caTempLine[0])
            {
                case 'F':
                    cout << "No .url files found.";
                    nEndofList = 2;
                    exit(1);
                case ' ':
                    cout << "End of .url listing found. Normal termination.";
                    nEndofList = 1;
                    break;
                default:
                {
                    ExtractFilename(caTempLine,caTempFN,nfilenameL);
                    BuildString(caTempFN,nfilenameL,FNstr);
                    // sTempFN1 is the .URL file to be read.
                    ifstream sTempFN1(FNstr.c_str(), ios::in);
                    nURL = 0;
                    if (sTempFN1) { GetURL(sTempFN1,caULine,nURL); }
                    sTempFN1.close();
                    //
                    // Rewrite only when a URL= line was found, so a file we
                    // could not read or parse is left untouched.
                    if (nURL > 0)
                    {
                        // sTempFN2 is the same .URL file, reopened for writing.
                        ofstream sTempFN2(FNstr.c_str(), ios::out);
                        sTempFN2 << "[InternetShortcut]\n";
                        WriteLine(sTempFN2,caULine,nURL);
                        sTempFN2.flush();
                        sTempFN2.close();
                    }
                    break;
                }
            }
            if (nEndofList > 0) { break; }
        }
        // An empty read with a failed stream means we ran out of input.
        else if (!in_stream) { break; }
        else { cout << "Blank line encountered, skipping..."; }
    }
    in_stream.close();
    return 0;
}
If the code above looks funny after you copy and paste it, blame the indentation. Bless HTML. Re-establish the indenting per your preference.
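The fixed-column filename extraction (columns 39 through 50 of each directory line) can also be written against std::string rather than a character array. The sketch below is an illustration only; the name shortNameFromDirLine is hypothetical, and it assumes the same "dir /x" column layout shown in the comment block at the top of the listing.

```cpp
#include <string>

// Hypothetical helper (not part of the CLEANSE program): pull the 8.3
// short name out of one line of "dir /x" output.
// Assumption: the short name occupies columns 39..50, as in the listing.
std::string shortNameFromDirLine(const std::string& dirLine)
{
    const std::string::size_type kStart = 39; // first column of the short name
    const std::string::size_type kWidth = 12; // 8 + '.' + 3 characters
    if (dirLine.size() <= kStart) { return ""; }
    const std::string field = dirLine.substr(kStart, kWidth);
    // Drop the padding spaces that fill out names shorter than 12 characters.
    std::string name;
    for (char ch : field)
    {
        if (ch != ' ') { name += ch; }
    }
    return name;
}
```

A line too short to contain the field yields an empty string, which the caller can treat as "skip this line" rather than reading past the end of the data.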